siddheshp.bsky.social
@siddheshp.bsky.social
Grad Student; Into Multilingual NLP
Reposted
Check out the camera-ready version of our ICML position paper ("Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge") to learn more!!! arxiv.org/abs/2502.00561

(6/6)
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges com...
June 15, 2025 at 12:20 AM
Reposted
i mean, people have different goals, and if you cared about some niche aspect of query-focused multi-document summarization before, it is legit to continue. or you can switch focus and start thinking of HCI. the second became much more possible now, the first maybe hasn't.
December 17, 2024 at 5:16 PM
Reposted
🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL which directs side effects in the world (e.g. tool calls) rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]
November 19, 2024 at 9:32 AM
Reposted
#EMNLP has a nice set of tokenization/subword modeling papers this year.

It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!

Here is a list with links + presentation time (in chronological order).
November 11, 2024 at 10:38 PM
We are excited to share our comprehensive survey on cultural awareness in #LLMs! 🗺️ [Was posted on X a few days before]
We reviewed 300+ papers across diverse modalities (language, vision-language, etc.)
arxiv.org/abs/2411.00860
Survey of Cultural Awareness in Language Models: Text and Beyond
Large-scale deployment of large language models (LLMs) in various applications, such as chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure inclusivity. Cul...
November 11, 2024 at 9:57 AM