HUJI NLP
@nlphuji.bsky.social
The NLP group at the Hebrew University of Jerusalem.
@royschwartznlp.bsky.social, @gabistanovsky.bsky.social, @tomhope.bsky.social and Prof. Omri Abend.
That’s a wrap on our first Huji NLP Hackathon!
Congrats to the winning team!
@noy-sternlicht.bsky.social @nirmazor.bsky.social

They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...
April 24, 2025 at 12:34 PM
Reposted by HUJI NLP
Care about LLM evaluation? 🤖 🤔

We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
across different prompts, domains, tokens, models...

Join our community effort to expand it with YOUR model predictions & become a co-author!
March 17, 2025 at 2:37 PM
Reposted by HUJI NLP
Can RAG performance get *worse* with more relevant documents? 📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3
March 11, 2025 at 2:32 PM
Reposted by HUJI NLP
There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major regulatory efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693
February 3, 2025 at 8:04 AM
Reposted by HUJI NLP
- “I heard there’s a new paper about Theory of Mind in LLMs!”
- “I know! There’s like hundreds of them!”

Could someone be driving in the wrong direction?

Check out our new opinion paper, w/ @nitalon.bsky.social, @joebarnby.bsky.social and Omri Abend.
🚨 New paper (with @eitanwagner.bsky.social, @joebarnby.bsky.social and Omri Abend) on how we (should) evaluate Theory of Mind in Large Language Models. While recent work claims LLMs have ToM capabilities, we're missing crucial aspects from cognitive science. Here's why this matters 🧵
December 19, 2024 at 1:05 PM
Reposted by HUJI NLP
New preprint! ✨
Interested in LLM-as-a-Judge?
Want to get the best judge for ranking your system?
Our new work is just for you:
"JuStRank: Benchmarking LLM Judges for System Ranking"
🕺💃
arxiv.org/abs/2412.09569
December 13, 2024 at 10:16 AM
Reposted by HUJI NLP
1/n First time in the sky ✈️

I had a great time presenting my work at @emnlpmeeting.bsky.social's Workshop on Narrative Understanding and reconnecting with friends and colleagues in Miami! 🌴

How do religious trajectories evolve in Holocaust testimony narratives?
November 21, 2024 at 3:13 PM