@royschwartznlp.bsky.social, @gabistanovsky.bsky.social, @tomhope.bsky.social and Prof. Omri Abend.
Congrats to the winning team!
@noy-sternlicht.bsky.social @nirmazor.bsky.social
They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...
We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
On different prompts, domains, tokens, models...
Join our community effort to expand it with YOUR model predictions & become a co-author!
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3
In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693
- “I know! There’s like hundreds of them!”
…
Could someone be driving in the wrong direction?
Check out our new opinion paper. w/ @nitalon.bsky.social, @joebarnby.bsky.social and Omri Abend.
@joebarnby.bsky.social and Omri Abend) on how we (should) evaluate Theory of Mind in Large Language Models. While recent work claims LLMs have ToM capabilities, we're missing crucial aspects from cognitive science. Here's why this matters 🧵
Interested in LLM-as-a-Judge?
Want to get the best judge for ranking your system?
Our new work is just for you:
"JuStRank: Benchmarking LLM Judges for System Ranking"
🕺💃
arxiv.org/abs/2412.09569
I had a great time presenting my work at @emnlpmeeting.bsky.social ’s Workshop on Narrative Understanding and reconnecting with friends and colleagues in Miami! 🌴
How do religious trajectories evolve in Holocaust testimony narratives?