Gabi Stanovsky
gabistanovsky.bsky.social
Assistant professor at the Hebrew University.
Reposted by Gabi Stanovsky
🚨New paper alert🚨

🧠 Instruction-tuned LLMs show amplified cognitive biases, but are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc
July 15, 2025 at 1:38 PM
Reposted by Gabi Stanovsky
Can RAG performance get *worse* with more relevant documents?📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3
March 11, 2025 at 2:32 PM
Reposted by Gabi Stanovsky
🚨New arXiv preprint!🚨
LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯
We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov
February 19, 2025 at 3:50 PM
Reposted by Gabi Stanovsky
GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you.

Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work.

CfP can be found at gem-benchmark.com/workshop
February 12, 2025 at 2:25 PM
A vote to stop defining what LLMs are at the start of every paper
February 6, 2025 at 8:30 AM
There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693
February 3, 2025 at 8:04 AM