Adi Simhi
adisimhi.bsky.social
NLProc and machine learning. Ph.D. student at the Technion
ManagerBench was accepted to #ICLR2026🎉
Check it out⬇️
ManagerBench was accepted to ICLR! @iclr-conf.bsky.social #ICLR2026

LLMs are still either unsafe or completely harm-avoidant, even when the harm affects only furniture 🛋️

Check out our benchmark, online or in Rio 🇧🇷
🤔What happens when LLM agents must choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid human harm?

🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
February 5, 2026 at 6:16 AM
Check out our new paper, ManagerBench👔, which evaluates whether LLM agents prefer achieving their goals or avoiding human harm
October 8, 2025 at 4:08 PM
🚨New arXiv preprint!🚨
LLMs can hallucinate, but did you know they can do so with high certainty even when they know the correct answer? 🤯
We identify such hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov
February 19, 2025 at 3:50 PM