aryanshri123.bsky.social
@aryanshri123.bsky.social
Reposted
HR Simulator™: a game where you gaslight, deflect, and “let’s circle back” your way to victory.
Every email a boss fight, every “per my last message” a critical hit… or maybe you just overplayed your hand 🫠
Can you earn Enlightened Bureaucrat status?

(link below!)
September 26, 2025 at 6:41 PM
Reposted
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering.

This is holding us back. 🧵 and a new paper with @ari-holtzman.bsky.social.
July 9, 2025 at 8:07 PM
🤫Jailbreak prompts make aligned LMs produce harmful responses.🤔But is that info linearly decodable?

↗️We show many refused concepts are linearly represented, sometimes persist through instruction-tuning, and may also shape downstream behavior❗

arxiv.org/abs/2507.00239
🧵1/
July 3, 2025 at 8:07 PM
We expose "absence blindness" in the best LLMs, even on relatively short documents. Using LLMs as judges or graders may not be so reliable. Looking forward to seeing what comes out of this!
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing?

🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative spaces”.
Paper: arxiv.org/abs/2506.11440

🧵[1/n]
June 20, 2025 at 10:21 PM