yatskar.bsky.social
Reposted
Excited to share ✨ Contextualized Evaluations ✨!

Benchmarks like Chatbot Arena contain underspecified queries, which can lead to arbitrary eval judgments. What happens if we provide evaluators with context (e.g., who's the user, what's their intent) when judging LM outputs? 🧵↓
November 13, 2024 at 2:16 PM
Reposted
Deadline Extended!
Submit to the LM4Sci Workshop @ COLM 2025 in Montreal 🇨🇦

🧠 Large Language Modeling for Scientific Discovery (LM4Sci)
📅 New Deadline: June 30
📢 Notification: July 24
📍 Workshop: Oct 10, 2025

📝 Non-archival short (2–4p) & full (up to 8p) papers welcome!
June 22, 2025 at 9:37 PM
Reposted
🚨 Call for Papers: LM4Sci @COLM_conf 2025 🚨

Excited to announce the Large Language Modeling for Scientific Discovery (LM4Sci) workshop at COLM 2025 in Montreal, Canada!

Submission Deadline: June 23
Notification: July 24
Workshop: October 10, 2025
June 14, 2025 at 12:03 AM
Reposted
Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
June 6, 2025 at 4:32 PM