Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?).
From my reading, the big difference seems to be that they're also using the agent's skill, which is super cool!
arxiv.org/abs/2205.12507
t.co/QQlgwzo6jf
t.co/2G6kwAAPMy
@wwongkamjan.bsky.social
at the Findings poster (18:00, Hall X4/X5): Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
x.com/joywwong/sta...
@yysung.bsky.social is presenting: GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
x.com/YooYeonSung1...
(Short version: quiz bowl, a dumb trivia game, shows humans' calibration > LLMs'.)
@nbalepur.bsky.social is presenting: Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
x.com/NishantBalep...
docs.google.com/forms/d/e/1F...
[Signup deadline: June 18 Anywhere on Earth]
sites.google.com/view/qanta/2...
hsquizbowl.org/forums/viewt...
sites.google.com/view/qanta/2...
docs.google.com/forms/d/e/1F...