boydgraber.bsky.social
@boydgraber.bsky.social
Today's the deadline to apply for an AI-specific teaching track position at UMD:

umd.wd1.myworkdayjobs.com/UMCP/job/Uni...

Please join us!
August 22, 2025 at 3:47 PM
My students and I are presenting three papers on Monday at #ACL2025 and this thread will recap them (including their videos).
July 28, 2025 at 8:35 AM
The precursor to this paper "The Incoherence of Coherence" had our most-watched paper video ever, so I thought we had to surpass it somehow ... so we decided to do a song parody (of Roxanne, obviously):

youtu.be/87OBxEM8a9E
July 18, 2025 at 6:37 PM
We had our first human–computer cooperative AI tournament at UMD. Key takeaways: 1) computers are getting better at trivia 2) they still suck at calibration 3) our teaming mechanic kept the games competitive and mostly fun (at least that’s what the players said).
June 17, 2025 at 3:35 PM
Today is the deadline to sign up for our Human-Computer trivia competition held on June 14, 2025 in College Park, MD. $150 prize for the team who can answer the most questions with the help of an AI.
June 10, 2025 at 4:23 PM
Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.
June 5, 2025 at 4:17 PM
Reposted
New Pleias paper: "What the HellaSwag?"
HellaSwag is currently one of the most widely used LLM benchmarks in the world. We introduce a new critical method to assess the validity of standard LLM evals and show it does not accurately measure common sense reasoning. arxiv.org/abs/2504.07825
April 14, 2025 at 3:44 PM
Reposted
This was a really fun paper to put together with Rachel and @boydgraber.bsky.social allowing me to vent many of my frustrations working with MCQA over the past year 😪🫡

Please check out the paper, we would love to hear your feedback! 📄👇
February 24, 2025 at 9:04 PM
Reposted
🚨 You are only evaluating a slice of your test-time scaling model's performance! 🚨

📈 We consider how models’ confidence in their answers changes as test-time compute increases. Reasoning longer helps models answer more confidently!

📝: arxiv.org/abs/2502.13962
February 20, 2025 at 3:14 PM
Is anyone in my network connected to Align to Innovate? Or know somebody who is?

alignbio.org
Align to Innovate
Reproducible. Scalable. Sharable. Improving research science with programmable experiments.
February 5, 2025 at 2:04 AM
Reposted
Hi. I'm Andrew. I own New England's oldest map store because last year I moved across the country after an old guy retired and gave it to me Willy Wonka-style. Visit my store in Rhode Island. www.mapcenter.com
December 17, 2024 at 11:18 PM
In about half an hour, I'll be doing my annual Q&A session on grad admissions:

youtube.com/live/jVjTbPH...
December 6, 2024 at 1:26 PM
Reposted
At its heart, Star Trek is a utopian fantasy about a society so advanced that they are capable of holding productive meetings that last no longer than three minutes
December 3, 2024 at 4:58 PM
I just made my way to Bluesky, so I thought it might be a good opportunity to shamelessly remind people to vote in the ACL board elections (where I'm running for an at-large post on a platform of improving virtual conferences).

Check your e-mail for "Reminder: ACL 2024 Elections - Please Vote".
November 26, 2024 at 8:14 PM