Michael Saxon
@saxon.me
Doctor of NLP/Vision+Language from UCSB

Evals, metrics, multilinguality, multiculturality, multimodality, and (dabbling in) reasoning

https://saxon.me/
Guys I'm really worried about the threat of superintelligent AI, and wouldn't you know it, the best way to stop it is gonna be for you to give me a whole lotta money for my startup
November 10, 2025 at 6:35 AM
More than choosing good project ideas, to me "research taste" means recognizing what the interesting part of a result is and how it connects to a bigger narrative. Almost any nontrivial result can be important through the right lens.

More than anything my PhD taught me this.
November 5, 2025 at 8:25 PM
🆕 from us at #EMNLP: Are LMs better at answering questions about Germany in German than in French? Is national knowledge linguistically contingent?

Interestingly, only for some multilingual models is this true. Aya knows China best in Chinese, but LLaMA's best in English always.
November 5, 2025 at 7:47 PM
Beautiful Blooj tears. Mariners will not have to feel the "should have been us" pain
November 2, 2025 at 4:23 AM
Hate da Dodgers but also Bloojays need to pay for knocking out Seattle. I'll be smug whoever loses.
November 2, 2025 at 4:11 AM
Very pro dislike button. I think the asymmetry that comes from being able to leave drive-by approval (likes) but only high-engagement disapproval (comments) raises the temperature of negative interactions, and incentivizes ragebaiting with less visible shame
October 31, 2025 at 11:54 PM
I didn't realize arXiv is a postprint server
blog.arxiv.org/2025/10/31/a...

FYI the blog post for the updated policy is out. Our llm future is dire:/
October 31, 2025 at 7:55 PM
It's #NSF #GRFP application season again so it's time to re-up my GRFP application advice post!

Also, check out the cool bsky comment integration I've added to the blog! Engagement with this post will go under the blogpost on my site as comments!

saxon.me/blog/2024/gr...
NSF GRFP Application Tips for NLP, AI, CS
Reflections and advice from my successful NSF GRFP proposal in NLP. Why I think my applications worked well, what I wish I did differently, and links to my actual statements and feedback from the GRFP...
saxon.me
October 30, 2025 at 8:03 PM
"It's country over club as Aaron Judge will lead team USA in the World Baseball Classic next March, assuming this game will be over by then" 💀
October 28, 2025 at 6:00 AM
Reposted by Michael Saxon
fukuyama was right. history ended. we are stuck inside this game forever
October 28, 2025 at 5:28 AM
WAITER? ANOTHER INNING OF NOTHING PLEASE
October 28, 2025 at 5:00 AM
It's live! Here's an example post: saxon.me/blog/2025/la...

Turning the replies to a bluesky post into the comment section for a blogpost is a small, concrete way to support the ecosystem: future visitors who want to add comments are incentivized to interact on the platform.

Also, it's very easy to do:
October 27, 2025 at 6:50 PM
Stage four: The sign bears no relation to any reality whatsoever; it is its own pure simulacrum

youtu.be/6i2I3dkZ5-M
Bush Step! (JibJab)
YouTube video by pipo
youtu.be
October 27, 2025 at 4:04 PM
Prototyping bluesky comment integrations for the blog (gonna need to modify a lot more to make it fully work with my template)

Also, I am getting more and more indiewebpilled. Would any other NLPMLAI researcher-bloggers be interested in making a webring?
October 27, 2025 at 7:27 AM
A big part of why talking about LMs vs humans is hard is interlocutors often don't agree if they're considering idealized, worst-case, or "average" humans or LMs

Personally, I think idealized human vs. average LM is the most germane comparison to use when thinking about capabilities
October 26, 2025 at 7:31 AM
Reposted by Michael Saxon
Federal Judges— or staff, but same-same— used "AI" to summarize & draft rulings, issued them w/o checking the work, leading to basic factual errors, & thus undermining the facticity & validity of the rulings entire.

Gee. Who Could Have Foreseen. *stares directly into the camera like in the office*
Two federal judges say use of AI led to errors in US court rulings
Two federal judges admitted in response to an inquiry by U.S. Senate Judiciary Committee Chairman Chuck Grassley that members of their staff used artificial intelligence to help prepare recent court orders that Grassley called "error-ridden."
www.reuters.com
October 26, 2025 at 4:47 AM
It should feel pointless to care about this as our republic crumbles, but somehow this shit still makes my blood boil

www.404media.co/a16z-backed-...
a16z-Backed Startup Sells Thousands of ‘Synthetic Influencers’ to Manipulate Social Media as a Service
Andreessen Horowitz is funding a company that clearly violates the inauthentic behavior policies of every major social media platform.
www.404media.co
October 25, 2025 at 6:21 AM
nightmare blunt rotation
The list of signatories on the latest "Please Skynet, don't kill us" letter is BONKERS.
October 22, 2025 at 8:31 PM
Reposted by Michael Saxon
🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵
October 14, 2025 at 3:59 PM
Reposted by Michael Saxon
Happy to share that I’m presenting 3 research projects at AIES 2025 🎉

1️⃣Gender bias over-representation in AI bias research 👫
2️⃣Stable Diffusion's skin tone bias 🧑🏻🧑🏽🧑🏿
3️⃣Limitations of human oversight in AI hiring 👤🤖

Let's chat if you’re at AIES or read below/reach out for details!
#AIES25 #AcademicSky
October 21, 2025 at 11:39 AM
we need to finally replace Susan Collins with a Democrat!

*The monkey's paw curls*
October 21, 2025 at 7:37 PM
Reposted by Michael Saxon
You know the results are legit because the paper uses the NeurIPS LaTeX tEmPlaTE.
October 17, 2025 at 3:39 PM