Venkat
@venkatasg.net
Assistant Professor CS @ Ithaca College. Computational Linguist interested in pragmatics & social aspects of communication.

venkatasg.net
Reposted by Venkat
“Humans across multiple languages spontaneously associate the nonwords kiki & bouba with spiky & round shapes, respectively...We tested the bouba-kiki effect in baby chickens. Similar to humans, they spontaneously chose a spiky shape when hearing a kiki sound & a round shape when hearing a bouba.”😲🧪
Matching sounds to shapes: Evidence of the bouba-kiki effect in naïve baby chicks
Humans across multiple languages spontaneously associate the nonwords “kiki” and “bouba” with spiky and round shapes, respectively, a phenomenon named the bouba-kiki effect. To explore the origin of t...
www.science.org
February 19, 2026 at 7:20 PM
I read somewhere that the open-source LLMs are 'benchmaxxing': they're trained to do well on benchmarks, but the gains don't translate into general improvements. From my simple benchmark that seems true: I was surprised the only models that do decently at FizzBuzz are all the frontier, closed LLMs.
February 12, 2026 at 10:04 PM
I made a Huggingface leaderboard to track model progress on my FizzBuzz benchmark: huggingface.co/spaces/venka...
One thing I've noticed is that something changed with the new generation of models, especially the biggest ones. They all ace it, even with different rules.
February 7, 2026 at 1:38 AM
@hankgreen.bsky.social I'm assuming YouTubers don't get a cut when Gemini pulls your recent videos to answer questions? I was wondering if Google pulled YouTube videos directly (they can). ChatGPT can't, and uses third-party sources like youtubesummary.com 😭. The video autoplays, but only if I scroll.
February 3, 2026 at 5:25 PM
Reposted by Venkat
Very excited to share that the paper w/
@jessyjli.bsky.social @DavidBeaver
"Strategic Dialogue Assessment: The Crooked Path to Innocence" (used to have the name COBRA) was accepted by Dialogue and Discourse Vol. 17, No. 1. Check it out! 👉 https://journals.uic.edu/ojs/index.php/dad/article/view/14503
January 30, 2026 at 3:02 AM
🗣️New Preprint! I'm really excited to talk about this new short paper (w/ Laura Biester) analyzing sentences from the Bulwer-Lytton Fiction Contest (BLFC), which challenges writers to write the 'worst opening sentence to the most atrocious novel ever written'. This is a corpus of "bad" sentences! (1/5)
January 27, 2026 at 5:36 PM
Reposted by Venkat
“All bears have a property”, “Some bears have a property”, “Bears have a property” are different in terms of how the property is generalized to a specific bear – a great example of how language constrains thought!

This holds for kids, adults, and according to our new work, (V)LMs! 🧵
January 27, 2026 at 4:16 PM
Reposted by Venkat
What should academics be doing right now?

I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.

davidbau.github.io/poetsandnurs...

It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...
January 26, 2026 at 3:27 AM
Reposted by Venkat
Our first South by Semantics lecture of the semester at UT Austin is happening next week on January 30th!

I'm excited to hear Dr. Amir Zeldes (Associate Professor at Georgetown University) talk about saliency in discourse and the memorability of salient information for both humans and LLMs.
January 22, 2026 at 1:00 AM
Reposted by Venkat
Hello world 👋
My first paper at UT Austin!

We ask: what happens when medical “evidence” fed into an LLM is wrong? Should your AI stay faithful, or should it play it safe when the evidence is harmful?

We show that frontier LLMs accept counterfactual medical evidence at face value.🧵
January 21, 2026 at 6:45 PM
Update: GPT-5.2 Pro aces my standard and modified FizzBuzz benchmark. Most models still fail to generalize (spectacularly), but something did change with the latest crop of Claude and GPT thinking/pro models that seemed to help with my (silly, but interesting) benchmark.
github.com/venkatasg/fi...
January 17, 2026 at 9:01 AM
Thanks Claude, lucky for you I make regular backups!
I thought it'd be interesting to incorporate CLI agents in my software engineering class, but depending on my students' (or anyone's) backup hygiene is a non-starter. Maybe Claude in remote environments...
January 5, 2026 at 10:39 AM
I knew StackOverflow was in trouble because of LLMs but this graph is insane. It took a decade for Wikipedia to push Encyclopedia Britannica out of print, but only three years for LLMs to make people stop asking questions on Stack Overflow.
data.stackexchange.com/stackoverflo...
January 4, 2026 at 3:15 PM
A student found my personal number and started calling me on WhatsApp to increase their grade on the final 😶 Reminding myself of the dumb stuff I did as an 18-year-old to find the grace to gently email them that this is inappropriate.
December 24, 2025 at 4:42 AM
Reposted by Venkat
Omg wait. Someone literally posted this paper a couple weeks ago. Good job guys
Sparse Autoencoders are Topic Models
Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood a...
arxiv.org
December 15, 2025 at 11:00 PM
Following up on my blog post, I figured I'd create a silly benchmark to test how good LLMs are at playing FizzBuzz for 100 turns. Surprisingly, 2 Claude models do well at both the standard game and a slightly modified game where 'buzz' should be emitted at multiples of 7 rather than 5… (1/2)
December 13, 2025 at 8:28 PM
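For readers unfamiliar with the game above: here's a minimal Python sketch of the rules an LLM has to sustain for 100 turns, with the divisor-to-word mapping parameterized so the modified 'buzz at multiples of 7' variant is just a different rule set. (This is my own illustration, not the actual benchmark harness; the function and rule names are invented.)

```python
def fizzbuzz(n, rules=((3, "Fizz"), (5, "Buzz"))):
    """Return the output for turn n under the given (divisor, word) rules."""
    words = "".join(word for div, word in rules if n % div == 0)
    return words or str(n)

# Standard game: multiples of 3 -> Fizz, multiples of 5 -> Buzz
standard = [fizzbuzz(n) for n in range(1, 101)]

# Modified game: 'Buzz' at multiples of 7 instead of 5
modified = [fizzbuzz(n, rules=((3, "Fizz"), (7, "Buzz"))) for n in range(1, 101)]

print(standard[14])  # turn 15 -> "FizzBuzz"
print(modified[20])  # turn 21 -> "FizzBuzz" (21 = 3 * 7)
```

The point of the variant is that a model which merely memorized standard FizzBuzz output will fail once the divisor changes, while one that tracks the rule generalizes.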
I was thinking about FizzBuzz and LLMs writing code and had a few thoughts. venkatasg.net/blog/fizzbuzz-2...
FizzBuzzing LLMs
venkatasg.net
December 13, 2025 at 5:21 PM
Reposted by Venkat
Presenting a poster with some independent work on dynamic neural audio at 3pm at the AI for Music workshop (room 27)! bostromk.net/ASURA
Asura's Harp
bostromk.net
December 7, 2025 at 7:17 PM
Reposted by Venkat
🥳Life Update!

I’m thrilled to share that I’ll be starting as assistant professor for Natural Language Processing @unileipzig.bsky.social in April! I’m deeply grateful to everyone who supported me on this journey.

I will be recruiting PhD students with @scadsai.bsky.social, stay tuned for details!
December 10, 2025 at 1:10 PM
Reposted by Venkat
We are accepting submissions for the 25th edition of the Texas Linguistics Society (TLS), a UT Austin grad-student-run Linguistics conference! The conference will run from February 20-21, 2026 in Austin.

Abstract Deadline: December 17
Notification: January 15
November 21, 2025 at 9:17 PM
Reposted by Venkat
New work to appear @ TACL!

Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.

Yet they often assign higher probability to ungrammatical strings than to grammatical strings.

How can both things be true? 🧵👇
November 10, 2025 at 10:11 PM
Reposted by Venkat
Syntax that spuriously correlates with safe domains can jailbreak LLMs - e.g. below with GPT4o mini

Our paper (co w/ Vinith Suriyakumar) on syntax-domain spurious correlations will appear at #NeurIPS2025 as a ✨spotlight!

+ @marzyehghassemi.bsky.social, @byron.bsky.social, Levent Sagun
October 24, 2025 at 4:23 PM
Reposted by Venkat
"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LMs can't seem to!

We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.

New paper w/ Daniel, Will, @jessyjli.bsky.social
October 16, 2025 at 3:27 PM
Reposted by Venkat
UT Austin Linguistics is hiring in computational linguistics!

Asst or Assoc.

We have a thriving group sites.utexas.edu/compling/ and a long proud history in the space. (For instance, fun fact, Jeff Elman was a UT Austin Linguistics Ph.D.)

faculty.utexas.edu/career/170793

🤘
UT Austin Computational Linguistics Research Group – Humans processing computers processing humans processing language
sites.utexas.edu
October 7, 2025 at 8:53 PM
Reposted by Venkat
Excited to present this at #COLM2025 tomorrow! (Tuesday, 11:00 AM poster session)
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect.

🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!

🧵👇
October 6, 2025 at 8:40 PM