Lightnews — Scholar-powered news

Max Reith

@maxreith.bsky.social

Gemini-3 got this wrong 5/5 times...
(But this might just be reduced reasoning budgets at launch or something)

Another DeepSeek moment? Moonshot AI, a Chinese lab, released its new (open source!) model K2 Thinking, outperforming OpenAI et al. on several benchmarks. I tested it with a question from an unpublished paper of mine. Out of 5 tries, Kimi, GPT-5 and Gemini 2.5 Pro each replied correctly 3 times!

November 18, 2025 at 9:59 PM

Reposted by Max Reith

Sarah Cohodes

@cohodes.bsky.social

📣 New NBER Working Paper out today 📣

"The Consequences of Faculty Sexual Misconduct"
Sarah Cohodes & Katherine Leu

Screenshot of working paper: The Consequences of Faculty Sexual Misconduct

November 10, 2025 at 1:49 PM

Reposted by Max Reith

Scott Lincicome

@scottlincicome.bsky.social

New @nberpubs: "The Economic Impact of Brexit" www.nber.org/papers/w34459
"by 2025, Brexit had reduced UK GDP by 6% to 8%, with the impact accumulating gradually over time." 😲

November 10, 2025 at 11:45 AM

Max Reith

@maxreith.bsky.social

Another DeepSeek moment? Moonshot AI, a Chinese lab, released its new (open source!) model K2 Thinking, outperforming OpenAI et al. on several benchmarks. I tested it with a question from an unpublished paper of mine. Out of 5 tries, Kimi, GPT-5 and Gemini 2.5 Pro each replied correctly 3 times!

November 8, 2025 at 2:59 PM

Reposted by Max Reith

dame

@dame.is

chat, is this good?

I scored 67 on the AI purity test.

post your scores:
https://aipuritytest.org

AI Purity Test
The AI Purity Test is a voluntary self-assessment developed by Tina Tarighian. It provides participants with a structured opportunity to reflect on the evolution of their interactions with artificial intelligence over time.
Caution: this is not a bucket list. Completion of all items on this test will likely result in death.

Your score:
67

October 24, 2025 at 2:00 PM

Reposted by Max Reith

Aaron Roth

@aaroth.bsky.social

An interesting debate between Emily Bender and Sebastien Bubeck: www.youtube.com/watch?v=YtIQ... Emily's thesis is roughly summarized as: "LLMs extrude plausible sounding text, and the illusion of understanding comes entirely from how the listener's human mind interprets language."

CHM Live | The Great Chatbot Debate: Do LLMs Really Understand?

YouTube video by Computer History Museum

www.youtube.com

October 21, 2025 at 3:33 PM

Reposted by Max Reith

Rudi Bachmann

@bachmannrudi.bsky.social

This debate between @clemensfuest.bsky.social and @suedekum.bsky.social in @zeit.de should be worked through in lectures and introductory seminars on the theory of economic policy. Very good teaching material, for the good and the bad. A 🧵:

October 18, 2025 at 9:31 AM

Reposted by Max Reith

Scott McGrath

@smcgrath.phd

🧪 A new computer science conference, Agents4Science, will feature papers written and peer-reviewed entirely by AI agents. The event serves as a sandbox to evaluate the quality of machine-generated research and its review process.
#MLSky

AI bots wrote and reviewed all papers at this conference

Event will assess how reviews by models compare with those written by humans.

www.nature.com

October 15, 2025 at 3:33 PM

Reposted by Max Reith

Jennifer Doleac

@jenniferdoleac.bsky.social

I’ve decided not to post my annual “women on the Econ job market” thread this year. Social media has splintered too much, and now that I’ve left academia I’m focused on other priorities.

October 14, 2025 at 2:02 PM

Reposted by Max Reith

Mauricio Drelichman

@mdrelichman.bsky.social

Elated at Joel Mokyr's Nobel Prize! You can find numerous accounts -now multiplying by the minute- of his scholarly contributions. Today I want to celebrate the man and the mentor.

Joel Mokyr at the 2011 conference in his honour at Northwestern.

October 13, 2025 at 6:00 PM

Reposted by Max Reith

Ethan Mollick

@emollick.bsky.social

I don't think people have updated enough on the capability gain in LLMs, which (despite being bad at math a year ago) now dominate hard STEM contests: gold medals in the International Math Olympiad, the International Olympiad on Astronomy & Astrophysics, International Informatics Olympiad...

October 12, 2025 at 8:40 PM

Reposted by Max Reith

Scott McGrath

@smcgrath.phd

Sora hit 1M downloads faster than ChatGPT
#MLSky
techcrunch.com/2025/10/09/s...

Sora hit 1M downloads faster than ChatGPT | TechCrunch

This level of consumer adoption is worth noting because Sora remains an invite-only app, while ChatGPT was more publicly available at launch. That makes Sora's performance more impressive.

techcrunch.com

October 10, 2025 at 2:30 PM

Reposted by Max Reith

Our World in Data

@ourworldindata.org

How over- and underrepresented are different causes of death in the media?

Another way to visualize this data is to measure how over- or underrepresented each cause is.

To do this, we calculate the ratio between a cause’s share of deaths and its share of news articles.
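
The ratio described above can be sketched in a few lines of Python. This is only an illustration: the post does not say which share is the numerator, so the sketch assumes news share divided by death share (values above 1 then mean a cause gets more coverage than its share of deaths). The function name `representation_ratio` and all numbers are hypothetical and are not Our World in Data figures.

```python
# Hypothetical shares for illustration only; not Our World in Data figures.
def representation_ratio(news_share: float, death_share: float) -> float:
    """Return news share divided by death share (assumed orientation)."""
    if death_share <= 0:
        raise ValueError("death_share must be positive")
    return news_share / death_share


def describe(ratio: float) -> str:
    """Label a ratio relative to proportional coverage (ratio == 1)."""
    if ratio > 1:
        return "overrepresented"
    if ratio < 1:
        return "underrepresented"
    return "proportionally represented"


# Made-up example: cause_a causes many deaths but gets little coverage,
# cause_b causes few deaths but gets a lot of coverage.
example_shares = {
    "cause_a": {"deaths": 0.30, "news": 0.03},
    "cause_b": {"deaths": 0.01, "news": 0.20},
}

ratios = {
    cause: representation_ratio(s["news"], s["deaths"])
    for cause, s in example_shares.items()
}

for cause, ratio in ratios.items():
    print(f"{cause}: ratio {ratio:.2f} ({describe(ratio)})")
```

Under this assumed orientation, the made-up cause_a comes out at roughly a tenth of proportional coverage and cause_b at roughly twenty times. Flipping numerator and denominator would simply invert the reading.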

October 9, 2025 at 5:08 PM

Reposted by Max Reith

Jason Furman

@jasonfurman.bsky.social

The other day a student asked me about the prevalence of insider trading in prediction markets. I now have an answer.

October 10, 2025 at 11:19 AM

Reposted by Max Reith

Alexandra de Gendre

@adegendre.bsky.social

The best post I’ve seen on Bluesky in a very long time! Brilliant idea and brilliant accounts out there!

Conrad Hackett @conradhackett.bsky.social · Oct 1

What's your favorite Bluesky account that primarily posts about something other than current events/politics?

October 2, 2025 at 10:31 AM

Reposted by Max Reith

Joshua Gans

@joshgans.bsky.social

Back in graduate school, Paul Milgrom asked me to examine a published paper from 1984 by another person that he suspected had an incorrect proof. I found the error. I decided to see if LLMs could. Only Gemini 2.5 Pro did so. Claude Opus and GPT-5-pro found no significant errors.

September 30, 2025 at 6:58 PM

Max Reith

@maxreith.bsky.social

Do tech optimists have a point? Within standard economic growth models, AI could drive explosive growth through one of two mechanisms.

1) Labor Substitution
So far, it seems like capital and labor mostly complement each other, which limits the returns to additional capital given fixed labor.
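
One way to make the complementarity point concrete is a CES production function, a standard textbook form that the post itself does not name. The sketch below is an assumption-laden illustration: the function `ces_output`, the weight `alpha`, and the values of `rho` are all chosen for the example. With `rho < 0` capital and labor are complements, and output stays bounded as capital grows with labor held fixed; with `0 < rho < 1` they are substitutes, and output keeps growing.

```python
# Illustrative CES production function; all parameter values are assumptions.
def ces_output(capital: float, labor: float, rho: float, alpha: float = 0.5) -> float:
    """Y = (alpha * K^rho + (1 - alpha) * L^rho)^(1 / rho), for rho != 0."""
    if rho == 0:
        raise ValueError("rho must be nonzero (rho -> 0 is the Cobb-Douglas limit)")
    return (alpha * capital**rho + (1 - alpha) * labor**rho) ** (1 / rho)


labor = 1.0  # labor held fixed throughout
capital_levels = [1.0, 100.0, 10_000.0]

# rho = -1: complements (elasticity of substitution 0.5).
complements = [ces_output(k, labor, rho=-1.0) for k in capital_levels]

# rho = 0.5: substitutes (elasticity of substitution 2).
substitutes = [ces_output(k, labor, rho=0.5) for k in capital_levels]

for k, yc, ys in zip(capital_levels, complements, substitutes):
    print(f"K={k:>8.0f}  complements Y={yc:8.3f}  substitutes Y={ys:10.3f}")
```

In the complements case, output with these parameters can never exceed 2 no matter how much capital is added, which is the sense in which fixed labor limits the returns to capital. In the substitutes case no such ceiling exists.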

September 19, 2025 at 9:35 AM

Reposted by Max Reith

Ethan Mollick

@emollick.bsky.social

A cautiously optimistic result on AI and disinformation.

A week before the 2024 UK elections, 13% of all voters used AI to ask about political topics. A randomized trial found this may be good: using AI led to similar gains in true knowledge as doing web research, regardless of model & prompt used.

September 18, 2025 at 8:15 PM

Reposted by Max Reith

Alexander Doria

@dorialexander.bsky.social

> be a language model
> all you see is tokens
> you don't care, it's all abstracted away
> you live for a world of pure ideas, chain of concepts, reasoning streams
> tokens don't exist.

September 15, 2025 at 4:50 PM

Reposted by Max Reith

Thomas Dietterich

@tdietterich.bsky.social

We need new rules for publishing AI-generated research. The teams developing automated AI scientists have customarily submitted their papers to standard refereed venues (journals and conferences) and to arXiv. Often, acceptance has been treated as the dependent variable. 1/

September 14, 2025 at 5:15 PM

Reposted by Max Reith

Ethan Mollick

@emollick.bsky.social

We are starting to see some nuanced discussions of what it means to work with advanced AI in its current state

In this case, GPT-5 Pro was able to do novel math, but only when guided by a math professor (though the paper also noted the speed of advance since GPT-4)

The reflection is worth reading.

September 6, 2025 at 9:55 PM

Reposted by Max Reith

Grace

@gracekind.net

Never ask a man his age, a woman her salary, or GPT-5 whether a seahorse emoji exists

September 6, 2025 at 1:08 PM

Reposted by Max Reith

Pekka Lund

@pekka.bsky.social

I like the way Anthropic approaches these questions.

"We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously...Allowing models to end or exit potentially distressing interactions is one such intervention"

Claude Opus 4 and 4.1 can now end a rare subset of conversations

An update on our exploratory research on model welfare

www.anthropic.com

August 16, 2025 at 4:49 PM

Max Reith

@maxreith.bsky.social

LLMs are getting better at long-term reasoning. This is a big deal and opens the door for LLMs to perform more tasks in the real world.

Pekka Lund @pekka.bsky.social · Aug 14

GPT-5 (Thinking medium) was tested on Vending-Bench. Second place after Grok 4. Third model to beat their human baseline. Said to be "huge improvement over o3".

They also tested GPT-5-mini, which "showed impressive long-term coherence" but "was less impressive in terms of net worth accumulated".

https://andonlabs.com/evals/vending-bench

August 14, 2025 at 1:46 PM

Reposted by Max Reith

Ethan Mollick

@emollick.bsky.social

Suddenly retiring every other model without warning was a weird move by OpenAI

… and they did it without explaining how switching models worked or even details of various GPT-5 models

…and they did it after many built workflows & training & assignments around older models, maybe breaking them. Odd

August 8, 2025 at 6:30 PM
