R. Keelan
@rkeelan.bsky.social
Writer of fantasy and science fiction | Programmer | He/Him
But there's just so obviously something there! I want people to take it seriously!
August 25, 2025 at 2:23 AM
To be clear, I do not love that AI is within spitting distance of being able to do so much white collar work. I am a programmer and I work remotely--my job is very much at risk. In many ways, it would be a relief for me if it turned out that AI was a bubble and there was actually nothing there.
August 25, 2025 at 2:23 AM
Or we'll spend a couple million writing custom tools and other infrastructure for the AI to use, rather than having it use stuff made for humans
August 25, 2025 at 2:23 AM
Or we'll pay a team of really smart people to spend a year breaking down as many tasks as possible into 5-minute units of work where the AI has better odds of success
August 25, 2025 at 2:23 AM
"Ha! The AIs fail half the time when they try to do even an hour's work at once!"

Okay, but it costs $1 and takes 5 minutes. Let's run 100 in parallel and pick the best.
August 25, 2025 at 2:23 AM
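The parallel-sampling arithmetic in that post is easy to check: if each independent run succeeds with probability p, at least one of n runs succeeds with probability 1 - (1 - p)^n. A minimal sketch (the 50% per-run rate and n = 100 are illustrative numbers, not measurements):

```python
# Best-of-N sampling: probability that at least one of n
# independent attempts succeeds, given per-attempt success p.
def best_of_n_success(p: float, n: int) -> float:
    """Chance that at least one of n independent runs succeeds."""
    return 1 - (1 - p) ** n

# Even a coin-flip per-run success rate is near-certainty at n=100.
print(round(best_of_n_success(0.5, 7), 3))    # ~0.992
print(round(best_of_n_success(0.5, 100), 6))  # effectively 1.0
```

The catch, of course, is that "pick the best" assumes you can cheaply verify which run succeeded; that's why this works better for code (run the tests) than for open-ended writing.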
But even if the performance improvements ended right now, non-programmers (and even many programmers) are just vastly underestimating how much more there is that can be done (this is often called the "product overhang").
August 25, 2025 at 2:23 AM
It's true that there's no guarantee that the trend will continue, but it doesn't have to continue much longer for AI to be able to accomplish a very large amount of economically useful work on its own.
August 25, 2025 at 2:23 AM
GPT-5 is on the same trend as GPT-2, -3, -3.5, and -4. A bit above trend, actually, and the trend is that task duration doubles every 7 months.
August 25, 2025 at 2:23 AM
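The doubling trend is simple exponential growth, so it's easy to see why it doesn't need to run much longer to matter. A quick sketch (the 7-month doubling time is METR's figure; the 1-hour starting horizon and the projection points are illustrative):

```python
# METR-style extrapolation: if the 50%-success task horizon
# doubles every 7 months, project it forward from 1 hour.
def horizon_after(months: float, start_hours: float = 1.0,
                  doubling_months: float = 7.0) -> float:
    """Task horizon in hours after `months` of continued doubling."""
    return start_hours * 2 ** (months / doubling_months)

print(horizon_after(7))   # 2.0 hours after one doubling
print(horizon_after(28))  # 16.0 hours -- two working days -- in ~2.3 years
```

Four doublings takes a 1-hour horizon to 16 hours, which is roughly "hand the AI a two-day task."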
METR's research measures the ability of models to complete tasks of various lengths with some odds of success. They measure GPT-5 as being able to complete 1-hour tasks with a 50% chance of success, or 6-minute tasks with an 80% chance of success.
August 25, 2025 at 2:23 AM
"How to judge model performance" has been a bit of a moving target, so I wouldn't blame you for thinking proponents of AI were engaged in goal-post moving, but I think METR's research (metr.org/blog/2025-03...) on AI's ability to complete long tasks is the current-best way to judge it.
August 25, 2025 at 2:23 AM
The correct sequence of comparisons is 2 vs 3, 3 vs 4, and 4 vs 5, in which case GPT-5 is exactly as impressive as it should be.

In other words, GPT-4o and o3 "ate" a bunch of the GPT-4-to-5 improvement jump, which makes GPT-5 seem less impressive than it actually is
August 25, 2025 at 2:23 AM
But look at the timeline:

- GPT-3 was released June 2020
- ChatGPT was released November 30, 2022
- GPT-4 was released March 14th, 2023
- GPT-4o was released May 13th, 2024
- o3 was released April 16th, 2025
- GPT-5 was released August 7th, 2025
August 25, 2025 at 2:23 AM
The disappointment with GPT-5 is mostly a mistake: people were expecting a "full GPT worth of improvement" (i.e., similar to the difference between GPT-2 and -3, or between -3 and -4) between o3 and GPT-5, and didn't get that
August 25, 2025 at 2:23 AM
I use both—Chrome for Gmail, Google Maps, and general searching, then Edge for a variety of websites I regularly open for a specific purpose (e.g., banking, other bill payments, etc)
June 9, 2025 at 6:32 PM
Most of the best parts of Star Wars over the past 40 years come from the books, games, and TV shows. People who aren't fans aren't aware that stuff exists, so they have no idea why the fans have such affection for the franchise
April 29, 2025 at 10:59 PM
I thought that was where the clip was going!
April 7, 2025 at 9:41 PM
Modern LLMs have hundreds of billions of parameters (maybe even trillions by now). That's a lot of space to represent a lot of concepts. No one should be confident that they know when LLM performance and abilities will plateau. 8/8
April 3, 2025 at 2:01 PM
For LLMs to write as coherently as they do on such a broad range of topics requires more than just knowledge of language, because language isn't precise enough.

"I saw a man in a park with a telescope."

Is the telescope in the park, or with the speaker? It's ambiguous. 7/n
April 3, 2025 at 2:01 PM
Here's another intuition pump: how well would you have to know someone in order to predict what they'd say in certain situations? This is possible—maybe you can do this for your spouse or children or siblings—but you need to know them *really* well. 6/n
April 3, 2025 at 2:01 PM
If you throw a ball in the air you can calculate how long it will take to hit the ground using simple math. But not just *any* math. You need the equations of motion. These aren't just random calculations. They are a model of the world encoding facts about reality. 5/n
April 3, 2025 at 2:01 PM
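To make the intuition pump concrete: for a ball thrown straight up, the equations of motion give a flight time of t = 2·v0/g. A tiny sketch (the launch speed is an illustrative value):

```python
# The "not just any math" point: flight time comes from the
# equations of motion, t = 2*v0/g, which encode facts about gravity.
G = 9.81  # gravitational acceleration, m/s^2

def flight_time(v0: float) -> float:
    """Seconds for a ball thrown up at v0 m/s to return to its start."""
    return 2 * v0 / G

print(round(flight_time(9.81), 2))  # 2.0 seconds
```

The formula only works because it models reality; arbitrary arithmetic on the same numbers predicts nothing. That's the analogy to next-word prediction requiring a world model.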
When I saw ChatGPT, it was obvious there was more going on. This is the kind of thing that was going on. Training to predict the next word resulted in the LLMs building increasingly detailed, comprehensive, and accurate models of the world. 4/n
transformer-circuits.pub/2025/attribu...
April 3, 2025 at 2:01 PM