Lightnews — Scholar-powered news

galen

@nel.ag

new blogpost out :)

si.inc @si.inc · Oct 1

We built a 30 PB storage cluster in the heart of SF.

Check out the writeup!
si.inc/posts/the-heap/

Building the heap: racking 30 petabytes of hard drives for pretraining

How we spent under half a million dollars to build a 30 petabyte data storage cluster in downtown San Francisco

si.inc

October 1, 2025 at 3:28 PM

Reposted by galen

METR

@metr.org

We tested how autonomous AI agents perform on real software tasks from our recent developer productivity RCT.

We found a gap between algorithmic scoring and real-world usability that may help explain why AI benchmarks feel disconnected from reality.

August 13, 2025 at 10:38 PM

galen

@nel.ag

@jay.bsky.team pls fix
@devanshpanda.com

August 13, 2025 at 10:26 AM

Reposted by galen

METR

@metr.org

In a new report, we evaluate whether GPT-5 poses significant catastrophic risks via AI R&D acceleration, rogue replication, or sabotage of AI labs.

We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness.

August 8, 2025 at 1:20 AM

galen

@nel.ag

the model doesn’t self-correct after 3 pages of pushback, but solves it fine with a slightly different prompt. the brittleness here is obviously not just a tokenizer thing.

Quantian @quantian.bsky.social · Aug 8

We have a lot of fun tripping up AI with this, but asking it to parse a word by individual letters is kind of a nonsensical question given how tokenizers operate. It's like asking a Chinese speaker how many G's are in 中国, that's not how they process language.

August 8, 2025 at 12:48 PM

Reposted by galen

METR

@metr.org

We have open-sourced anonymized data and core analysis code for our developer productivity RCT.

The paper is also live on arXiv, with two new sections: One discussing alternative uncertainty estimation methods, and a new 'bias from developer recruitment' factor that has unclear effect on slowdown.

July 30, 2025 at 8:10 PM

galen

@nel.ag

I'd really like to see a tool built around doing many parallel generations, esp with unit tests. Seems like a big strength of llms that's totally missing from the mainstream stuff

eva (^_^)/ @eva.computer · Jul 8

what else are people using btw?
i’ve only ever used copilot in vs code and 95% of what I hear about is cursor

any recs for AI code tools that are cool or weird or interesting or ones people are sleeping on?

dame @dame.is · Jul 5

“Firstly, the era of VC-subsidized tokens may be coming to an end, especially for products like Cursor which are way past demonstrating product-market fit.”

good analysis from @simonwillison.net on cursor’s confusing pricing/usage changes

July 8, 2025 at 4:05 PM

galen

@nel.ag

checking if bsky uses POST or PUT requests for putting posts in the database

June 19, 2025 at 9:16 PM

galen

@nel.ag

new jepa paper!
ai.meta.com/research/pub...

ai.meta.com

June 11, 2025 at 5:29 PM

galen

@nel.ag

experiments should be tracked as a relational graph

June 1, 2025 at 10:55 PM
