Philip Bontrager
@pbontrager.bsky.social
AI researcher & engineer @Meta working on @PyTorch torchtune in NYC; interests in generative models, RL, and evolutionary strategies

💻 https://github.com/pbontrager 📝 https://tinyurl.com/philips-papers
This thread is a bit long, but I thought it’d be interesting to share just one of the mundane parts of the deep learning stack that break and have to be rethought as models and training scale.
June 8, 2025 at 12:07 AM
To save, you need to let each GPU save its own partial safetensors file, because communication is slow, and then line up the memory blocks and merge them into one file.
June 8, 2025 at 12:07 AM
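A rough sketch of that merge step, assuming each rank has written its shard to a file like "shard-0.safetensors" and we already know where each shard lives inside the full parameter. The file names, the `offsets` map, and dim-0 sharding are all assumptions for illustration, not DCP's actual on-disk format:

```python
import torch
from safetensors import safe_open
from safetensors.torch import save_file

def merge_shards(shard_files, full_shapes, offsets):
    """Stitch per-rank safetensors shards into one consolidated file.

    shard_files: per-rank paths, e.g. ["shard-0.safetensors", ...] (hypothetical)
    full_shapes: param name -> shape of the full, unsharded tensor
    offsets: (path, name) -> row where that shard starts in dim 0
    """
    merged = {name: torch.empty(shape) for name, shape in full_shapes.items()}
    for path in shard_files:
        with safe_open(path, framework="pt", device="cpu") as f:
            for name in f.keys():
                shard = f.get_tensor(name)
                start = offsets[(path, name)]
                # Copy this shard's block into its slot in the full tensor
                merged[name][start : start + shard.shape[0]] = shard
    save_file(merged, "model.safetensors")
```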
Safetensors are great for hosting checkpoints and make no assumptions about whether your model is distributed, since they store full, unsharded parameters. To work natively with safetensors, DCP needs to tell each GPU the exact slice of data to read without loading the full parameter.
June 8, 2025 at 12:07 AM
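The safetensors format makes this possible: the header records each tensor's shape and byte offsets, so a reader can pull out just a slice. A minimal sketch using the safetensors library directly (the file and tensor names here are made up):

```python
from safetensors import safe_open

# Opening is lazy: only the header is parsed, no tensor data is loaded yet
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    layer = f.get_slice("layers.0.attn.wq.weight")
    rows, cols = layer.get_shape()
    # Read only this rank's rows from disk, never the full parameter
    my_shard = layer[rows // 2 :]
```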
On startup, DCP has to map your old GPU layout to your new one so each GPU knows which files to read from and reads only the data it needs. But there's one last problem: when you're ready to take your model to another tool (serving, eval, etc.), it expects safetensors checkpoints.
June 8, 2025 at 12:07 AM
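In code, that remapping hides behind a single load call: each rank hands DCP its state dict for the new layout, and DCP's planner works out which files and byte ranges that rank needs. A minimal sketch, assuming `model` is already wrapped with FSDP or another parallelism on the new GPU layout, with a placeholder checkpoint path:

```python
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import (
    get_model_state_dict,
    set_model_state_dict,
)

# The state dict's sharded tensors describe the *new* GPU layout
state_dict = get_model_state_dict(model)

# DCP maps the checkpoint's old layout onto it, so each rank reads
# only its own slices from the per-GPU checkpoint files
dcp.load(state_dict, checkpoint_id="checkpoints/step-1000")

set_model_state_dict(model, state_dict)
```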
Distributed Checkpoint (DCP) solves this by having every GPU save its own shard asynchronously, so you can save a checkpoint in less than a second. But this creates a new problem: the next time you want to use the model, you might have a different number of GPUs.
June 8, 2025 at 12:07 AM
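That per-GPU asynchronous save is what recent PyTorch exposes as `dcp.async_save`: the state dict is staged, training resumes, and each rank writes its own shard in the background. A minimal sketch (the checkpoint path and the surrounding training loop are placeholders):

```python
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import get_state_dict

# Gather model + optimizer state as shards local to this rank
model_sd, optim_sd = get_state_dict(model, optimizer)
state_dict = {"model": model_sd, "optim": optim_sd}

# Returns almost immediately; the actual write happens in the background
future = dcp.async_save(state_dict, checkpoint_id="checkpoints/step-1000")

# ... keep training ...

future.result()  # block before the next save to ensure this one finished
```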
I’m enjoying it while it lasts before everything fully homogenizes again
February 26, 2025 at 2:04 AM
Aren’t these two paradoxes functionally the same? en.m.wikipedia.org/wiki/Braess%...
Braess's paradox - Wikipedia
January 27, 2025 at 8:54 AM
Original post here: x.com/jjitsev/stat...
January 25, 2025 at 6:07 PM
What are the best benchmarks for reasoning models?
January 20, 2025 at 10:32 AM
Haha, that wasn’t lost on me. Facebook’s still going strong, but it’s a different site, with different users, than when I was in HS.
January 13, 2025 at 9:14 PM
If you can choose who follows you, that sounds more like “friends” from the old Facebook days.
January 13, 2025 at 8:53 PM
I found out about Warp because I was on jury duty with one of their devs 😂 It’s been great compared to the Mac’s default terminal.
January 7, 2025 at 11:58 PM
How do you add these?
January 7, 2025 at 4:10 PM
Maybe let’s go the other direction and include blog posts in CVs too.
January 7, 2025 at 3:30 PM
That would imply that we solved self-driving (image recognition) and search (language understanding), among other things.
January 7, 2025 at 2:33 AM
This could be a good case for mixed models. The model parsing the text could likely be smaller, or fairly cheap like DeepSeek.
January 4, 2025 at 9:45 PM
Thankfully in a small startup you only have to sell an idea to a couple of people and you can get going.
January 3, 2025 at 8:34 PM
One startup I joined had a model getting 95% on benchmarks but performing terribly in practice. We spent the first 6 months developing new benchmarks instead of a new model.
January 3, 2025 at 1:30 AM
I always set out to propose a new idea and end up having to propose a new benchmark instead.
January 3, 2025 at 12:31 AM
What if humanity knows X and wants to understand Z? If a computer can give us Y so that we can understand Z, that would be useful for science. Though I’d say that we still wouldn’t know Y ourselves yet.
January 3, 2025 at 12:26 AM
Imagine if under the hood O1 is just calling “write better code” over and over again 😂
January 3, 2025 at 12:14 AM