Lightnews — Scholar-powered news

Sebastian Dziadzio

@dziadzio.bsky.social

Yeah, mostly because GPT-5 needs to think for 20 seconds to come up with a name for a variable. It's good for bigger, self-contained features, but the bias for "reasoning" in the model router makes it downright unusable for smaller changes.

September 16, 2025 at 11:01 AM

Sebastian Dziadzio

@dziadzio.bsky.social

Done! Sorry for the wait

May 15, 2025 at 1:37 PM

Sebastian Dziadzio

@dziadzio.bsky.social

Added! 🎟️

February 10, 2025 at 10:08 AM

Sebastian Dziadzio

@dziadzio.bsky.social

Done! 🙌🏻

January 15, 2025 at 4:27 PM

Sebastian Dziadzio

@dziadzio.bsky.social

Done! ✅

December 17, 2024 at 1:45 PM

Sebastian Dziadzio

@dziadzio.bsky.social

📄 Paper: arxiv.org/abs/2412.06712
💻 Code: github.com/ExplainableM...

GitHub - ExplainableML/fomo_in_flux: Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]


December 11, 2024 at 6:00 PM

Sebastian Dziadzio

@dziadzio.bsky.social

This has been a fun project with a great team: led by @vishaalurao.bsky.social and @confusezius.bsky.social, with core contributions from @bayesiankitten.bsky.social, and supervision by @zeynepakata.bsky.social, Samuel Albanie, and Matthias Bethge.

December 11, 2024 at 6:00 PM

Sebastian Dziadzio

@dziadzio.bsky.social

As usual, scaling matters!
🚀 Larger models benefit more from temporal merging than from sequential finetuning.
🚀 Larger compute budgets allow temporal merging to match (and surpass!) multitask performance.
🚀 Best-in-TIME scales effectively across longer task sequences (50, 100).

Plots showing the scaling dynamics described in the text.

December 11, 2024 at 6:00 PM

Sebastian Dziadzio

@dziadzio.bsky.social

📌 The choice of merging technique doesn’t matter much.

In the temporal setting, complex merging techniques like TIES or Breadcrumbs offer only marginal gains compared to simpler ones like weight averaging.

A plot showing that different merging techniques perform similarly.

December 11, 2024 at 6:00 PM
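
For readers unfamiliar with the "simpler" baseline named in the post above, here is a minimal sketch of uniform weight averaging. It assumes checkpoints are plain dictionaries mapping parameter names to lists of floats; the `Weights` alias and the `average_weights` helper are hypothetical names used for illustration and are not taken from the paper's codebase.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(checkpoints: List[Weights]) -> Weights:
    """Uniformly average several checkpoints, parameter by parameter."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


merged = average_weights([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
```

Methods such as TIES or Breadcrumbs add extra steps on top of this kind of combination; the sketch covers only the plain average.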

Sebastian Dziadzio

@dziadzio.bsky.social

📌 Initialization and deployment choices are crucial.

One strategy stands out: using an exponential moving average for both initialization and deployment strikes the best balance between knowledge accumulation and zero-shot retention. We call this approach ✨Best-in-TIME✨

A plot showing that different initialization and deployment strategies lead to different results.

December 11, 2024 at 6:00 PM
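
One possible reading of the strategy in the post above, as a rough sketch: keep an exponential moving average (EMA) of the weights, start each new task from that EMA, and deploy the EMA at the end. Everything here is an assumption made for illustration, including the `decay` value, the dictionary weight format, and the `finetune_stub` function, which simply shifts the weights in place of real training. It is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def ema_update(ema: Weights, new: Weights, decay: float) -> Weights:
    """Blend the running average with newly finetuned weights."""
    return {
        name: [decay * e + (1.0 - decay) * n for e, n in zip(ema[name], new[name])]
        for name in ema
    }


def finetune_stub(start: Weights, task_shift: float) -> Weights:
    """Stand-in for finetuning on one task: adds a constant to every weight."""
    return {name: [v + task_shift for v in vals] for name, vals in start.items()}


def run_sequence(base: Weights, task_shifts: List[float], decay: float = 0.5) -> Weights:
    ema = base
    for shift in task_shifts:
        finetuned = finetune_stub(ema, shift)    # initialize each task from the EMA
        ema = ema_update(ema, finetuned, decay)  # fold the result back into the EMA
    return ema                                   # the EMA is also what gets deployed


deployed = run_sequence({"w": [0.0, 2.0]}, task_shifts=[1.0, 1.0], decay=0.5)
```

With `decay=0.5` and two tasks that each shift the weights by 1.0, the deployed weights move half a step per task, which illustrates how the average trades new knowledge against retention of the starting point.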

Sebastian Dziadzio

@dziadzio.bsky.social

📌 Accounting for time is essential.

Standard merging struggles with the temporal dynamics. Replay and weighting schemes, which factor in the sequential nature of the problem, help (but only to a point).

A plot showing that offline merging underperforms with respect to a replay baseline.

December 11, 2024 at 6:00 PM

Sebastian Dziadzio

@dziadzio.bsky.social

Key insights:

📌 Accounting for time is essential.
📌 Initialization and deployment choices are crucial.
📌 The choice of merging technique doesn’t matter much.

December 11, 2024 at 6:00 PM

Sebastian Dziadzio

@dziadzio.bsky.social

The world keeps changing, and so should our models.

Enter TIME (Temporal Integration of Model Expertise), a unifying approach that considers:

1️⃣ Initialization
2️⃣ Deployment
3️⃣ Merging Techniques

We study these three axes on the large FoMo-in-Flux benchmark.

A schematic representation of the TIME framework.

December 11, 2024 at 6:00 PM

Sebastian Dziadzio

@dziadzio.bsky.social

I keep forgetting about the concert, yesterday I was like 'wow people in Vancouver sure love sequins and cowboy boots'.

December 9, 2024 at 2:18 AM

Sebastian Dziadzio

@dziadzio.bsky.social

Whenever my "papers" tab group got lost in a Chrome crash, I felt nothing but relief.

The firehose is relentless, so over time my strategy became to skim in the moment if interesting and save to Zotero, otherwise close the tab. There is only the present. Important stuff will come back.

November 30, 2024 at 1:19 PM

Sebastian Dziadzio

@dziadzio.bsky.social

Yeah, I think we consistently underestimate how much stuff is out there on the Internet. You might think your question or image prompt is niche and original, but if you consider the distribution of Internet-scale datasets, you'd have to work very hard to even reach the tail.

November 30, 2024 at 12:59 PM

Sebastian Dziadzio

@dziadzio.bsky.social

If someone said "the algorithm" with no additional context, I'd think of the latter, but "an algorithm" for me is still the former. Interesting how the default meaning is shifting.

November 30, 2024 at 12:42 PM

Sebastian Dziadzio

@dziadzio.bsky.social

Have you read Fables for Robots? I think it was only published in English as part of Mortal Engines. If you liked Cyberiad, you'll like this one too!

November 29, 2024 at 10:27 AM
