Ece Takmaz
@ecekt.bsky.social
Postdoc at Utrecht University, previously PhD candidate at the University of Amsterdam
Multimodal NLP, Vision and Language, Cognitively Inspired NLP
https://ecekt.github.io/
I hope our findings will be helpful for future contributors to the multimodal track of the BabyLM challenge! aclanthology.org/2025.babylm-...
Model Merging to Maintain Language-Only Performance in Developmentally Plausible Multimodal Models
Ece Takmaz, Lisa Bylinina, Jakub Dotlacil. Proceedings of the First BabyLM Workshop. 2025.
aclanthology.org
November 1, 2025 at 3:52 PM
Instead of using the data provided in the BabyLM challenge, I opted to obtain them from their sources, which added extra layers of filtering and complexity and revealed some discrepancies in the multimodal BabyLM data. I mention these in the paper.
November 1, 2025 at 3:52 PM
Unfortunately, we had limited time and resources to modify the whole evaluation pipeline for our specific multimodal architecture. As a result, we tested our models on a subset of the benchmarks.
November 1, 2025 at 3:52 PM
The report on the Findings of the Third BabyLM Challenge indicates that the multimodal track received only 1 full submission this year. We submitted our paper to the workshop track instead of the challenge.
November 1, 2025 at 3:52 PM
We experiment with weighted linear interpolation of language-only and multimodal model weights (a minimal sketch follows below). Model merging with language-only checkpoints helps alleviate the issue to some extent, benefiting performance on language-only benchmarks without heavily disrupting accuracy on multimodal tasks.
November 1, 2025 at 3:52 PM
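A minimal sketch of the weighted interpolation, assuming both checkpoints are PyTorch state dicts with shared parameter names; the file names and the alpha value are placeholders, not our exact setup:

```python
# Hedged sketch: linear interpolation of a language-only checkpoint and a
# multimodal checkpoint. Paths, alpha, and the shared-parameter assumption
# are illustrative, not the exact configuration from the paper.
import torch

def merge_state_dicts(lang_sd, mm_sd, alpha=0.5):
    """Return alpha * language-only + (1 - alpha) * multimodal for shared
    parameters; parameters unique to the multimodal model (e.g. the vision
    projection) are kept from the multimodal checkpoint unchanged."""
    merged = dict(mm_sd)
    for name, lang_param in lang_sd.items():
        if name in mm_sd and mm_sd[name].shape == lang_param.shape:
            merged[name] = alpha * lang_param + (1 - alpha) * mm_sd[name]
    return merged

# Hypothetical file names.
lang_sd = torch.load("babylm_text_only.pt", map_location="cpu")
mm_sd = torch.load("babylm_multimodal.pt", map_location="cpu")
torch.save(merge_state_dicts(lang_sd, mm_sd, alpha=0.5), "babylm_merged.pt")
```

Varying alpha simply trades off how much of each checkpoint's behavior the merged model inherits.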
How can we mitigate this issue in developmentally plausible multimodal models and maintain language-only performance? We explored model merging, a technique that has been shown to benefit multi-task and multilingual models by reducing the effects of catastrophic forgetting.
November 1, 2025 at 3:52 PM
Our multimodal BabyLM model surpasses previous multimodal baselines and submissions on the leaderboard. Yet, compared to language-only models, it underperforms on grammar-oriented benchmarks, despite being exposed to the same language-only data as the language-only models (+ multimodal data).
November 1, 2025 at 3:52 PM
Previous work, including BabyLM contributions, indicates that multimodal data has limited or no benefits in text-only benchmarks. We reach similar conclusions in our low-resource multimodal scenario.
November 1, 2025 at 3:52 PM
I felt very much at home at #ICCV2025! Here is the paper: arxiv.org/abs/2509.01453
Traces of Image Memorability in Vision Encoders: Activations, Attention Distributions and Autoencoder Losses
Images vary in how memorable they are to humans. Inspired by findings from cognitive science and computer vision, this paper explores the correlates of image memorability in pretrained vision encoders...
arxiv.org
October 27, 2025 at 9:13 PM
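To give a flavour of this kind of analysis, here is a minimal sketch (my illustration, not the paper's exact pipeline): it correlates a simple activation statistic from a pretrained CLIP vision encoder with human memorability scores. The model choice, the statistic, and the placeholder data are all assumptions.

```python
# Hedged sketch, NOT the paper's exact pipeline: correlate a simple activation
# statistic from a pretrained vision encoder with human memorability scores.
# The CLIP checkpoint, the statistic, and the placeholder data are assumptions.
import torch
from PIL import Image
from scipy.stats import spearmanr
from transformers import CLIPImageProcessor, CLIPVisionModel

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

def activation_score(image_path):
    """Mean absolute activation of the final hidden layer for one image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, num_patches + 1, dim)
    return hidden.abs().mean().item()

# Placeholders standing in for a memorability dataset (images paired with
# human memorability annotations, e.g. LaMem or MemCat).
image_paths = ["img_0001.jpg", "img_0002.jpg", "img_0003.jpg"]
memorability_scores = [0.81, 0.43, 0.67]

stats = [activation_score(p) for p in image_paths]
rho, p_value = spearmanr(stats, memorability_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```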
I will be presenting this work at the @iccv.bsky.social 2025 workshop MemVis: The 1st Workshop on Memory and Vision! 🌺 Work done with Albert Gatt & Jakub Dotlacil arxiv.org/abs/2509.01453
Traces of Image Memorability in Vision Encoders: Activations, Attention Distributions and Autoencoder Losses
Images vary in how memorable they are to humans. Inspired by findings from cognitive science and computer vision, this paper explores the correlates of image memorability in pretrained vision encoders...
arxiv.org
October 15, 2025 at 9:10 AM
taking the NS train, I do that multiple times a week :)
October 13, 2025 at 6:24 PM
Could it be the HPLT v3.0 multilingual dataset? list.elra.info/mailman3/hyp...
Release of the massive HPLT v3.0 multilingual dataset - Corpora - ELRA lists
list.elra.info
October 8, 2025 at 11:04 AM