But it fundamentally challenges how we think about:
🧠 training
🧩 specialization
🌐 generalization
It's one of the few ideas that redefine what "training a model" can mean.
I went deep into the methods:
🔹 Git Re-Basin
🔹 TIES (sketched below)
🔹 DARE-TIES
🔹 Fisher-weighted averaging
… and later worked on MergeKit, the elegant implementation by @chargoddard.bsky.social.
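To make that concrete, here is a rough sketch of the TIES idea (trim small deltas, elect a sign per parameter, then average only the agreeing survivors), written against plain PyTorch state dicts. The function name and arguments are illustrative, not MergeKit's actual API, and it assumes floating-point parameters.

```python
# Hedged sketch of a TIES-style merge: trim, elect sign, disjoint merge.
# Illustrative only; not MergeKit's API.
import torch

def ties_merge(base_sd, finetuned_sds, density=0.2, lam=1.0):
    """Merge several fine-tunes of the same base model, TIES-style."""
    merged = {}
    for name, base_w in base_sd.items():
        # Task vectors: what each fine-tune changed relative to the base.
        deltas = [sd[name] - base_w for sd in finetuned_sds]

        # 1) Trim: keep only the top-`density` fraction of entries by magnitude.
        trimmed = []
        for d in deltas:
            k = max(1, int(density * d.numel()))
            threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
            trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))

        stacked = torch.stack(trimmed)  # [num_models, *param_shape]

        # 2) Elect sign: per-parameter sign with the larger total mass.
        elected_sign = torch.sign(stacked.sum(dim=0))

        # 3) Disjoint merge: average only the entries agreeing with the elected sign.
        agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
        count = agree.sum(dim=0).clamp(min=1)
        merged_delta = (stacked * agree).sum(dim=0) / count

        merged[name] = base_w + lam * merged_delta
    return merged
```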
It started at Arcee AI, when my colleague said:
"Have you heard of model merging?
Imagine no GPUs… just weights… and boom — a multitasker."
It truly felt like magic.
With a simple interpolation in weight space:
✨ no retraining
✨ no shared pipeline
✨ no GPU time
You can fuse their strengths into a single multitask model.
A collaboration that survives time zones.
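Here is a minimal sketch of that interpolation, assuming all checkpoints share the same base architecture; the function and variable names are placeholders, not a particular library's API.

```python
# Minimal sketch of merging by weight-space interpolation: a weighted average
# of checkpoints with identical architecture. Names are placeholders.
def interpolate_weights(state_dicts, coeffs):
    """Weighted average of several state dicts from the same base architecture."""
    total = sum(coeffs)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(c * sd[name] for c, sd in zip(coeffs, state_dicts)) / total
    return merged

# Hypothetical usage: fuse the multilingual, coding, and math fine-tunes equally.
# merged = interpolate_weights([sd_tokyo, sd_toronto, sd_berlin], [1.0, 1.0, 1.0])
# model.load_state_dict(merged)
```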
What if three teams each fine-tune the same base model:
🇯🇵 multilingual in Tokyo
🇨🇦 coding in Toronto
🇩🇪 math in Berlin
…all on their own schedules, no overlap, no coordination?
In classic deep learning, these stay 3 separate models forever.