Malikeh Ehghaghi
@malikeh97.bsky.social
Machine Learning Research Scientist, MScAC | ML Researcher at Vector Institute | Co-Host of Women in AI Research (WiAIR) Podcast | 5+ Years of Industry Experience | University of Toronto Alumna
Next week at NeurIPS 2025 in San Diego, we’re bringing this story to the community. Join us if you are interested. See you soon💜
November 27, 2025 at 2:43 PM
Despite its simplicity and scalability, model merging is still underappreciated.
But it fundamentally challenges how we think about:
🧠 training
🧩 specialization
🌐 generalization
It's one of the few ideas that redefine what "training a model" can mean.
November 27, 2025 at 2:43 PM
That sparked a deep dive through merging papers:
🔹 Git Re-Basin
🔹 TIES
🔹 DARE-TIES
🔹 Fisher-weighted averaging
… and later working on MergeKit, the elegant implementation by @chargoddard.bsky.social.
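For a sense of what these methods do under the hood, here is a minimal, illustrative sketch of TIES-style merging on flat NumPy weight vectors. The function name, the density and scaling defaults, and the toy data are assumptions made for this sketch, not MergeKit's actual implementation.

```python
# Illustrative sketch of TIES-style merging (trim, elect sign, disjoint merge)
# on flat NumPy weight vectors. Hyperparameters and names are assumed for demo.
import numpy as np

def ties_merge(base, finetuned, density=0.2, lam=1.0):
    """Merge several fine-tuned weight vectors into the base, TIES-style.

    base      : 1-D array of base-model weights
    finetuned : list of 1-D arrays with the same shape as `base`
    density   : fraction of task-vector entries kept after trimming (assumed value)
    lam       : scaling applied to the merged task vector (assumed value)
    """
    # 1) Task vectors: what each fine-tune changed relative to the base.
    taus = [ft - base for ft in finetuned]

    # 2) Trim: keep only the top-`density` entries of each task vector by magnitude.
    trimmed = []
    for tau in taus:
        k = max(1, int(density * tau.size))
        thresh = np.sort(np.abs(tau))[-k]
        trimmed.append(np.where(np.abs(tau) >= thresh, tau, 0.0))

    # 3) Elect a per-parameter sign from the total positive vs. negative mass.
    stacked = np.stack(trimmed)
    elected_sign = np.sign(stacked.sum(axis=0))

    # 4) Disjoint mean: average only the entries that agree with the elected sign.
    agree = (np.sign(stacked) == elected_sign) & (stacked != 0.0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_tau = (stacked * agree).sum(axis=0) / counts

    # 5) Add the scaled merged task vector back onto the base weights.
    return base + lam * merged_tau

# Toy usage: three "experts" fine-tuned from the same base, merged with no GPU.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
experts = [base + 0.1 * rng.normal(size=1000) for _ in range(3)]
merged = ties_merge(base, experts)
```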
November 27, 2025 at 2:43 PM
My first encounter with this idea was back in Jan 2024 at Arcee AI, when my colleague said:
"Have you heard of model merging?
Imagine no GPUs… just weights… and boom — a multitasker."
It truly felt like magic.
November 27, 2025 at 2:43 PM
But model merging changes the story.
With a simple interpolation in weight space:
✨ no retraining
✨ no shared pipeline
✨ no GPU time
You can fuse their strengths into a single multitask model.
A collaboration that survives time zones.
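As a rough sketch of what that interpolation can look like, assuming checkpoints with identical shapes fine-tuned from a shared base, a parameter-wise weighted average of state dicts is enough. The keys, coefficients, and toy "Tokyo/Toronto/Berlin" fine-tunes below are hypothetical, purely for illustration.

```python
# Minimal sketch of linear weight-space merging: a parameter-wise weighted
# average over state dicts of NumPy arrays. No retraining, no GPU involved.
import numpy as np

def merge_weights(state_dicts, coeffs=None):
    """Return a parameter-wise weighted average of several state dicts."""
    if coeffs is None:
        coeffs = [1.0 / len(state_dicts)] * len(state_dicts)
    assert abs(sum(coeffs) - 1.0) < 1e-8, "interpolation weights should sum to 1"
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(c * sd[key] for c, sd in zip(coeffs, state_dicts))
    return merged

# Toy usage: three fine-tunes of the same tiny "model", fused into one.
rng = np.random.default_rng(42)
base = {"layer.weight": rng.normal(size=(4, 4)), "layer.bias": rng.normal(size=4)}
tokyo   = {k: v + 0.05 * rng.normal(size=v.shape) for k, v in base.items()}
toronto = {k: v + 0.05 * rng.normal(size=v.shape) for k, v in base.items()}
berlin  = {k: v + 0.05 * rng.normal(size=v.shape) for k, v in base.items()}
multitask = merge_weights([tokyo, toronto, berlin])
```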
November 27, 2025 at 2:43 PM
Ever wondered what happens when experts train models across the world 🌍:
🇯🇵 multilingual in Tokyo
🇨🇦 coding in Toronto
🇩🇪 math in Berlin
…all on their own schedules, no overlap, no coordination?
In classic deep learning, these remain three separate models forever.
November 27, 2025 at 2:43 PM
Another Relevant Paper: arxiv.org/pdf/2403.03874
April 14, 2025 at 6:22 PM