But it fundamentally challenges how we think about:
🧠 training
🧩 specialization
🌐 generalization
It's one of the few ideas that redefine what "training a model" can mean.
I went deep into the methods:
🔹 Git Re-Basin
🔹 TIES (sketched below)
🔹 DARE-TIES
🔹 Fisher-weighted averaging
… and later worked on MergeKit, the elegant implementation by @chargoddard.bsky.social.
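To make that concrete, here is a rough sketch of the TIES idea (trim small deltas, elect a sign per parameter, then average only the agreeing survivors), written against plain PyTorch state dicts. The function name and arguments are illustrative, not MergeKit's actual API, and it assumes floating-point parameters.

```python
# Hedged sketch of a TIES-style merge: trim, elect sign, disjoint merge.
# Illustrative only; not MergeKit's API.
import torch

def ties_merge(base_sd, finetuned_sds, density=0.2, lam=1.0):
    """Merge several fine-tunes of the same base model, TIES-style."""
    merged = {}
    for name, base_w in base_sd.items():
        # Task vectors: what each fine-tune changed relative to the base.
        deltas = [sd[name] - base_w for sd in finetuned_sds]

        # 1) Trim: keep only the top-`density` fraction of entries by magnitude.
        trimmed = []
        for d in deltas:
            k = max(1, int(density * d.numel()))
            threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
            trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))

        stacked = torch.stack(trimmed)  # [num_models, *param_shape]

        # 2) Elect sign: per-parameter sign with the larger total mass.
        elected_sign = torch.sign(stacked.sum(dim=0))

        # 3) Disjoint merge: average only the entries agreeing with the elected sign.
        agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
        count = agree.sum(dim=0).clamp(min=1)
        merged_delta = (stacked * agree).sum(dim=0) / count

        merged[name] = base_w + lam * merged_delta
    return merged
```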
It started at Arcee AI, when my colleague said:
"Have you heard of model merging?
Imagine no GPUs… just weights… and boom — a multitasker."
It truly felt like magic.
With a simple interpolation in weight space:
✨ no retraining
✨ no shared pipeline
✨ no GPU time
You can fuse their strengths into a single multitask model.
A collaboration that survives time zones.
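Here is a minimal sketch of that interpolation, assuming all checkpoints share the same base architecture; the function and variable names are placeholders, not a particular library's API.

```python
# Minimal sketch of merging by weight-space interpolation: a weighted average
# of checkpoints with identical architecture. Names are placeholders.
def interpolate_weights(state_dicts, coeffs):
    """Weighted average of several state dicts from the same base architecture."""
    total = sum(coeffs)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(c * sd[name] for c, sd in zip(coeffs, state_dicts)) / total
    return merged

# Hypothetical usage: fuse the multilingual, coding, and math fine-tunes equally.
# merged = interpolate_weights([sd_tokyo, sd_toronto, sd_berlin], [1.0, 1.0, 1.0])
# model.load_state_dict(merged)
```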
What if three teams each fine-tune the same base model:
🇯🇵 multilingual in Tokyo
🇨🇦 coding in Toronto
🇩🇪 math in Berlin
…all on their own schedules, no overlap, no coordination?
In classic deep learning, these stay 3 separate models forever.