Daniel Marczak
dmarczak.bsky.social
mostly trying to merge models | phd student @ warsaw university of technology & ideas
Check out the paper & code for all the details!
📝 Paper: arxiv.org/abs/2502.04959
💻 Code: github.com/danielm1405/...

Huge thanks to my amazing collaborators:
Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
February 10, 2025 at 2:47 PM
In summary: By using a uniform singular value spectrum 📊 and task-specific subspaces 🎯, Iso-CTS achieves state-of-the-art performance across all settings! 🔥
February 10, 2025 at 2:47 PM
🔍 That’s why we propose replacing the least important components with task-specific vectors that are orthogonal to the common subspace.

This further enhances alignment 🎯, and the performance naturally improves! 📈
February 10, 2025 at 2:47 PM
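The idea above can be sketched in NumPy. This is a hypothetical simplification, not the paper's exact algorithm: here each task contributes only its single leading singular direction, which is orthogonalized (Gram-Schmidt) against the retained common subspace before being added, and a uniform spectrum is set over the combined subspace.

```python
import numpy as np

def iso_cts_sketch(task_deltas, k_common):
    """Illustrative sketch of the Iso-CTS idea (simplified, hypothetical):
    keep the top-k singular directions of the summed task matrices,
    replace the least important directions with task-specific ones that
    are orthogonal to the common subspace, and use a uniform spectrum."""
    total = sum(task_deltas)
    U, _, Vt = np.linalg.svd(total, full_matrices=False)
    basis_u = [U[:, i] for i in range(k_common)]       # common left subspace
    basis_v = [Vt[i] for i in range(k_common)]         # common right subspace
    for d in task_deltas:
        Ut, _, Vtt = np.linalg.svd(d, full_matrices=False)
        u, v = Ut[:, 0].copy(), Vtt[0].copy()          # leading task direction
        # Project out everything already in the basis (common + prior tasks).
        for bu in basis_u:
            u -= (bu @ u) * bu
        for bv in basis_v:
            v -= (bv @ v) * bv
        if np.linalg.norm(u) > 1e-8 and np.linalg.norm(v) > 1e-8:
            basis_u.append(u / np.linalg.norm(u))
            basis_v.append(v / np.linalg.norm(v))
    Um = np.stack(basis_u, axis=1)
    Vm = np.stack(basis_v, axis=0)
    # Uniform singular values, scaled to match the Frobenius norm of the sum.
    sigma = np.linalg.norm(total) / np.sqrt(Um.shape[1])
    return sigma * (Um @ Vm)
```

Because the appended task directions are orthogonalized against the common subspace (and against each other), the result has an exactly uniform nonzero spectrum by construction.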
This simple modification boosts task arithmetic by 10-15% 📈 across all model merging scenarios, achieving state-of-the-art results in most cases! 🔥

However, we found that the bottom components contribute very little to the final performance… 📉⚠️
February 10, 2025 at 2:47 PM
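The "bottom components contribute very little" observation is easy to probe numerically: check how much of a matrix's squared Frobenius norm is captured by its top-k singular components. A minimal sketch (the synthetic decaying spectrum below is purely illustrative; real task matrices would be fine-tuned minus pre-trained weights):

```python
import numpy as np

def topk_energy(delta, k):
    """Fraction of the squared Frobenius norm of `delta` captured
    by its top-k singular values."""
    s = np.linalg.svd(delta, compute_uv=False)
    return (s[:k] ** 2).sum() / (s ** 2).sum()

# Illustrative matrix with a rapidly decaying spectrum: the top few
# components carry nearly all of the energy, the bottom ones almost none.
delta = np.diag(0.5 ** np.arange(50))
print(topk_energy(delta, 10))  # close to 1.0
```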
Based on this, we propose an isotropic merging framework that:
📊 Flattens the singular value spectrum of task matrices
🎯 Enhances alignment between tasks
⚖️ Reduces the performance gap between tasks
Surprisingly, the best performance is achieved when the singular value spectrum is uniform! 🚀
February 10, 2025 at 2:47 PM
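The flattening step can be sketched in a few lines of NumPy. This is a minimal illustration under two assumptions not stated in the thread: the merge operates on 2-D weight-delta matrices, and the uniform spectrum is scaled to preserve the Frobenius norm of the summed task matrices (the paper's exact normalization may differ).

```python
import numpy as np

def isotropic_merge(task_deltas):
    """Sketch of isotropic merging: sum the task matrices, then replace
    the singular value spectrum of the sum with a uniform one that
    preserves the total Frobenius norm."""
    total = sum(task_deltas)
    U, s, Vt = np.linalg.svd(total, full_matrices=False)
    # Uniform spectrum with the same total energy as the original one.
    iso = np.full_like(s, np.sqrt((s ** 2).mean()))
    return U @ np.diag(iso) @ Vt
```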
We show that alignment between singular components of task-specific & merged matrices strongly correlates with performance gains over the pre-trained model! 📈

🔍 Tasks that are well-aligned get amplified 🔊, while less aligned ones become underrepresented and struggle. 😬📉
February 10, 2025 at 2:47 PM
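One way to quantify this kind of alignment (an illustrative metric, not necessarily the paper's exact definition): project the top-k left singular vectors of a task matrix onto the top-k left subspace of the merged matrix and measure how much energy the projection captures.

```python
import numpy as np

def subspace_alignment(task_delta, merged_delta, k=8):
    """Fraction of the top-k left singular subspace of `task_delta`
    captured by the top-k left subspace of `merged_delta`.
    Returns 1.0 for perfectly aligned subspaces, ~k/d for random ones."""
    Ut, _, _ = np.linalg.svd(task_delta, full_matrices=False)
    Um, _, _ = np.linalg.svd(merged_delta, full_matrices=False)
    proj = Um[:, :k].T @ Ut[:, :k]     # projection onto the merged subspace
    return np.linalg.norm(proj) ** 2 / k
```

Under the thread's observation, tasks scoring high on such a metric would see their components amplified in the merged model, while low-scoring tasks end up underrepresented.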