Matteo Saponati
@matteosaponati.bsky.social
I am a research scientist in Machine Learning and Neuroscience. I am fascinated by life and intelligence, and I like to study complex systems. I love to play music and dance.

Postdoctoral Research Scientist @ ETH Zürich

https://matteosaponati.github.io
5/ Finally, we leverage symmetry to improve Transformer training.

- Initializing self-attention matrices symmetrically improves training efficiency for bidirectional models, leading to faster convergence.

This suggests that imposing structure at initialization can enhance training dynamics (a minimal sketch of one such initialization follows below).

⬇️
February 18, 2025 at 12:22 PM
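A minimal sketch of one way to impose this symmetry at initialization, assuming (my reading, not stated in the thread) that the query-key product is made symmetric by tying the key weights to the query weights at step 0; the exact scheme in the paper may differ:

```python
import torch
import torch.nn as nn

def symmetric_attention_init(d_model: int, d_head: int):
    """Hypothetical symmetric initialization for self-attention projections.

    The key weights are copied from the query weights at initialization, so the
    query-key interaction matrix W_q.T @ W_k (d_model x d_model) starts out
    symmetric and positive semi-definite. The two matrices can diverge freely
    during training.
    """
    w_q = nn.Linear(d_model, d_head, bias=False)
    w_k = nn.Linear(d_model, d_head, bias=False)
    nn.init.xavier_uniform_(w_q.weight)
    with torch.no_grad():
        w_k.weight.copy_(w_q.weight)  # tie keys to queries at step 0
    return w_q, w_k

# usage: swap this in for the default init of an encoder's attention layers
w_q, w_k = symmetric_attention_init(d_model=768, d_head=64)
interaction = w_q.weight.T @ w_k.weight
assert torch.allclose(interaction, interaction.T)  # symmetric at initialization
```

Tying W_K to W_Q makes the query-key product symmetric (and positive semi-definite) only at initialization; training is otherwise unconstrained.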
4/ We validate our analysis empirically, showing that these patterns consistently emerge across different language models and input modalities:

- ModernBERT, GPT, LLaMA3, Mistral, etc.
- Text, vision, and audio models
- Different model sizes and architectures

⬇️
February 18, 2025 at 12:22 PM