Lightnews — Scholar-powered news

🤌Why?
1⃣GRAM aligns all modalities jointly, and the alignment is proven mathematically.
2⃣GRAM is proven to work for any number of modalities, from 2 up to n!
3⃣GRAM establishes a new SOTA on downstream tasks!
4⃣No need to scale up model parameters!

4/n🧵

December 18, 2024 at 3:43 PM

Eleonora Grassucci

@eleonoragrassucci.bsky.social

🤔How?
1⃣Extract embeddings with modality encoders
2⃣Arrange them in a tensor
3⃣Compute the Gram matrix
4⃣Compute the determinant: its square root is the volume of the parallelotope!
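
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `gram_volume`, the L2 normalization step, and the random stand-in embeddings are all assumptions made for the example. The volume is taken as the square root of the Gram determinant.

```python
import numpy as np

def gram_volume(embeddings):
    """Volume of the parallelotope spanned by k embeddings (rows of a k x d array).

    Hypothetical helper for illustration only.
    """
    A = np.asarray(embeddings, dtype=np.float64)
    # Assumption: L2-normalize each embedding, as is common with cosine similarity.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    # Gram matrix of pairwise inner products, shape (k, k).
    G = A @ A.T
    # The determinant is the squared volume; clamp tiny negative round-off to zero.
    det = np.linalg.det(G)
    return float(np.sqrt(max(det, 0.0)))

# Stand-in "modality embeddings" (random data, not real encoder outputs).
rng = np.random.default_rng(0)
anchor = rng.normal(size=64)
aligned = np.stack([anchor + 0.05 * rng.normal(size=64) for _ in range(3)])
misaligned = rng.normal(size=(3, 64))

vol_aligned = gram_volume(aligned)        # nearly parallel vectors: small volume
vol_misaligned = gram_volume(misaligned)  # nearly orthogonal vectors: large volume
```

With normalized embeddings the volume lies between 0 (linearly dependent vectors) and 1 (mutually orthogonal vectors), so the same function applies unchanged to 2, 3, or more modalities.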

3/n🧵

December 18, 2024 at 3:43 PM

Eleonora Grassucci

@eleonoragrassucci.bsky.social

💡The intuition: semantically aligned data spans a small volume, while semantically misaligned data spans a large volume!
We no longer need pairwise cosine similarity, which is insufficient for tasks that require cross-modal understanding beyond pairs!
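
The link between volume and cosine similarity can be written out explicitly. As a sketch, assume k modality embeddings v_1, ..., v_k stacked as the columns of a matrix A (these symbols are chosen for illustration and do not come from the thread):

```latex
G = A^\top A, \qquad G_{ij} = \langle v_i, v_j \rangle, \qquad
\mathrm{Vol}(v_1, \dots, v_k) = \sqrt{\det G}

% Special case: k = 2 with unit-norm embeddings and angle \theta between them
G = \begin{pmatrix} 1 & \cos\theta \\ \cos\theta & 1 \end{pmatrix},
\qquad
\mathrm{Vol}(v_1, v_2) = \sqrt{1 - \cos^2\theta} = |\sin\theta|
```

For two modalities, a small volume is therefore the same as a cosine similarity close to 1 in absolute value. For k > 2, the single determinant depends on all pairwise inner products at once.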

2/n🧵

December 18, 2024 at 3:43 PM
