Eleonora Grassucci
eleonoragrassucci.bsky.social
Assistant Professor @Sapienza, Rome.
Generative AI, Multimodal Learning, Generative Semantic Communication
Great work together with @GiordanoCicchetti (first author), @luigi_sigillo and @dacom.bsky.social from ISPAMM Lab
December 18, 2024 at 3:43 PM
Play with the alignment at: ispamm.github.io/GRAM/

5/n🧵
GRAM: Gramian Multimodal Representation Learning and Alignment
🤌Why?
1⃣GRAM aligns all modalities jointly rather than in pairs, and the alignment is mathematically proven.
2⃣GRAM is proven to work from 2 up to n modalities!
3⃣GRAM establishes a new SOTA on downstream tasks!
4⃣No need to scale up model parameters!

4/n🧵
🤔How?
1⃣Extract embeddings with modality encoders
2⃣Arrange them in a tensor
3⃣Compute the Gram matrix
4⃣Compute the determinant: its square root is the volume of the parallelotope spanned by the embeddings!
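
The four steps above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the function name `gram_volume`, the (k, d) layout with one row per modality, and the L2 normalisation of each embedding are assumptions made for the example.

```python
import numpy as np


def gram_volume(embeddings: np.ndarray) -> float:
    """Volume of the parallelotope spanned by the rows of `embeddings`.

    `embeddings` has shape (k, d): one d-dimensional vector per modality.
    Hypothetical helper for illustration only.
    """
    # Assumption: each modality embedding is scaled to unit length,
    # so the volume depends only on the angles between the vectors.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Gram matrix of pairwise inner products, shape (k, k).
    gram = unit @ unit.T

    # The squared volume is det(G); clamp tiny negative values
    # that can appear from floating-point round-off.
    det = np.linalg.det(gram)
    return float(np.sqrt(max(det, 0.0)))


# Three modalities pointing in nearly the same direction (aligned)...
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
aligned = np.stack([anchor + 0.01 * rng.normal(size=8) for _ in range(3)])

# ...versus three mutually orthogonal directions (misaligned).
misaligned = np.eye(3, 8)

print("aligned volume:   ", gram_volume(aligned))
print("misaligned volume:", gram_volume(misaligned))
```

Under these assumptions the volume lies between 0 (all embeddings collinear) and 1 (all embeddings mutually orthogonal), so the aligned triple should print a value much closer to 0 than the misaligned one.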

3/n🧵
💡The intuition is: semantically aligned data has a small volume, while semantically misaligned data has a large volume!
We no longer need pairwise cosine similarity, which is insufficient for tasks that require cross-modal understanding beyond pairs!
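
To see how the volume relates to cosine similarity, here is a small worked case for just two modalities. It assumes unit-norm embeddings $a$ and $b$ at angle $\theta$ (the symbols are introduced only for this example):

```latex
G = \begin{pmatrix} \langle a, a \rangle & \langle a, b \rangle \\
                    \langle b, a \rangle & \langle b, b \rangle \end{pmatrix}
  = \begin{pmatrix} 1 & \cos\theta \\ \cos\theta & 1 \end{pmatrix},
\qquad
\det G = 1 - \cos^2\theta,
\qquad
\mathrm{Vol}(a, b) = \sqrt{\det G} = |\sin\theta| .
```

With two modalities, then, a small volume corresponds to a cosine similarity close to 1 in magnitude. With three or more modalities the single determinant depends on all the embeddings at once, which a set of separate pairwise scores does not.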

2/n🧵