Eleonora Grassucci
eleonoragrassucci.bsky.social
Eleonora Grassucci
@eleonoragrassucci.bsky.social
Assistant Professor @Sapienza, Rome.
Generative AI, Multimodal Learning, Generative Semantic Communication
🤔How?
1⃣Extract embeddings with modality encoders
2⃣Arrange them in a tensor
3⃣Compute the Gram matrix
4⃣Compute the determinant, and here it is the volume of the parallelotope!

3/n🧵
December 18, 2024 at 3:43 PM
💡The intuition is: semantically aligned data has a small volume, while semantically misaligned data has a large volume!
We do not need to get the pairwise cosine similarity anymore, which is insufficient for tasks that require cross-modal understanding beyond pairs!

2/n🧵
December 18, 2024 at 3:43 PM
No more pairwise cosine similarity for multimodal learning we can directly go up to n modalities!🔴

👉GRAM can align *from 2 to n modalities* altogether in a joint fashion by gaining alignment insights from the volume of the parallelotope spanned by the modality vectors.

1/n🧵
December 18, 2024 at 3:43 PM