Generative AI, Multimodal Learning, Generative Semantic Communication
(first author), @luigi_sigillo and @dacom.bsky.social from ISPAMM Lab
1⃣GRAM aligns all modalities jointly, with a mathematical proof that the alignment holds across multiple modalities.
2⃣GRAM is proven to work from 2 up to n modalities!
3⃣GRAM establishes a new SOTA on downstream tasks!
4⃣No need to scale up model parameters!
4/n🧵
1⃣Extract embeddings with modality encoders
2⃣Arrange them in a tensor
3⃣Compute the Gram matrix
4⃣Compute the determinant: its square root is the volume of the parallelotope spanned by the embeddings!
3/n🧵
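The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official GRAM implementation: the embedding matrix is random stand-in data, and the dimensions (3 modalities, 512 features) are assumptions.

```python
import numpy as np

# Hypothetical embeddings: one row per modality (e.g. video, audio, text),
# as produced by step 1⃣'s modality encoders
rng = np.random.default_rng(0)
E = rng.normal(size=(3, 512))

# Unit-normalize each row, matching the cosine-similarity convention
E /= np.linalg.norm(E, axis=1, keepdims=True)

# Step 3⃣: Gram matrix of pairwise inner products between modalities
G = E @ E.T

# Step 4⃣: sqrt of the determinant = volume of the parallelotope
# spanned by the modality embeddings (in [0, 1] for unit vectors)
volume = np.sqrt(np.linalg.det(G))
```

A smaller volume means the modality embeddings are closer together, so a training objective can shrink this volume to align all modalities at once.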
No more pairwise cosine similarity, which is insufficient for tasks that require cross-modal understanding beyond pairs!
2/n🧵
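A toy example of the limitation: with three modalities, pairwise cosine similarity gives three separate two-way scores, while the Gram volume gives one number for the joint geometry. The 3-D vectors below are illustrative stand-ins, not real modality embeddings.

```python
import numpy as np

# Three hypothetical unit-norm "modality" embeddings in 3-D
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
c = (a + b) / np.sqrt(2.0)  # lies in the span of a and b

# Pairwise cosine similarity only ever compares two modalities at a time
cos_ab, cos_ac, cos_bc = a @ b, a @ c, b @ c

# The Gram volume is a single score over all three modalities jointly;
# here it is 0 because c is linearly dependent on a and b
E = np.stack([a, b, c])
G = E @ E.T
volume = np.sqrt(max(np.linalg.det(G), 0.0))  # clamp tiny negative float noise
```

No combination of the three pairwise numbers directly exposes this degeneracy; the volume captures it in one shot.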