Generative AI, Multimodal Learning, Generative Semantic Communication
1⃣Extract embeddings with modality encoders
2⃣Arrange them in a tensor
3⃣Compute the Gram matrix
4⃣Compute the determinant: its square root is the volume of the parallelotope!
3/n🧵
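The four steps above can be sketched in a few lines of plain Python. This is a toy illustration, not GRAM's actual code: the function names (`normalize`, `gram_matrix`, `determinant`, `parallelotope_volume`) are made up for this example, the "embeddings" are hand-written vectors rather than real encoder outputs, and unit-normalizing the vectors first is an assumption.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def gram_matrix(vectors):
    """Gram matrix G[i][j] = <v_i, v_j> for a list of vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def parallelotope_volume(embeddings):
    """Volume spanned by the unit-normalized embeddings: sqrt(det(Gram))."""
    unit = [normalize(e) for e in embeddings]
    det = determinant(gram_matrix(unit))
    return math.sqrt(max(det, 0.0))  # clamp tiny negative round-off

# Hypothetical embeddings for three modalities (e.g. video, audio, text).
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collinear = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 1.0, 1.5]]

print(parallelotope_volume(orthogonal))  # mutually orthogonal vectors
print(parallelotope_volume(collinear))   # vectors pointing the same way
```

Geometrically, vectors that point the same way span a degenerate parallelotope with zero volume, while mutually orthogonal unit vectors span the largest possible one, so one number summarizes how close all the modalities are at once.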
We no longer need pairwise cosine similarity, which is insufficient for tasks that require cross-modal understanding beyond pairs!
2/n🧵
👉GRAM can align *from 2 to n modalities* jointly, using the volume of the parallelotope spanned by the modality vectors as its measure of alignment.
1/n🧵