Francesco Ortu
francescortu.bsky.social
NLP & Interpretability | PhD Student @ University of Trieste & Laboratory of Data Engineering of Area Science Park | Prev MPI-IS
Additionally, blocking communication from this token significantly disrupts performance on standard benchmarks, while blocking direct image-to-text communication does not.
December 10, 2024 at 8:11 PM
🎯 Key finding: In these models, the hidden representations of images and text form disjoint clusters, and communication between modalities is mediated by the special token <end-of-image>!
December 10, 2024 at 8:11 PM
🚨 🚨 Excited to share our latest paper, now on #arXiv!

🖼️ We studied how unified VLMs, trained to generate both text and images (e.g., Meta's Chameleon), exchange information between modalities, comparing them to standard VLMs.

📄 Paper: arxiv.org/abs/2412.06646

Deep dive: 👇
December 10, 2024 at 8:11 PM