Lightnews — Scholar-powered news

Francesco Ortu

@francescortu.bsky.social

NLP & Interpretability | PhD Student @ University of Trieste & Laboratory of Data Engineering of Area Science Park | Prev MPI-IS

Additionally, blocking communication from the <end-of-image> token significantly disrupts performance on standard benchmarks, while blocking image-text communication does not.

December 10, 2024 at 8:11 PM

🎯 Key finding: In these models, the hidden representations of images and text form disjoint clusters, and communication between modalities is mediated by the special token <end-of-image>!

December 10, 2024 at 8:11 PM

🚨 🚨 Excited to share our latest paper, now on #arXiv!

🖼️ We studied how unified VLMs, trained to generate both text and images (e.g., Meta's Chameleon), exchange information between modalities, comparing them to standard VLMs.

📄 Paper: arxiv.org/abs/2412.06646

Deep dive: 👇

December 10, 2024 at 8:11 PM
