🖼️ We studied how unified VLMs, trained to generate both text and images (e.g., Meta's Chameleon), exchange information between modalities, comparing them to standard VLMs.
📄 Paper: arxiv.org/abs/2412.06646
Deep dive: 👇
🖼️ We studied how unified VLMs, trained to generate both text and images (e.g., Meta's Chameleon), exchange information between modalities, comparing them to standard VLMs.
📄 Paper: arxiv.org/abs/2412.06646
Deep dive: 👇