asaakyan.github.io
1. VLMs struggle to generalize from literal to figurative meaning understanding (training on e-ViL achieves only random-chance F1 on our task)
2. Figurative meaning is harder to explain when it is in the image than when it is in the text
3. VLMs benefit from image data during fine-tuning
New task & dataset of images and captions with figurative phenomena like metaphor, idiom, sarcasm, and humor.