Conservatoire National des Arts et Métiers - Paris
@YKarmim on Twitter
We propose a simple yet effective post-hoc model that learns to capture ambiguity from both images and captions, achieving high accuracy in detecting VLM failures.