Joint work with @Ander Salaberria @eagirre.bsky.social @gazkune.bsky.social @hitz-zentroa.bsky.social
1. There is room to improve the quality of extracted text segments.
2. Our method achieves significant performance gains on Winoground's non-trivial instances.
3. Isolated image crops can lose size and quantity information, leaving room for improvement.
1. Divide the image into smaller crops.
2. Extract text segments capturing objects, attributes and relations.
3. Use the VLM to find image crops that best fit the text segments.
4. Aggregate matching similarities for the final score.
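The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The regular grid in `grid_crops`, the "best crop per segment, then average" aggregation in `match_score`, and the `similarity` callable (a hypothetical stand-in for the VLM's text-to-crop score) are all illustrative choices. The text segments from step 2 are assumed to have been extracted already.

```python
from typing import Callable, List, Sequence, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def grid_crops(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Step 1 (assumed variant): split the image into a regular rows x cols grid."""
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            top = r * height // rows
            right = (c + 1) * width // cols
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


def match_score(
    segments: Sequence[str],
    crops: Sequence[Box],
    similarity: Callable[[str, Box], float],  # hypothetical stand-in for the VLM
) -> float:
    """Steps 3-4 (assumed aggregation): pick each segment's best-matching crop,
    then average those best similarities into one image-text score."""
    if not segments or not crops:
        raise ValueError("need at least one segment and one crop")
    best = [max(similarity(seg, crop) for crop in crops) for seg in segments]
    return sum(best) / len(best)


if __name__ == "__main__":
    crops = grid_crops(100, 100, rows=2, cols=2)
    # Toy similarity table in place of a real VLM; unknown pairs score 0.0.
    table = {("dog", (0, 0, 50, 50)): 0.75, ("red ball", (50, 50, 100, 100)): 0.5}
    score = match_score(["dog", "red ball"], crops, lambda s, b: table.get((s, b), 0.0))
    print(score)
```

Taking the maximum over crops lets each segment (an object, attribute or relation) be grounded in whichever region fits it best. Averaging then rewards captions whose segments are all well supported. Other aggregations, such as the minimum or a weighted sum, are equally plausible.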
Thank you!
Continuing to pretrain base models is intuitive, but what about instructed models? We systematically analyze the different approaches to find the best solution.
2/3
ebaluatoia.hitz.eus
Come in and take part! Besides being easy and fun, there are prizes too!