💡 TL;DR: VLM-judges can fail at data comparison!
✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability—ensuring reliable auto-evaluation.
📄 Paper: arxiv.org/abs/2502.15210
🧵 Thread: 👇