Lightnews — Scholar-powered news

@kunxiang.bsky.social

kunxiang.bsky.social

📊 Experiments reveal that even SOTA models like Gemini-2.5-Pro and o4-mini achieve accuracy rates below 55%, with over 30% error rates on simple middle-school-level problems, highlighting significant challenges in multimodal reasoning.

May 28, 2025 at 6:44 AM

kunxiang.bsky.social

🖼️ Covering 2,000 vision-text multimodal physics problems spanning from middle school to doctoral qualification exams, the SeePhys benchmark systematically evaluates LLMs/MLLMs on tasks integrating complex scientific diagrams with theoretical derivations.

May 28, 2025 at 6:42 AM
