Lightnews — Scholar-powered news

Shang Qu

@lindsayttsq.bsky.social

25 followers 250 following 8 posts

AI4Biomed & LLMs @ Tsinghua University


Shang Qu

@lindsayttsq.bsky.social

Check out the details!
📒Preprint: arxiv.org/pdf/2501.18362
🗃️Data files will be released shortly at: github.com/TsinghuaC3I/...


February 4, 2025 at 1:33 PM

Shang Qu

@lindsayttsq.bsky.social

We also found that reasoning process errors and perceptual errors (in MM) account for a large share of model errors. Error cases offer further insight into the clinical reasoning challenges models still face:

February 4, 2025 at 1:33 PM

Shang Qu

@lindsayttsq.bsky.social

💡Clinical reasoning makes it possible to evaluate model reasoning beyond math & code. We annotate MedXpertQA questions as Reasoning or Understanding based on the reasoning complexity they require.
Comparing 3 inference-time scaled models with their backbones, we find distinct improvements on the Reasoning subset:

February 4, 2025 at 1:32 PM

Shang Qu

@lindsayttsq.bsky.social

Benchmark construction process - 38k original ➡️ 4k+ final questions
- Filtering for difficulty and diversity using responses from humans + 8 AI experts
- Question rewriting & option set expansion to lower data leakage risk
- Human expert proofreading & error correction
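The post names difficulty filtering using human and AI responses but gives no exact rule. A minimal sketch of how such a filter could work, assuming each candidate question records whether each of the 8 AI models answered correctly plus a human accuracy rate. The `Question` fields, `is_hard`, `filter_hard`, and both thresholds are hypothetical illustrations, not the MedXpertQA procedure:

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    model_correct: list[bool]  # one entry per AI "expert" model (8 in the post)
    human_accuracy: float      # hypothetical: fraction of human respondents answering correctly


def is_hard(q: Question, max_models_correct: int = 2, max_human_acc: float = 0.5) -> bool:
    """Hypothetical difficulty rule: keep a question only if few models and
    relatively few humans answer it correctly. Thresholds are illustrative."""
    return sum(q.model_correct) <= max_models_correct and q.human_accuracy <= max_human_acc


def filter_hard(questions: list[Question]) -> list[Question]:
    # Keep only the questions that pass the (assumed) difficulty rule.
    return [q for q in questions if is_hard(q)]


pool = [
    Question("q1", [True] * 8, 0.9),                    # easy for everyone
    Question("q2", [False] * 7 + [True], 0.3),          # hard for models and humans
    Question("q3", [True, True, True] + [False] * 5, 0.4),  # too many models correct
]
print([q.qid for q in filter_hard(pool)])  # ['q2']
```

A real pipeline would also need the diversity filtering mentioned above, which this sketch leaves out.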

February 4, 2025 at 1:31 PM

Shang Qu

@lindsayttsq.bsky.social

We improve clinical relevance through
⭐️Medical specialty coverage: MedXpertQA includes questions from 20+ exams at medical licensing level or higher
⭐️Realistic context: MM is the first multimodal medical benchmark to introduce rich clinical information with diverse image types

February 4, 2025 at 1:31 PM

Shang Qu

@lindsayttsq.bsky.social

Compared with rapidly saturating benchmarks like MedQA, we raise the bar with harder questions and a sharper focus on medical reasoning.
Full results evaluating 17 LLMs, LMMs, and inference-time scaled models:

February 4, 2025 at 1:30 PM
