Shan Chen
@shan23chen.bsky.social
PhDing @AIM_Harvard @MassGenBrigham | PhD Fellow @Google | Previously @Bos_CHIP @BrandeisU
More robustness and explainabilities 🧐 for Health AI.
shanchen.dev
Pinned
Shan Chen
@shan23chen.bsky.social
· Nov 13
What We Learned About LLM/VLMs in Healthcare AI Evaluation:
A Blog post by Shan Chen on Hugging Face
huggingface.co
Here are some reflections on many studies we did this year. Tons of progress has been made, but there are still safety concerns..🧐
Poster 10:30 riverfront at EMNLP2024 🏖️
Happy to chat and connect!
📃 huggingface.co/blog/shanche...
🔊 tinyurl.com/aimpodcast24
@daniellebitterman.bsky.social
Reposted by Shan Chen
LLMs tend to prioritize helpfulness > reason. We show that safety-aware, compute-efficient fine-tuning helps models reason more critically in the healthcare domain and generalizes to improved safety alignment across other domains.
www.nature.com/articles/s41... @shan23chen.bsky.social
When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior - npj Digital Medicine
npj Digital Medicine - When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior
www.nature.com
October 18, 2025 at 2:18 PM
Reposted by Shan Chen
An overemphasis on helpfulness makes LLMs vulnerable.
Research shows models will comply with illogical medical requests, generating false information. This sycophantic tendency can be corrected with specific prompting and fine-tuning. #MedSky #MedAI #MLSky
When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior - npj Digital Medicine
npj Digital Medicine - When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior
www.nature.com
October 17, 2025 at 3:53 PM
Reposted by Shan Chen
[1/]💡New Paper
Large reasoning models (LRMs) are strong in English — but how well do they reason in your language?
Our latest work uncovers their limitation and a clear trade-off:
Controlling Thinking Trace Language Comes at the Cost of Accuracy
📄Link: arxiv.org/abs/2505.22888
May 30, 2025 at 1:09 PM
Reposted by Shan Chen
Agents are all the rage and we need to track their abilities in the medical domain. Enter MedBrowseComp, the 1st benchmark to assess agents' abilities to reason, navigate the web, and search for verifiable med info!
Preprint: arxiv.org/abs/2505.14963
Site: moreirap12.github.io/mbc-browse-a...
May 22, 2025 at 4:27 PM
Reposted by Shan Chen
✨ What if your face could tell something about how old your body really is?
Excited to share our latest paper just published in The Lancet Digital Health (open access!)
👉 www.thelancet.com/journals/lan...
FaceAge, a deep learning system to estimate biological age from face photographs to improve prognostication: a model development and validation study
Our results suggest that a deep learning model can estimate biological age from face photographs and thereby enhance survival prediction in patients with cancer. Further research, including validation...
www.thelancet.com
May 9, 2025 at 3:06 PM
CALL FOR REMOTE SPEAKERS: Science in the News Seminar Series, hosted by Harvard x Beacon Hill Seminars
scientists, engineers & doctors, from academic researchers to industry professionals! 🧑‍🔬🧑‍💻
Email the organizers at scienceinthenews.bhs@gmail.com to sign up for a date! (First-come-first-served)
March 7, 2025 at 1:45 AM
Reposted by Shan Chen
We have a NEW PAPER in @naturemedicine.bsky.social on reporting recommendations for addressing the unique challenges of #largelanguagemodels (LLMs) in biomedical applications
www.nature.com/articles/s41...
#MLSky #StatsSky #medSky #AISky #artificialintelligence #generativeAI #transparency
January 8, 2025 at 10:24 AM
Reposted by Shan Chen
I am always worrying about Benzene (my cat)! www.nytimes.com/2024/12/05/w...
But please don't stop wearing sunscreen! Sun exposure is a known cancer risk, benzene risks unknown. This article has good tips if you want to minimize benzene exposure.
Obligatory Benzene (cat) pic ⬇️
Is It Time to Worry About Benzene in Personal Care Products?
The carcinogen has been found in sunscreen, deodorants, acne creams and other personal care products. Here’s what to know.
www.nytimes.com
December 6, 2024 at 11:12 PM
Team @AnthropicAI & @thesubhashk @joshengels.bsky.social show SAE features can be good for classification.
Good evidence by @arthurconmy.bsky.social & @neelnanda.bsky.social that SAE features are transferable across base and IT models.
🧐 How about LLaVA?
tiny.cc/sae1
Are SAE features from the Base Model still meaningful to LLaVA? — LessWrong
Shan Chen, Jack Gallifant, Kuleen Sasse, Danielle Bitterman[1]
Please read this as a work in progress where we are colleagues sharing this in a lab (…
tiny.cc
December 5, 2024 at 8:16 PM
Crosscare is accepted
@neuripsconf.bsky.social
🎉 We showed LLMs are far from grounded with true prevalence, and groundings across languages are so inconsistent!
Also, a dashboard for people to explore the prevalence data across diseases and racial groups: crosscare.net
#NeurIPS2024
Cross-Care Dataset
The Cross-Care Dataset provides comprehensive insights into co-occurrence patterns of various diseases. This dataset is invaluable for researchers and healthcare professionals seeking to understand co...
crosscare.net
November 27, 2024 at 3:09 PM
Reposted by Shan Chen
My department is hiring: apply to be my colleague!
www.chip.org/employment/i...
Instructor, Assistant, or Associate Professor Position in Computational Health Informatics | Chip
Join the forefront of healthcare innovation at Harvard and Boston Children’s Hospital, where informatics, computation, and artificial intelligence (AI) are transforming care delivery and biomedical sc...
www.chip.org
November 25, 2024 at 7:55 PM
Million thanks to my wonderful advisor @daniellebitterman.bsky.social and all my colleagues and friends!
🎉 Incredibly proud of @shan23chen.bsky.social for being selected for the 2024 Google PhD Fellowship in Natural Language Processing: blog.google/technology/r... !!! So excited to see how Shan's contributions will continue shaping the future of clinical NLP
#HealthAI #NLP
🌟
blog.google
November 17, 2024 at 4:31 PM
Here are some reflections on many studies we did this year. Tons of progress has been made, but there are still safety concerns..🧐
Poster 10:30 riverfront at EMNLP2024 🏖️
Happy to chat and connect!
📃 huggingface.co/blog/shanche...
🔊 tinyurl.com/aimpodcast24
@daniellebitterman.bsky.social
What We Learned About LLM/VLMs in Healthcare AI Evaluation:
A Blog post by Shan Chen on Hugging Face
huggingface.co
November 13, 2024 at 4:28 AM