@kaijie-mo.bsky.social
NLP & Ling; PhD student @UTAustin @UT_Linguistics

website: https://kaijiemo-kj.github.io/
Setup (2/4)
We introduce MedCounterFact, a counterfactual medical QA dataset built on RCT-based evidence synthesis.
– Replace real interventions in evidence with nonce, mismatched medical, non-medical, or toxic terms
– Evaluate 9 frontier LLMs under evidence-grounded prompts
January 21, 2026 at 6:47 PM

Results (3/4)
– With evidence, models strongly adhere to it with high confidence, even for toxic or nonsensical interventions
– Implausibility awareness is transient; once evidence appears, models rarely flag problems
– Scaling, medical fine-tuning, and skeptical prompting offer little protection
January 21, 2026 at 6:50 PM
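The substitution step from the setup could be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: the category names, example terms, and `perturb_evidence` helper are all hypothetical, assuming the perturbation is a string-level swap of the real intervention for a counterfactual term from one of the four categories.

```python
import random

# Hypothetical counterfactual vocabularies; the actual MedCounterFact
# term lists are not shown in the thread.
COUNTERFACTUALS = {
    "nonce": ["blorptan", "quindexol"],   # made-up drug-like strings
    "mismatched_medical": ["ibuprofen"],  # real drug, wrong trial context
    "non_medical": ["bicycle"],           # everyday non-medical object
    "toxic": ["bleach"],                  # overtly harmful substance
}

def perturb_evidence(evidence: str, intervention: str, category: str,
                     rng: random.Random) -> str:
    """Replace every mention of the real intervention in an evidence
    passage with a counterfactual term drawn from the chosen category."""
    replacement = rng.choice(COUNTERFACTUALS[category])
    return evidence.replace(intervention, replacement)

rng = random.Random(0)
text = "Aspirin reduced cardiovascular events versus placebo."
print(perturb_evidence(text, "Aspirin", "non_medical", rng))
# → "bicycle reduced cardiovascular events versus placebo."
```

Each perturbed passage would then be paired with the original question and fed to the models under an evidence-grounded prompt, which is where the thread's results apply.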