🧑‍💻 Code/data: github.com/KaijieMo-kj/...
w/
@kaijie-mo.bsky.social @sidvenkatayogi.bsky.social
@chantalsh.bsky.social @ramezkouzy.bsky.social
@cocoweixu.bsky.social @byron.bsky.social @jessyjli.bsky.social
– When given evidence, models adhere to it strongly and with high confidence, even for toxic or nonsensical interventions
– Implausibility awareness is transient; once evidence appears, models rarely flag problems
– Scaling, medical fine-tuning, and skeptical prompting offer little protection
We introduce MedCounterFact, a counterfactual medical QA dataset built on RCT-based evidence synthesis.
– Replace real interventions in evidence with nonce, mismatched medical, non-medical, or toxic terms
– Evaluate 9 frontier LLMs under evidence-grounded prompts