https://monicamunnangi.github.io/
📂 Code & Data: github.com/som-shahlab/...
📄 Paper: arxiv.org/abs/2412.124...
FactEHR highlights these gaps and guides improvement.
📄 2,168 notes | 🏥 4 note types, 3 health systems
🔗 987K entailment pairs + 3.4K expert labels
🤖 Full fact decompositions from GPT-4o, Gemini 1.5, LLaMA3 8B, and o1-mini
Clinical notes are long, messy, and inconsistent. Evaluating fine-grained factuality across diverse note types (e.g., discharge vs. radiology) is a major challenge, but it is essential for safe, trustworthy LLMs. ⚠️