🚴🏔️🍄❄️⛷️🧶⚫️⚪️📚🍸in Seattle; llwang.net; she/her
seattleawis.org/event/seattl...
We assess model ability to abstain w/ insufficient or incorrect context. Counter-intuitively, adding irrelevant context can sometimes increase task performance!
We introduce a granular meta-evaluation testbed for medical PLS and evaluate 14 metrics, including automated scores, lexical features, and LLM prompting