Hailey Joren
haileyjoren.bsky.social
PhD Student @ UC San Diego

Researching reliable, interpretable, and human-aligned ML/AI
Building on these insights, we developed a selective generation framework that uses both sufficient-context signals and model confidence to decide when to respond vs. abstain, improving the accuracy of responses by 2-10% for Gemini, GPT, and Gemma.
April 24, 2025 at 6:18 PM
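To make the respond-vs-abstain idea concrete, here is a minimal sketch of combining the two signals into one decision. Everything in it is an assumption for illustration: the `Signals` fields, the logistic form, the hand-set weights, and the `max_risk` threshold are hypothetical and not taken from the framework described in the post; in practice such weights would be fit on labeled data.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    sufficient_context: bool  # output of a hypothetical sufficiency rater
    confidence: float         # model's self-reported confidence in [0, 1]


def hallucination_risk(s: Signals,
                       w_context: float = -1.5,
                       w_conf: float = -3.0,
                       bias: float = 2.0) -> float:
    """Logistic score in (0, 1); higher means a wrong answer is more likely.

    The weights are illustrative placeholders, not fitted values.
    """
    z = bias + w_context * float(s.sufficient_context) + w_conf * s.confidence
    return 1.0 / (1.0 + math.exp(-z))


def decide(s: Signals, max_risk: float = 0.5) -> str:
    """Respond only when the estimated risk is at or below the threshold."""
    return "respond" if hallucination_risk(s) <= max_risk else "abstain"


if __name__ == "__main__":
    for s in (Signals(True, 0.9), Signals(False, 0.2), Signals(True, 0.1)):
        print(s, decide(s))
```

Lowering `max_risk` trades coverage for accuracy: the model answers fewer questions, but the ones it does answer are less likely to be wrong.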
Intriguingly, models sometimes generate correct answers despite insufficient context. We taxonomize these cases: parametric knowledge bridging information gaps, yes/no questions where a guess has a 50% chance of being correct, and instances where the context provides partial reasoning paths.
April 24, 2025 at 6:18 PM
We analyzed standard QA datasets through our sufficient context lens and found that a surprising percentage of instances lack sufficient context: ~56% for Musique, ~56% for HotpotQA, and ~23% for FreshQA. This highlights the magnitude of the information retrieval challenge.
April 24, 2025 at 6:18 PM
A major finding: when context is sufficient, larger models (Gemini 1.5 Pro, GPT-4o, Claude 3.5) excel. But when it's insufficient, they're more likely to hallucinate than abstain, presenting incorrect answers with high confidence.
April 24, 2025 at 6:18 PM
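One way to surface this kind of pattern is to split model outputs by whether the context was sufficient and tally how often the model was correct, abstained, or hallucinated in each group. The sketch below assumes a hypothetical record format of `(sufficient, outcome)` pairs, and the sample records are made-up toy data, not results from the study.

```python
from collections import Counter

OUTCOMES = ("correct", "abstain", "hallucinate")


def outcome_rates(records):
    """Return per-group outcome fractions.

    records: iterable of (sufficient: bool, outcome: str) pairs,
    where outcome is one of OUTCOMES. The format is an assumption.
    """
    counts = {True: Counter(), False: Counter()}
    for sufficient, outcome in records:
        counts[sufficient][outcome] += 1

    rates = {}
    for sufficient, c in counts.items():
        total = sum(c.values())
        # Leave the group empty if no records fell into it.
        rates[sufficient] = (
            {k: c[k] / total for k in OUTCOMES} if total else {}
        )
    return rates


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = (
        [(True, "correct")] * 3
        + [(True, "hallucinate")]
        + [(False, "hallucinate")] * 2
        + [(False, "abstain"), (False, "correct")]
    )
    for sufficient, r in outcome_rates(toy).items():
        print("sufficient" if sufficient else "insufficient", r)
```

Comparing the `hallucinate` and `abstain` fractions within the insufficient group shows directly whether a model tends to guess rather than decline.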