@momergul.bsky.social
CS PhD Student @Cornell
Tons of other insights in the paper. We show that the strength of the helper / search tool is a key consideration: replacing our retriever with an oracle makes all models converge to always seeking help. The noisiness of the retriever is a feature, not a bug!
October 2, 2025 at 7:40 PM
Baseline RL implementations often converge to sub-optimal policies that always or never search. MASH uses a lightweight warm-start data generation & SFT pipeline that induces better search behaviors: MASH models learn to use 0, 1, or 2 searches as needed, while baselines fail to.
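The always/never-search collapse can be seen in a toy search-penalized reward. This is an illustrative sketch, not the paper's objective; the function shape and the penalty weight `lam` are assumptions.

```python
# Toy sketch of a search-penalized RL reward (illustrative only; the
# penalty weight `lam` is an assumption, not a value from the paper).
def reward(correct: bool, num_searches: int, lam: float = 0.25) -> float:
    """Answer correctness minus a fixed cost per search call."""
    return float(correct) - lam * num_searches

# Why policies collapse: if searching rarely flips an answer from wrong
# to right, never searching dominates; if it almost always does, always
# searching dominates. A mixed 0/1/2-search policy is hard to reach
# from scratch, which is what the warm-start SFT is meant to fix.
print(reward(True, 0))   # 1.0
print(reward(True, 2))   # 0.5
print(reward(False, 1))  # -0.25
```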
For (ii), MASH shows strong abstention behavior off-the-shelf! Its performance is comparable to abstention baselines that require pre-determined knowledge boundaries and model-specific training data. It beats SFT approaches and is competitive with DPO!
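A minimal sketch of how help-seeking doubles as abstention (hypothetical code, not the paper's API; `respond` and its arguments are stand-ins for the model's action choice): when the model's chosen action is to search but no tool is available, the search intent is read as "I don't know".

```python
# Hypothetical sketch: a search-trained model abstains for free when the
# tool is removed. `decision` stands in for the model's chosen action.
def respond(decision: str, draft_answer: str, search_available: bool) -> str:
    if decision == "answer":       # model is confident: answer directly
        return draft_answer
    # decision == "search": the model wants help
    if search_available:
        return f"[search] -> {draft_answer}"   # setting (i): use the tool
    return "I don't know"          # setting (ii): abstain instead

print(respond("answer", "Paris", search_available=False))  # Paris
print(respond("search", "Paris", search_available=False))  # I don't know
```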
We evaluate MASH under 2 settings: (i) w/ access to search, (ii) w/o search, as an abstention model.

For (i), MASH outperforms efficient search baselines, esp. on multi-hop datasets (a 7.6% accuracy boost), even matching search baselines trained w/o any search penalties!
🚨 Modeling Abstention via Selective Help-seeking

LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs. what they don't?

We introduce MASH, which trains LLMs for search and gets abstention for free!