Lightnews — Scholar-powered news

Andrew Wang

@andrewwnlp.bsky.social

370 followers 40 following 4 posts

PhD student @jhuclsp.bsky.social

More tools = worse at handling tool failures

When tool schemas are provided in-context, we find that the performance gap between adversarial and non-adversarial settings increases with the number of schemas.

September 19, 2025 at 2:05 PM

LLM agents do not handle tool failures well

When tool schemas are retrieved via RAG, we observe a substantial performance gap between adversarial and non-adversarial settings.

September 19, 2025 at 2:04 PM

Tools break in the real world all the time, but not much attention has been given to how well LLMs deal with tool failures.

We introduce HOHW, a tool-use benchmark where problems remain solvable even when tools break adversarially.

September 19, 2025 at 2:04 PM