Andrew Wang
andrewwnlp.bsky.social
PhD student @jhuclsp.bsky.social
Tools break in the real world all the time, but little attention has been paid to how well LLMs handle tool failures.

We introduce HOHW, a tool-use benchmark where problems remain solvable even when tools break adversarially.
September 19, 2025 at 2:04 PM