Andrew Wang
andrewwnlp.bsky.social
PhD student @jhuclsp.bsky.social
Thanks to my collaborators Sophia Hager, Adi Asija, Nick Andrews, and @danielkhashabi.bsky.social at @jhuclsp.bsky.social!

Arxiv: arxiv.org/abs/2508.11027
Code: github.com/JHU-CLSP/hell-or-high-water
(Data coming soon!)
September 19, 2025 at 2:06 PM
More tools = worse at handling tool failures

When tool schemas are provided in-context, we find that the performance gap between adversarial and non-adversarial settings increases with the number of schemas.
September 19, 2025 at 2:05 PM
LLM agents do not handle tool failures well

With RAG on tool schemas, we observe a substantial performance gap between adversarial and non-adversarial settings.
September 19, 2025 at 2:04 PM