Andrew Wang

@andrewwnlp

PhD student @jhuclsp.bsky.social

368 Followers, 40 Following, 4 Posts
Joined 18.11.2024
Latest posts by Andrew Wang @andrewwnlp

Thanks to my collaborators Sophia Hager, Adi Asija, Nick Andrews, and @danielkhashabi.bsky.social at @jhuclsp.bsky.social!

arXiv: arxiv.org/abs/2508.11027
Code: github.com/JHU-CLSP/hell-or-high-water
(Data coming soon!)

19.09.2025 14:06 👍 0 🔁 0 💬 0 📌 0
More tools = worse at handling tool failures

When tool schemas are provided in-context, we find that the performance gap between adversarial and non-adversarial settings widens as the number of schemas increases.

19.09.2025 14:05 👍 0 🔁 0 💬 1 📌 0
LLM agents do not handle tool failures well

With RAG on tool schemas, we observe a substantial performance gap between adversarial and non-adversarial settings.

19.09.2025 14:04 👍 0 🔁 0 💬 1 📌 0
Tools break in the real world all the time, but little attention has been paid to how well LLMs deal with tool failures.

We introduce HOHW, a tool-use benchmark where problems remain solvable even when tools break adversarially.

19.09.2025 14:04 👍 2 🔁 1 💬 1 📌 0