Wenxuan Ding

@wenxuand

CS PhD student at NYU https://wenwen-d.github.io/

17 Followers · 9 Following · 8 Posts · Joined 21.03.2025

Latest posts by Wenxuan Ding @wenxuand

Check out the paper for more details:
📄Paper: arxiv.org/abs/2602.16699
🔗Code: github.com/Wenwen-D/env...

Many thanks to the wonderful co-authors: @nickatomlin.bsky.social @gregdnlp.bsky.social

23.02.2026 16:00 👍 1 🔁 0 💬 0 📌 0

We also experiment with a coding setting. Here, we additionally compare our approach against end-to-end RL.

💡 Result: conditioning on estimated priors reinforces adaptive reasoning and yields behavior closer to the optimal policy.

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

In an evaluation on PopQA, CTA’s retrieval decisions form a clear boundary with respect to confidence and retrieval cost, closely matching the theoretical optimal policy (indicated by the background shading).

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

For example, in QA, calibrated uncertainty estimates induce more principled retrieval decisions.

Given information about whether direct answering or retrieval would succeed, an oracle reasoner can weigh the tradeoffs and retrieve only when the expected benefit exceeds the retrieval cost.

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0
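A minimal sketch of the oracle decision rule described above, assuming a unit reward for a correct answer and a fixed retrieval cost (all names and values here are illustrative, not figures from the paper):

```python
# Illustrative oracle retrieval policy: retrieve only when the expected
# benefit of retrieving exceeds its cost. Reward and cost values are
# hypothetical placeholders, not numbers from the paper.

def oracle_should_retrieve(p_direct: float, p_retrieval: float,
                           reward: float = 1.0, cost: float = 0.2) -> bool:
    """Return True when retrieving has higher expected utility than
    answering directly.

    p_direct:    probability a direct answer would be correct
    p_retrieval: probability an answer after retrieval would be correct
    """
    expected_direct = p_direct * reward
    expected_retrieval = p_retrieval * reward - cost
    return expected_retrieval > expected_direct
```

With these placeholder numbers, the oracle retrieves for a hard question (p_direct = 0.2, p_retrieval = 0.9) but skips retrieval when it would likely answer correctly anyway (p_direct = 0.9, p_retrieval = 0.95).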

Calibrate-Then-Act induces an LLM to reason about these tradeoffs: it presents information about the environment explicitly in the model's prompt, enabling better decision-making. The model can then be tuned with RL for further improvement.

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0
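One way to read "presents information about the environment explicitly in its prompt" is a template like the following; the wording and parameter names are a hypothetical sketch, not the paper's actual prompt:

```python
def build_cta_prompt(question: str, confidence: float,
                     retrieval_cost: float) -> str:
    # Hypothetical Calibrate-Then-Act-style prompt: the calibrated
    # confidence estimate and the retrieval cost are surfaced to the model
    # so it can reason about the tradeoff before acting.
    return (
        f"Question: {question}\n"
        f"Estimated probability you can answer correctly without "
        f"retrieval: {confidence:.2f}\n"
        f"Cost of one retrieval call: {retrieval_cost:.2f}\n"
        "Weigh the expected benefit of retrieving against its cost, "
        "then output either RETRIEVE or a direct answer."
    )
```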

Existing approaches to calibrating these decisions rely on prompt engineering and end-to-end RL.

Cost may or may not be incorporated into the reward, and even when it is, it is unclear whether the agent effectively represents the cost–uncertainty tradeoff.

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

LLM agents vary in how long they interact with the environment before committing to a final solution.

In a coding setting, they may write and run tests as they work.

In a retrieval setting, they may decide to find more information dynamically, as in Self-RAG.

23.02.2026 16:00 👍 2 🔁 0 💬 1 📌 0

Agents interact with environments to get information. But exploration (tools, retrieval, user interaction) is costly.

Calibrate-Then-Act allows LLM agents to balance exploration and cost:
πŸ“ Estimate uncertainty about the environment
πŸ’­ Reason about cost-uncertainty tradeoffs
βš™οΈ Act accordingly

23.02.2026 16:00 👍 17 🔁 6 💬 1 📌 1
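The three steps above can be sketched as a single loop; all function names here are hypothetical stand-ins for the paper's actual components:

```python
# Hypothetical sketch of the Calibrate-Then-Act loop. estimate_confidence,
# decide, and execute stand in for the calibration step, the LLM's
# reasoning over the tradeoff, and the environment interaction.

def calibrate_then_act(question, estimate_confidence, decide, execute):
    # 1. Estimate uncertainty about the environment
    confidence = estimate_confidence(question)
    # 2. Reason about the cost-uncertainty tradeoff, given the estimate
    action = decide(question, confidence)
    # 3. Act accordingly (e.g., retrieve, run tests, or answer directly)
    return execute(question, action)
```

For example, with a decision rule that explores only under low confidence, a low-confidence question triggers exploration while a high-confidence one does not.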