Wenxuan Ding (@wenxuand)

Check out the paper for more details:
📄Paper: arxiv.org/abs/2602.16699
🔗Code: github.com/Wenwen-D/env...

Many thanks to the wonderful co-authors: @nickatomlin.bsky.social @gregdnlp.bsky.social

23.02.2026 16:00 👍 1 🔁 0 💬 0 📌 0

We also experiment with a coding setting. In this setting, we additionally compare the model to end-to-end RL.

💡 Result: conditioning on estimated priors reinforces adaptive reasoning and induces behavior closer to optimal.

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

In an evaluation on PopQA, CTA’s retrieval decisions form a clear boundary with respect to confidence and retrieval cost, closely matching the theoretical optimal policy (indicated by the background shading).

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

For example, in QA, calibrated uncertainty estimates induce more principled retrieval decisions.

Given information about whether direct answering or retrieval would succeed, an oracle reasoner can weigh the tradeoffs and retrieve only when the expected benefit exceeds the retrieval cost.
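
A minimal sketch of that decision rule, not the paper's implementation. The function name, the two success probabilities, and the unit reward are illustrative assumptions; the cost is assumed to be expressed in the same units as the reward.

```python
def should_retrieve(p_direct: float, p_retrieval: float,
                    cost: float, reward: float = 1.0) -> bool:
    """Return True when retrieval's expected benefit exceeds its cost.

    p_direct:    assumed calibrated probability that answering directly succeeds
    p_retrieval: assumed probability that answering after retrieval succeeds
    cost:        retrieval cost, in the same units as reward (an assumption)
    reward:      payoff for a correct answer (hypothetical, defaults to 1.0)
    """
    # Expected benefit = gain in success probability times the payoff.
    expected_benefit = (p_retrieval - p_direct) * reward
    return expected_benefit > cost


# Confident without retrieval: the small gain does not justify the cost.
print(should_retrieve(0.9, 0.95, cost=0.1))  # False
# Uncertain without retrieval: the large gain outweighs the cost.
print(should_retrieve(0.3, 0.8, cost=0.1))   # True
```

Under these assumptions, raising the cost or raising the model's confidence in a direct answer both push the decision toward answering without retrieval.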

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

Calibrate-Then-Act induces an LLM to reason about these tradeoffs. It presents information about the environment to a model explicitly in its prompt, which allows for better decision-making. This model can be tuned with RL for further improvement.

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

Existing approaches to calibrating these decisions rely on prompt engineering or end-to-end RL.

Cost may or may not be incorporated into the reward, and even when it is, it is unclear whether the agent effectively represents the cost–uncertainty tradeoff.

23.02.2026 16:00 👍 0 🔁 0 💬 1 📌 0

LLM agents vary in how long they interact with the environment before committing to a final solution.

In a coding setting, they may write and run tests as they work.

In a retrieval setting, they may decide to find more information dynamically, as in Self-RAG.

23.02.2026 16:00 👍 2 🔁 0 💬 1 📌 0

Agents interact with environments to get information. But exploration (tools, retrieval, user interaction) is costly.

Calibrate-Then-Act allows LLM agents to balance exploration and cost:
📐 Estimate uncertainty about the environment
💭 Reason about cost–uncertainty tradeoffs
⚙️ Act accordingly

23.02.2026 16:00 👍 17 🔁 6 💬 1 📌 1
