
Transluce

@transluce

Open and scalable technology for understanding AI systems. transluce.org

34 Followers · 2 Following · 7 Posts · Joined 12.11.2024

Latest posts by Transluce @transluce

Quickstart - Docent Get started ingesting agent runs into Docent

Use Docent to analyze your own traces: docs.transluce.org/quickstart
Read our Blog: transluce.org/docent/blog/...

19.02.2026 01:35 👍 0 🔁 0 💬 0 📌 0
Diagnosing a performance regression on Terminal-Bench with Docent Why does GPT-5.1 Codex underperform GPT-5 Codex?

You can replicate our full analysis with 5 min of setup. Clone our Terminal-Bench data & follow along: transluce.org/docent/blog/...

19.02.2026 01:35 👍 0 🔁 0 💬 1 📌 0

Lower benchmark numbers don't always mean worse models. Docent exposes what actually drives bottom-line numbers: broken environments, reward hacking, or in this case, a constraint the agent isn't aware of.

19.02.2026 01:35 👍 1 🔁 0 💬 1 📌 0

GPT-5.1 Codex starts more long-running jobs like training and password cracking that result in timeouts. But Terminal-Bench's system prompt *never mentions the timeout constraint*. GPT-5.1 Codex may be choosing viable long-horizon strategies!

19.02.2026 01:35 👍 0 🔁 0 💬 1 📌 0

Docent is our tool for debugging agents by analyzing traces at scale. Docent (1) compared each failed run to a successful run on the same task by a different model, (2) synthesized the failures by model, (3) quantified the timeout rates.

19.02.2026 01:35 👍 0 🔁 0 💬 1 📌 0

Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵

GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here's how 👇

19.02.2026 01:35 👍 5 🔁 1 💬 1 📌 0
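
The "excluding timeouts" comparison in the post above can be illustrated with a small sketch. This is not Docent's API and not Transluce's data: the `Run` record, the helper names, and the toy counts are all hypothetical, and it assumes a timed-out run is scored as a failure. It only shows how a model can trail on overall pass rate yet lead once timed-out runs are filtered out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record for one agent run (not a Docent type)."""
    model: str
    passed: bool
    timed_out: bool


def make_runs(model, n_pass, n_timeout, n_fail):
    # Assumption: a timed-out run counts as a failure.
    return (
        [Run(model, True, False)] * n_pass
        + [Run(model, False, True)] * n_timeout
        + [Run(model, False, False)] * n_fail
    )


def pass_rate(runs, exclude_timeouts=False):
    """Fraction of runs that passed, optionally ignoring timed-out runs."""
    if exclude_timeouts:
        runs = [r for r in runs if not r.timed_out]
    if not runs:
        return float("nan")
    return sum(r.passed for r in runs) / len(runs)


def timeout_rate(runs):
    """Fraction of runs that hit the time limit."""
    return sum(r.timed_out for r in runs) / len(runs)


# Toy counts, invented for illustration only.
model_a = make_runs("model-a", n_pass=6, n_timeout=1, n_fail=3)
model_b = make_runs("model-b", n_pass=5, n_timeout=4, n_fail=1)

for name, runs in [("model-a", model_a), ("model-b", model_b)]:
    print(
        name,
        f"overall={pass_rate(runs):.3f}",
        f"timeouts={timeout_rate(runs):.3f}",
        f"excluding_timeouts={pass_rate(runs, exclude_timeouts=True):.3f}",
    )
```

In this toy setup model-b times out more often and scores lower overall, but has the higher pass rate among runs that finished within the limit, which is the shape of the result described in the thread.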

Hello world! Transluce is excited to begin crossposting on bluesky. You can learn more about our work at transluce.org, and read a letter from co-founders Jacob Steinhardt and Sarah Schwettmann here: transluce.org/introducing-...

19.02.2026 01:32 👍 5 🔁 0 💬 0 📌 0