Formalizing a proof in Lean using Claude Code
I revisit the formalization task I did nine months ago in https://www.youtube.com/watch?v=cyyR7j2ChCI, but this time use a recent version of Claude Code to ...
We can also keep results on track by checking for stuck states, loops, laziness, etc. with monitor agents we call Sentinels.
Disclaimer: Not a mathematician, so if there are ways to improve this or I went egregiously wrong somewhere - let me down easy haha
Original video: www.youtube.com/watch?v=JHE...
11.03.2026 02:12
This is all the hank does - the abstraction also means we can clear context often and with good precision, and switch between harnesses (codex in codex, haiku in claude code) within the same run.
11.03.2026 02:12
Clone and run it for yourself with
`bunx hankweave hank.json <input_folder_of_proof_to_formalize>`
The way it works is pretty simple: Take an informal proof, clean it up into a template, and slowly chip away at it.
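The chip-away stage can be pictured as a Lean skeleton (a purely illustrative example, not taken from the repo): each step of the informal proof becomes a `sorry` placeholder, and the agent discharges them one at a time until the build is green.

```lean
import Mathlib

-- Hypothetical "template" stage: the informal proof is cleaned up into
-- a statement whose steps are stubbed with `sorry`, then chipped away at.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h1 : 0 ≤ a ^ 2 := sorry   -- step 1: a square is nonnegative
  have h2 : 0 ≤ b ^ 2 := sorry   -- step 2: same for b
  exact add_nonneg h1 h2          -- step 3: combine the two facts
```

Each `sorry` is a verifiable checkpoint: the file compiles (with warnings) at every stage, so progress is always measurable.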
11.03.2026 02:12
Turns out you can formalize his agentic algorithm into a simple hankweave program (a couple of lines of JSON), with Sentinels and loops to improve reliability. Haiku was able to formalize the same proofs (and harder ones) for less than a dollar.
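I haven't published the schema details here, so this is a purely hypothetical sketch of what such a program could look like - every field name below is invented for illustration, not the actual hankweave format:

```json
{
  "harness": "claude-code",
  "model": "haiku",
  "blocks": [
    { "type": "prompt", "name": "clean_informal_proof_into_template" },
    {
      "type": "loop",
      "name": "chip_away_at_sorries",
      "until": "lake build succeeds with no sorry",
      "sentinel": "flag stuck, looping, or lazy runs"
    }
  ]
}
```

The point is less the exact keys than the shape: a sequence of blocks, a loop with a verifiable exit condition, and a monitor watching the loop.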
11.03.2026 02:12
This is the same insight that led us to build hankweave for ourselves six months ago. Since then almost all of our work has been frozen in hanks (repeatable AI programs) - some for design, some for building code, some for moving data - so why not one for math?
11.03.2026 02:12
My biggest thought when watching the video was that all of us keep rediscovering the same patterns out of necessity: agents work better if you can structure their results into verifiable checkpoints and then patiently step them through each one.
11.03.2026 02:12
GitHub - hrishioa/tao-formalization-hank: Fun AI program for converting informal proofs to formalizations in Lean.
I thought I'd take a crack at turning Terence Tao's @teorth.bsky.social method for formalizing informal proofs with Claude Code into an automated hank that runs by itself, on haiku, for less than a dollar per proof - and open-sourcing it: github.com/hrishioa/ta...
11.03.2026 02:12
Hankweave - A runtime for repairable agents
Open-source runtime for maintainable, long-horizon AI agents. Repair instead of rebuild.
You wouldn't vibe a car
You wouldn't vibe a house
You wouldn't vibe the important agents in your life and work
southbridge.ai/hankweave
20.02.2026 18:19
It's a point of pride that we have hanks that are six months old, with all the warts and learnings of doing real tasks. We've collaborated with external partners to incorporate their knowledge into hanks that can be reused.
17.02.2026 17:12
Hankweave treats agent harnesses (Claude Code, Codex, etc.) as the primitive instead of the LLM call. Hanks can swap between claude in the agents sdk, codex in codex, opencode, or gemini - whatever works best - with a unified, portable abstraction that makes hanks reusable and modular.
17.02.2026 17:12
Claude Code becomes the REPL, in a way: you experiment and figure out what's possible, then freeze that behavior inside reusable blocks and sequence those blocks into programs that hankweave can execute.
Loop those blocks for crazy new patterns.
17.02.2026 17:12
Basic principles:
Hankweave runs "hanks" - sequenced AI programs combining prompts, code, rigs, loops, monitor agents called Sentinels, etc.
Hanks are hard to write, but simple and easy to debug even as they get complex.
Think skills, but way more powerful and surgical.
17.02.2026 17:12
Southbridge AI
Research and writing on AI systems, agentic architectures, and the future of intelligence.
How do we stop throwing away our agents, leave greenfield behind, and keep complex systems maintainable?
Forget skills & MCPs - how do we REMOVE things from context?
How do we build reliable agents that can survive being foundational infrastructure?
southbridge.ai/hankweave
17.02.2026 17:12
Labor of love: We're open-sourcing the runtime we use to run long-horizon agents at Southbridge.
Something like this exists at almost every serious AI team I know.
We ended up needing to build it because we couldn't buy it.
`bunx hankweave` gives you a fun demo.
17.02.2026 17:12
But also thank you @exhaze for voicing the unvoiced
03.01.2026 03:12
I think this is what my friends have been trying to tell me
I'm sorry all
03.01.2026 03:12
Disclaimer: I *know* model X and harness Y have been able to do these things before, but this feels like a genuine upgrade in all my testing - like this is the first time the model can actually see.
YMMV though - happy Thanksgiving!
01.12.2025 06:22
I think we've just scratched the surface on what's possible.
This might be the start of us actually being able to talk to models with images, conveying a lot more than what's been possible before.
01.12.2025 06:22
Opus 4.5 still amazes me: in a single release, Anthropic moved from models that could sort-of understand pictures to something that actually knows what it's looking at, and (from my testing) is the best model for visual understanding by far.
01.12.2025 06:22
What's also amazing is that these can now be collaborated on and version-controlled. You probably don't need this for a comic, but it's useful to have for other kinds of design.
01.12.2025 06:22
4. Process - more specific breakdown of the actual task (in this case that's outlining each specific strip)
5. Ideas - in this case that would be the characters themselves
6. Guidelines - for us that's style guidelines
01.12.2025 06:22
Obviously this is all new, but currently my design specs are structured as:
1. Background - writing, reasoning, definitions, etc.
2. Primary Task - what's the overarching objective?
3. Audience - who is this for? What is the intended outcome?
then come the more specific parts:
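Combined with the items in the earlier post, the whole structure might be sketched as a single template (an illustrative skeleton, not a prescribed format):

```
# design-spec (illustrative skeleton)
1. Background    - writing, reasoning, definitions, etc.
2. Primary Task  - the overarching objective
3. Audience      - who this is for, intended outcome
4. Process       - task-specific breakdown (e.g. outline each strip)
5. Ideas         - e.g. the characters themselves
6. Guidelines    - e.g. style guidelines
```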
01.12.2025 06:22
Before rushing to hook up Opus and Nano in an endless loop that burns tokens, it's worth functioning as the ferry-agent in this loop manually.
Go to Opus with the results, ask for updated specs (or addendums), and go back to Nano with the specs.
01.12.2025 06:22
The same thing applies to frontend design. The loop of
Generate from specs → Render → Critique → Edit specs → Regenerate works extremely well with Opus. For fun, you can also throw in Nano to generate out-there-undesignable-but-cool frontends to remix from.
01.12.2025 06:22
Text as the intermediate makes the designs so much more editable. The same specs produce the same results, and changing something - at least for me - has a predictable effect.
01.12.2025 06:22
It's also amazing at writing specs. The same plan → spec → build → review → spec workflow we've been using for code works *perfectly* for design, with Opus as the planning model and Banana as the executor.
Sorry - next strip coming up!
01.12.2025 06:22
This entire strip (and others like it) was made from an Opus + Banana collaboration.
Turns out Opus is now miles ahead of even Gemini at visual understanding. This is a model that can pick out and critique emotional impact, while noticing elements 10 pixels out of place.
01.12.2025 06:22
Opus 4.5 + Nanobanana make for crazy design partners.
Opus is amazing at visual review and at making very, VERY detailed specs, and Banana is good at following them.
Let me tell you two stories.
01.12.2025 06:22