
Hrishi

@olickel.com

Previously CTO, Greywing (YC W21). Building something new at the moment. Writes at https://olickel.com

780 Followers · 1,327 Following · 178 Posts
Joined 15.09.2023

Latest posts by Hrishi @olickel.com

Formalizing a proof in Lean using Claude Code
I revisit the formalization task I did nine months ago in https://www.youtube.com/watch?v=cyyR7j2ChCI, but this time use a recent version of Claude Code to ...

We can also keep results on track by checking for stuck, looping, laziness, etc with monitor agents we call Sentinel.

Disclaimer: Not a mathematician, so if there are ways to improve this or I went egregiously wrong somewhere - let me down easy haha

Original video: www.youtube.com/watch?v=JHE...

11.03.2026 02:12 👍 0 🔁 0 💬 0 📌 0
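The Sentinel idea above - monitor agents checking for stuck or looping runs - can be sketched roughly like this. This is a minimal illustration with invented names and thresholds, not hankweave's actual API:

```typescript
// Hypothetical sketch of a "sentinel" monitor: it never steers the agent,
// it only watches the agent's recent outputs for signs of trouble.

type Verdict = "ok" | "looping" | "stuck";

// Flag a run as "stuck" when the latest output is empty, and as "looping"
// when the last `window` outputs are identical (no real progress).
function sentinelCheck(outputs: string[], window = 3): Verdict {
  const last = outputs[outputs.length - 1] ?? "";
  if (last.trim() === "") return "stuck"; // agent produced nothing
  if (outputs.length >= window) {
    const recent = outputs.slice(-window);
    if (recent.every((o) => o === recent[0])) return "looping";
  }
  return "ok";
}
```

A real monitor would look at richer signals (tool calls, token counts, diffs), but the shape is the same: a cheap check that runs beside the main agent and flags when to intervene.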

This is all the hank does. The abstraction also means we can clear context often and with good precision, and switch between harnesses (codex in codex, haiku in claude code) within the same run.

11.03.2026 02:12 👍 0 🔁 0 💬 1 📌 0

Clone and run it for yourself with

`bunx hankweave hank.json <input_folder_of_proof_to_formalize>`

The way it works is pretty simple: Take an informal proof, clean it up into a template, and slowly chip away at it.

11.03.2026 02:12 👍 0 🔁 0 💬 1 📌 0
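The clean-into-a-template-then-chip-away loop can be sketched as follows, assuming (hypothetically) that the template is a Lean file whose open goals are marked `sorry` and each pass tries to discharge one. `tryProve` stands in for a real agent/compiler round trip:

```typescript
// Hypothetical sketch of the chip-away loop (not hankweave's real code):
// start from a Lean template full of `sorry` placeholders and discharge
// them one at a time, keeping every intermediate state well-formed.

type Prover = (goal: string) => string | null; // proof term, or null on failure

function chipAway(template: string[], tryProve: Prover, maxPasses = 10): string[] {
  const proof = [...template];
  for (let pass = 0; pass < maxPasses; pass++) {
    const idx = proof.findIndex((line) => line.includes("sorry"));
    if (idx === -1) break; // no open goals left
    const attempt = tryProve(proof[idx]);
    if (attempt !== null) {
      proof[idx] = proof[idx].replace("sorry", attempt); // goal discharged
    }
    // A real hank would also re-run the compiler here and skip goals
    // it keeps failing on, rather than retrying the same one.
  }
  return proof;
}
```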
Post image

Turns out you can formalize Tao's agentic algorithm into a simple hankweave program (a couple of lines of JSON), with sentinels and loops to improve reliability. Haiku was able to formalize the same proofs (and harder ones) for less than a dollar.

11.03.2026 02:12 👍 0 🔁 0 💬 1 📌 0
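A hank of that shape might look something like the following sketch. This is purely illustrative: the field names are invented for this example and are not hankweave's actual schema:

```json
{
  "blocks": [
    {
      "name": "make-template",
      "prompt": "Rewrite the informal proof as a Lean skeleton with `sorry` placeholders"
    },
    {
      "name": "chip-away",
      "loop": { "until": "no `sorry` remains", "body": "prove the next open goal" },
      "sentinel": { "watch-for": ["stuck", "looping", "laziness"] }
    }
  ]
}
```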
Post image

This is the same insight that led us to build hankweave for ourselves six months ago. Since then almost all of our work has been frozen in hanks (repeatable AI programs) - some for design, some for building code, some for moving data - so why not one for math?

11.03.2026 02:12 👍 0 🔁 0 💬 1 📌 0
Post image

My biggest thought when watching the video was that all of us keep rediscovering the same patterns out of necessity: agents work better if you can structure their results into verifiable checkpoints, and then patiently step them through it.

11.03.2026 02:12 👍 0 🔁 0 💬 1 📌 0
Preview
GitHub - hrishioa/tao-formalization-hank: Fun AI program for converting informal proofs to formalizations in Lean.

I thought I'd take a crack at turning Terence Tao's @teorth.bsky.social method for formalizing informal proofs with Claude Code into an automated hank that runs by itself, on haiku, for less than a dollar per proof - and open-sourcing it: github.com/hrishioa/ta...

11.03.2026 02:12 👍 0 🔁 0 💬 1 📌 0
Preview
Hankweave - A runtime for repairable agents. Open-source runtime for maintainable, long-horizon AI agents. Repair instead of rebuild.

You wouldn't vibe a car
You wouldn't vibe a house

You wouldn't vibe the important agents in your life and work

southbridge.ai/hankweave

20.02.2026 18:19 👍 0 🔁 0 💬 0 📌 0
Preview
Systems of Lasting Value: In a few days we'll open-source something that means a lot to us. Why does all code become legacy code, and what can we do about it in the age of AI?

Join us in reducing agent waste by championing your right to repairable agents - with Hankweave

www.southbridge.ai/blog/system...

17.02.2026 17:12 👍 0 🔁 0 💬 0 📌 0

It's a point of pride that we have hanks that are six months old - with all the warts and learnings of doing real tasks, where we've collaborated with external partners to incorporate their knowledge into hanks that can be reused.

17.02.2026 17:12 👍 0 🔁 0 💬 1 📌 0

Hankweave treats agent harnesses (CC, Codex, etc) as the primitive instead of the LLM call. Hanks can swap between claude in agents sdk, codex in codex or opencode or gemini - whatever works best, with a unified portable abstraction that makes hanks reusable and modular.

17.02.2026 17:12 👍 0 🔁 0 💬 1 📌 0
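The harness-as-primitive idea can be sketched as a small interface. The names below are invented for illustration and are not hankweave's real types (async plumbing omitted for brevity):

```typescript
// Hypothetical sketch: the unit of execution is a whole agent harness
// (Claude Code, Codex, opencode, ...), not a single LLM call.

interface Harness {
  name: string;
  run(task: string): string; // returns the harness's transcript/result
}

// A block pins a task to whichever harness works best; swapping harnesses
// means changing one field, not rewriting the program.
interface Block {
  task: string;
  harness: Harness;
}

function runBlocks(blocks: Block[]): string[] {
  return blocks.map((block) => block.harness.run(block.task));
}
```

The point of the abstraction is exactly what the post says: because every harness exposes the same surface, a hank stays portable when the underlying agent changes.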

Claude Code becomes the REPL in a way - you experiment and figure out what's possible, then freeze that behavior inside reusable blocks and sequence those blocks into programs that hankweave can execute

Loop those blocks for crazy new patterns

17.02.2026 17:12 👍 0 🔁 0 💬 1 📌 0

Basic principles:

Hankweave runs "hanks" - sequenced AI programs combining prompts, code, rigs, loops, monitor agents called Sentinels, etc

Hanks are hard to write, but simple and easy to debug, even as they get complex.

Think skills but way more powerful and surgical

17.02.2026 17:12 👍 0 🔁 0 💬 1 📌 0
Preview
Southbridge AI: Research and writing on AI systems, agentic architectures, and the future of intelligence.

How do we stop throwing away our agents, leave greenfield behind, and keep complex systems maintainable?

Forget skills & MCPs, how do we REMOVE things from context?

How do we build reliable agents that can survive being foundational infrastructure?

southbridge.ai/hankweave

17.02.2026 17:12 👍 0 🔁 0 💬 1 📌 0
Post image

Labor of love: We're open-sourcing the runtime we use to run long-horizon agents at Southbridge.

Something like this exists at almost every serious AI team I know.
We ended up needing to build it because we couldn't buy it.

`bunx hankweave` gives you a fun demo.

17.02.2026 17:12 👍 3 🔁 1 💬 1 📌 0

But also thank you @exhaze for voicing the unvoiced

03.01.2026 03:12 👍 0 🔁 0 💬 0 📌 0
Post image

I think this is what my friends have been trying to tell me

I'm sorry all

03.01.2026 03:12 👍 1 🔁 0 💬 1 📌 0
Preview
Antibrittle Agents: How to build reliable long-horizon AI agents that can work for hours or days without breaking. A guide to the architectural principles and practices that make agents antibrittle.

www.southbridge.ai/blog/antibr...

30.12.2025 17:56 👍 1 🔁 0 💬 0 📌 0

Disclaimer: I *know* model X and harness Y have been able to do these things before, but this feels like a genuine upgrade in all my testing - like this is the first time the model can actually see.

YMMV though - happy Thanksgiving!

01.12.2025 06:22 👍 0 🔁 0 💬 0 📌 0
Post image

I think we've just scratched the surface on what's possible.

This might be the start of us actually being able to talk to models with images, conveying a lot more than what's been possible before.

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

Opus 4.5 still amazes me: in a single release, Anthropic moved from models that could sort-of understand pictures to something that actually knows what it's looking at, and (from my testing) it's the best model for visual understanding by far.

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

What's amazing is also that these can now be collaborated on, and version controlled. You probably don't need this for a comic, but it's useful to have for other kinds of design.

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

4. Process - more specific breakdown of the actual task (in this case that's outlining each specific strip)
5. Ideas - in this case that would be the characters themselves
6. Guidelines - for us that's style guidelines

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

Obviously this is all new, but currently my design specs are structured as:
1. Background - writing, reasoning, definitions, etc.
2. Primary Task - what's the overarching objective?
3. Audience - who is this for? What is the intended outcome?

then comes more specific parts:

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

Before rushing to hook up Opus and Nano in an endless loop that burns tokens, it's worth functioning as the ferry-agent in this loop manually.

Go to Opus with the results, ask for updated specs (or addendums), and go back to nano with the specs.

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

The same thing applies to frontend design. The loop of
Generate from specs → Render → Critique → Edit specs → Regenerate works extremely well with Opus. For fun, you can also throw in Nano to generate out-there-undesignable-but-cool frontends to remix from.

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
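That generate/render/critique/regenerate loop can be sketched like this, where `generate`, `critique`, and `revise` are stand-ins for the actual model calls (Banana generating, Opus critiquing and editing the spec):

```typescript
// Hypothetical sketch of the spec-driven design loop: the spec is the
// only thing that ever gets edited; the artifact is always regenerated.

type Step = {
  generate: (spec: string) => string; // spec -> artifact (e.g. an HTML page)
  critique: (artifact: string) => string | null; // issues found, or null if clean
  revise: (spec: string, issues: string) => string; // fold critique into the spec
};

function designLoop(spec: string, step: Step, maxRounds = 5): { spec: string; artifact: string } {
  let artifact = step.generate(spec);
  for (let round = 0; round < maxRounds; round++) {
    const issues = step.critique(artifact);
    if (issues === null) break; // critique passed
    spec = step.revise(spec, issues); // edit specs, not the artifact
    artifact = step.generate(spec); // regenerate from the updated spec
  }
  return { spec, artifact };
}
```

Keeping the spec as the single source of truth is what makes the result editable and version-controllable, as the posts above note.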
Post image

Text as the intermediate makes the designs so much more editable. The same specs produce the same results, and changing something - at least for me - has a predictable effect.

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

It's also amazing at writing specs. The same plan->spec->build->review->spec workflow we've been using for code works *perfectly* for design, with Opus as the planning model and Banan as the executor.

Sorry - next strip coming up!

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

This entire strip (and others like it) was made from an Opus + Banana collaboration.

Turns out Opus is now miles ahead of even Gemini at visual understanding. This is a model that can pick out and critique emotional impact, while noticing elements 10 pixels out of place.

01.12.2025 06:22 👍 0 🔁 0 💬 1 📌 0
Post image

Opus 4.5 + Nanobanana make for crazy design partners.

Opus is amazing at visual review and making very, VERY detailed specs, and banan is good at following them.

Let me tell you two stories.

01.12.2025 06:22 👍 3 🔁 1 💬 1 📌 0