
@itsalexzajac

37
Followers
4
Following
3,071
Posts
27.04.2025
Joined

Latest posts by @itsalexzajac

Are you tracking inference cost per request — or just watching the monthly bill?

10.03.2026 15:32 👍 0 🔁 0 💬 0 📌 0

If you're building agent systems, here's the playbook:
↳ Single-responsibility agents with clean interfaces
↳ Design for model tiering from day one
↳ Track cost per request, not just monthly spend

That per-request number is what tells you whether your architecture survives at scale.
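A minimal sketch of that per-request tracking. The model names and per-token rates here are hypothetical placeholders, not any provider's real pricing:

```python
# Hypothetical pricing table (USD per 1M tokens); real rates vary by provider.
PRICING = {
    "light-model": {"input": 0.15, "output": 0.60},
    "heavy-model": {"input": 2.50, "output": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single inference request, in USD."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Track this number per request, not just the aggregate monthly bill.
cost = request_cost("heavy-model", input_tokens=1_200, output_tokens=300)
```

Logging this alongside each request is what makes the "does this survive at scale" question answerable from data instead of the end-of-month invoice.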

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

Fit the model to the stakes of the decision.

Not every agent needs GPT-4.

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

► Costs:

3 agents = 3× the inference budget per request.
At Uber's volume, that compounds fast.

Their solution: model tiering.
→ Lighter models on demand forecasting (runs constantly, errors are recoverable)
→ Heavier models on assignment (wrong answer = bad UX immediately)
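The tiering idea above can be sketched as a simple routing table. The agent names follow the thread; the model names and the mapping itself are illustrative assumptions, not Uber's actual configuration:

```python
# Route each agent's calls by the stakes of its decision.
# Tier mapping is illustrative, not a real production config.
TIERS = {
    "demand_forecasting": "light-model",  # runs constantly, errors recoverable
    "pricing": "light-model",
    "assignment": "heavy-model",          # wrong answer = bad UX immediately
}

def pick_model(agent: str) -> str:
    """Default unknown agents to the cheap tier; escalate explicitly."""
    return TIERS.get(agent, "light-model")
```

The design choice: defaulting to the cheap tier means cost grows only when someone deliberately opts an agent into the expensive one.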

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

► Each agent is replaceable independently:

Agents 1 and 2 don't need to be right 100% of the time.

They just need to be right often enough to improve Agent 3's matching quality.

If upstream signals are low-confidence?

Agent 3 falls back to simpler heuristics.

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

🤖 Agent 3 → Assignment

Bipartite graph matching with ETA prediction.

Agents 1 and 2 feed directly into its cost function.

This is where the ride meets the driver.
Connected through clean function-calling interfaces.
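The matching step can be sketched with a brute-force min-cost assignment over an ETA cost matrix. This is a teaching sketch only; at 20M trips/day you'd use a Hungarian-style solver, not permutations:

```python
from itertools import permutations

def best_assignment(cost: list[list[float]]) -> tuple[tuple[int, ...], float]:
    """Exhaustive min-cost bipartite matching of riders (rows) to drivers
    (cols). Exponential; fine for a sketch, not for production scale."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[r][perm[r]] for r in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total

# cost[r][d] = predicted pickup ETA in minutes (toy numbers).
etas = [[4.0, 9.0],
        [7.0, 3.0]]
```

Here `best_assignment(etas)` pairs rider 0 with driver 0 and rider 1 with driver 1 for a total of 7 minutes. Agents 1 and 2 would feed into how that cost matrix is built.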

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

🤖 Agent 2 → Pricing

Real-time market clearing.

Takes the demand signal from Agent 1.
Adjusts surge pricing before the imbalance hits.
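A minimal sketch of the clearing logic, assuming surge tracks the predicted demand/supply ratio with a cap. The formula and the cap value are illustrative, not Uber's actual pricing model:

```python
def surge_multiplier(predicted_demand: float, available_supply: float,
                     cap: float = 3.0) -> float:
    """Illustrative market clearing: surge rises with the forecast
    demand/supply ratio, floored at 1.0 and capped so prices stay bounded."""
    if available_supply <= 0:
        return cap
    ratio = predicted_demand / available_supply
    return min(cap, max(1.0, ratio))
```

Because the input is Agent 1's forecast rather than current demand, the multiplier can move before the imbalance actually lands.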

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

🤖 Agent 1 → Demand Forecasting

Time-series ML predicting surge zones 15 minutes ahead.

Not reactive. Predictive.
It doesn't care where riders are. It cares where they're going.
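As a stand-in for the real time-series model, here's a toy per-zone forecaster using a moving average of recent windows. The class and its window size are my simplification; the production system would be a proper ML model:

```python
from collections import deque

class ZoneForecaster:
    """Toy demand forecaster: predict next-window ride demand per zone
    as a moving average of the last few observed windows."""

    def __init__(self, window: int = 4):
        self.window = window
        self.history: dict[str, deque] = {}

    def observe(self, zone: str, rides: int) -> None:
        """Record one completed time window's ride count for a zone."""
        self.history.setdefault(zone, deque(maxlen=self.window)).append(rides)

    def forecast(self, zone: str) -> float:
        """Predicted demand for the next window (0.0 if zone unseen)."""
        h = self.history.get(zone)
        return sum(h) / len(h) if h else 0.0
```

Feeding it destination zones rather than pickup zones is what makes it "where they're going" instead of "where they are."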

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

Uber runs 20M+ trips daily.

An 8% ETA improvement isn't a rounding error.

It's the driver's earnings.
Fewer cancellations.
Rider satisfaction.

Here's the system that produced it:

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0
[Post image]

I studied how Uber matches 20M rides per day
(so you don't have to):

10.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

What are you reading this week?

09.03.2026 15:32 👍 0 🔁 0 💬 0 📌 0
Subscribe | Hungry Minds Get smarter about Software and AI in 5 minutes. Save 50+ hours/week with deep dives, trends and tools hand-picked from 100+ sources. Join 50K+ engineers from big tech to startups for 1 free email every Monday.

Unlock all 10 links + my notes: newsletter.hungryminds.dev

You found this scrolling.
50k+ engineers receive it free every Monday.

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

9. Build Your AI Agent The Right Way (Most Teams Don't):
►🔒 In today's Hungry Minds issue

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

8. How Balyasny Built An AI Research Engine That Actually Works For Investing:
►🔒 In today's Hungry Minds issue

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

7. Apple's New Approach To Catching LLM Hallucinations At The Span Level:
►🔒 In today's Hungry Minds issue

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

6. How Databricks Uses LLMs To Detect PII At 92% Precision Across Every Log:
►🔒 In today's Hungry Minds issue

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

5. How A 12-Word GitHub Issue Title Owned 4,000 Developer Machines:
►🔒 In today's Hungry Minds issue

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0
I Struggled With System Design Interview Until I Learned This Framework #128: From Someone Who Failed and Then Passed

4. The System Design Interview Framework I Wish I Had Before I Failed:
newsletter.systemdesign.one/p/how-to-pr...

09.03.2026 15:32 👍 1 🔁 0 💬 1 📌 0
Defeating the deepfake: stopping laptop farms and insider threats Cloudflare One is partnering with Nametag to combat laptop farms and AI-enhanced identity fraud by requiring identity verification during employee onboarding and via continuous authentication.

3. Defeating The Deepfake: How Cloudflare Is Stopping Laptop Farms And Insider Threats:
blog.cloudflare.com/deepfakes-i...

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0
Boris Tane

2. The Research-Plan-Implement Workflow That Stops Claude Code From Writing Bad Code:
boristane.com/blog/how-i-...

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0
Google quantum-proofs HTTPS by squeezing 15kB of data into 700-byte space Merkle Tree Certificate support is already in Chrome. Soon, it will be everywhere.

1. Google Quantum-Proofs HTTPS By Squeezing 15kB Into 700 Bytes:
arstechnica.com/security/20...

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0
Tracing Discord's Elixir Systems (Without Melting Everything) Join Senior Software Engineer Nick Krichevsky as he explains how Discord added distributed tracing to Elixir's message passing and optimized it to handle millions of concurrent users.

0. How Discord Added Distributed Tracing To Elixir Without Breaking Anything:
discord.com/blog/tracin...

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0
[Post image]

You are what you eat.

10 brain foods to grow as an engineer:

09.03.2026 15:32 👍 0 🔁 0 💬 1 📌 0

What do you think?

08.03.2026 15:28 👍 0 🔁 0 💬 1 📌 0

The goal isn't to be clever about your stack. The goal is to be predictable.

08.03.2026 15:28 👍 0 🔁 0 💬 1 📌 0

Latency-sensitive, memory-constrained, high-concurrency → Go or Rust.
Team velocity, broad tooling, API work → TypeScript.
Data at scale → Polars, DuckDB, or a proper inference runtime.

08.03.2026 15:28 👍 1 🔁 0 💬 1 📌 0

The pattern: match the runtime to the constraint.

08.03.2026 15:28 👍 0 🔁 0 💬 1 📌 0

🔥 Lightweight scripting and data transforms

Surprising pick: TypeScript (Bun/tsx), not Python or Rust.
TypeScript is fast to write, fast to run with Bun, and you get type safety on your data shapes. Rust is overkill. Python works, but you're already writing TS everywhere anyway.

08.03.2026 15:28 👍 2 🔁 0 💬 1 📌 0

🔥 CLI tools

Default: Python → 300ms+ startup on every invocation, ugly distribution story → Go.
Go produces a single static binary. No venv, no pip install, no "works on my machine." Ship it, run it. Rust if you're already familiar with it.

08.03.2026 15:28 👍 3 🔁 0 💬 4 📌 0

🔥 ML inference serving

Default: plain Python → throughput bottleneck at scale → Python for orchestration, not the hot path.
The actual serving layer should run on an inference runtime. vLLM handles continuous batching and KV cache management that raw Python never will.
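A toy illustration of the continuous-batching idea vLLM implements (this is not vLLM's actual scheduler; it just shows why requests joining a running batch beats waiting for a full batch to drain):

```python
from collections import deque

def continuous_batches(requests, max_batch=4):
    """Toy continuous batching: each request is (id, decode_steps_remaining).
    New requests are admitted the moment a batch slot frees up, rather than
    waiting for the whole batch to finish. Returns who ran at each step."""
    queue = deque(requests)
    active, schedule = {}, []
    while queue or active:
        # Admit waiting requests into free batch slots immediately.
        while queue and len(active) < max_batch:
            rid, steps = queue.popleft()
            active[rid] = steps
        schedule.append(sorted(active))       # requests running this step
        for rid in list(active):              # one decode step for everyone
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]               # slot frees up mid-batch
    return schedule

# "c" starts as soon as "a" finishes, while "b" is still decoding:
# continuous_batches([("a", 1), ("b", 2), ("c", 1)], max_batch=2)
```

The real systems add KV cache paging and preemption on top, which is exactly the machinery you don't want to rebuild in raw Python.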

08.03.2026 15:28 👍 0 🔁 0 💬 1 📌 0