Pamela Fox (@pamelafox.fosstodon.org.ap.brid.gy)

Learnings from the PyAI conference

I recently spoke at the PyAI conference, put on by the good folks at Prefect and Pydantic, and I learnt so much from the talks I attended. Here are my top takeaways from the sessions that I watched:

### AI Evals Pitfalls

Hamel Husain

* Hamel cautioned against blindly using automated evaluation frameworks and built-in evaluators (like helpfulness and coherence).
* Instead, we should adopt a data science approach to evaluation: explore the data, discover what's actually breaking, identify the most important metric, and iterate as new data comes in.
* We shouldn't just trust an LLM-as-a-judge to give accurate scores. Instead, we should validate it like we would validate an ML classifier: with labeled data, train/dev/test splits, and precision/recall metrics. LLM-judges should always give pass/fail results, instead of 1-5 scores, so that there's no ambiguity in their judgment.
* When generating synthetic data, first come up with dimensions (such as persona), generate combinations based off those dimensions, and convert those into realistic queries.
* Hamel created evals-skills, a collection of skills for coding agents that can be run against evaluation pipelines to find issues like poorly designed LLM-judges.

### Build Reasonable Software

Jeremiah Lowin (FastMCP/Prefect)

* Write your Python programs in a way that coding agents can reason about them, so that they can more easily maintain and build them. For example, the FastMCP v2 SDK was not well designed (bad abstractions), so a new CodeMod feature required 4,000 lines of code. In the new FastMCP v3 SDK (same functional API, different abstractions backing it), the same feature only required 500 lines of code.
* To make Python FastMCP servers more Pythonic, Jeremiah is developing a new package for MCP apps which includes the most common UIs (forms/tables/charts), called PreFab: https://github.com/PrefectHQ/prefab

### Panel: Open Source in the Age of AI

Guido van Rossum (CPython), Samuel Colvin (Pydantic), Sebastián Ramírez (FastAPI), Jeremiah Lowin (FastMCP)

* OSS maintainers are overwhelmed by AI Slop PRs. As one maintainer said, "Don't expect someone else to be the first one to read your code". Each maintainer is coming up with different systems/bots/heuristics to detect and triage PRs (for example, FastMCP auto-rejects PRs that are too long!). Some maintainers are going to turn off PRs entirely, as now permitted by GitHub.
* Samuel's opinion: GitHub should add a "human identity" vs "user identity", as well as a user reputation system where reputation is based off how many useful contributions you've made (or a "sloppiness" metric).

### Do developer tools matter to agents?

Zanie Blue (Astral)

* Astral is considering ways to make their tools more agent-friendly. For example, their error messages for ty are currently fairly long and include ASCII arrows pointing to the code in question, and they suspect the agents may not need all of that in their context.
* Astral is also re-prioritizing based off the move towards 100% agentic coding, with less emphasis on tools that would be used solely by a developer who is manually typing. For example, they were once considering adding a "review" feature to review each ruff suggestion one-by-one, but that seems unlikely to be used by developers these days.
* Astral may now be able to take advantage of agents' ability to reason over whether proposed ruff fixes are safe. Currently, ruff only auto-fixes code when it knows that the code change can't introduce any unwanted changes (like comment deletions), and it marks other fixes as "unsafe". Now ruff could add more unsafe fixes, knowing that an LLM could decide whether each one was actually a safe change.

### Context Engineering for MCP Servers

Till Döhmen (MotherDuck)

* Till walked through the multi-step process of developing MCP servers to allow developers to interact with their MotherDuck databases. The server started with a single "query" tool, which later split into multiple tools, including "list_databases" and "list_tables". They had to offer dedicated schema-exploration tools since DuckDB uses a different syntax than PostgreSQL, and the agents kept suggesting PostgreSQL syntax that didn't work.
* They also added a tool to search the documentation (powered by the same search used by their website) and a tool that teaches the agent how to create "dive"s, a visualization of the database state.
* One of their big struggles is the lack of MCP spec support across clients: the MCP spec is so rich and full of features, but only a handful of clients support those features. It's hard for them to take advantage of the new features, knowing their users may be using a client that does not support them.

### Controlling the wild: from tool calling to computer use

Samuel Colvin (Pydantic)

* Samuel built Monty to be a minimal implementation of Python for agents to use. It intentionally does _not_ support all of the Python standard lib (like sockets/file open), but does include a way to call back to functions on the host. When using Monty, you do _not_ need to set up a separate sandbox.
* Monty is _not_ designed to run full applications - it's designed to run Python code generated by agents.
* The models vary in how successfully they call Monty in a REPL loop: Opus 4.5 works the best and Opus 4.6 works worse, presumably due to the RLHF process teaching 4.6 to execute code in a particular way.
* github.com/pydantic/monty

### What's new in FastAPI for AI

Sebastián Ramírez (FastAPI)

* There's now a VS Code extension for FastAPI, built by my brilliant former colleague, Savannah Ostrowski. It makes it easy to navigate to different routes in your app, and it adds a CodeLens for navigating from pytest tests back to the route that they're testing.
* FastAPI has built-in support for streaming JSON lines! Just `yield` an `AsyncIterable`. I plan to port my FastAPI streaming chat apps to this approach, pronto.
* In `pyproject.toml`, you can now specify the FastAPI `entrypoint`, so that the `fastapi` command knows exactly where your FastAPI app is.

### Context Engineering 2.0: MCP, Agentic RAG & Memory

Simba Khadder (Redis)

* Redis is adding many features to specifically help developers who are creating apps with generative AI. For example, they've added semantic caching of queries, based off a fine-tuned BERT model, so that developers don't have to pay every time someone says "good morning" to a chatbot. Anyone can use semantic caching in open-source Redis by bringing your own LLMs, but the fine-tuned model is available only for Redis Cloud.
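
The judge-validation advice from the AI Evals Pitfalls session above (treat a pass/fail LLM-judge like a classifier and check it against human labels) can be sketched roughly as follows. This is only an illustration under assumptions: the function name `score_judge`, the `JudgeMetrics` class, and the sample labels are all hypothetical and made up here, not taken from the talk or from evals-skills. `True` means "this response was flagged as failing".

```python
from dataclasses import dataclass


@dataclass
class JudgeMetrics:
    precision: float
    recall: float


def score_judge(human_fail: list[bool], judge_fail: list[bool]) -> JudgeMetrics:
    """Compare an LLM judge's pass/fail verdicts against human labels.

    Both lists use True for "fail". Hypothetical helper for illustration.
    """
    if len(human_fail) != len(judge_fail):
        raise ValueError("human and judge labels must be the same length")

    pairs = list(zip(human_fail, judge_fail))
    true_pos = sum(1 for h, j in pairs if h and j)        # both say "fail"
    false_pos = sum(1 for h, j in pairs if not h and j)   # judge flags a good answer
    false_neg = sum(1 for h, j in pairs if h and not j)   # judge misses a bad answer

    # Guard against division by zero when the judge (or the humans) never say "fail".
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return JudgeMetrics(precision=precision, recall=recall)


# Made-up labels for six responses from a held-out test split.
human = [True, True, False, False, True, False]
judge = [True, False, False, True, True, False]
metrics = score_judge(human, judge)
print(f"precision={metrics.precision:.2f} recall={metrics.recall:.2f}")
```

In practice the labels would come from a held-out test split that was not used while iterating on the judge prompt, in keeping with the train/dev/test advice above.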

I wrote up my learnings from the fantastic PyAI conference yesterday:
https://blog.pamelafox.org/2026/03/learnings-from-pyai-conference.html

Topics: Evals, Monty, FastAPI, MCP for DBs, Redis, FastMCP + apps, Astral tools, AI PR slop

12.03.2026 06:45 👍 2 🔁 0 💬 0 📌 0

I love ty for typechecking since it's so speedy, but its rules change a lot more rapidly than mypy, so I keep getting nightly build failures for a repo with an un-pinned ty dependency. Time to pin that ty!

10.03.2026 00:16 👍 0 🔁 0 💬 0 📌 0

Advanced Retrieval Augmented Generation (RAG) Deep Dive | Microsoft Reactor Learn new skills, meet new peers, and find career mentorship. Virtual events are running around the clock so join us anytime, anywhere!

🔴 Live streaming in 5: RAG (+ JS!)
https://aka.ms/JS/BAT/3x

06.03.2026 16:55 👍 0 🔁 0 💬 0 📌 0

🔴 Streaming in 10 mins for the final session in our series on building agents with Microsoft Agent Framework: "Adding a human-in-the-loop to workflows"
https://www.youtube.com/watch?v=7pGqASn-LEY
(Very important to include humans when LLMs are making decisions!)

05.03.2026 18:23 👍 0 🔁 0 💬 0 📌 0

Webcam pic of squirrel

Also the occasional... SQUIRREL!

02.03.2026 22:06 👍 0 🔁 0 💬 0 📌 0

Feed of 3 images of birds at webcam

I was gifted a webcam-augmented birdfeeder, and it's brought so much joy to my life to see the happy birdies frequenting it.

(In this feed: Titmouse, Scrub jay, House finch)

02.03.2026 22:04 👍 0 🔁 0 💬 1 📌 0

Graph comparing system prompts

Great post that breaks down the system prompts across 6 coding agents:
https://www.dbreunig.com/2026/02/10/system-prompts-define-the-agent-as-much-as-the-model.html

(Now I want to do a breakdown of the GitHub Copilot prompt.
I'm always using the Chat Debug view in VS Code to check out what it sends.)

02.03.2026 19:49 👍 0 🔁 2 💬 0 📌 0

First rows of bullshitbench with results

BullshitBench: a benchmark that measures whether models detect nonsense, call it out clearly, and avoid confidently continuing with invalid assumptions.
https://github.com/petergpt/bullshit-benchmark

02.03.2026 16:57 👍 0 🔁 3 💬 0 📌 0

🔴 Live in 10 mins: "Monitoring and evaluating agents"
Part 3 in our 6-part series on building with agent-framework,
covering OpenTelemetry, azure-ai-evaluation Python SDK, and automated red teaming.
https://www.youtube.com/watch?v=3yS-G-NEBu8

26.02.2026 18:20 👍 0 🔁 0 💬 0 📌 0

🔴 Live in 10 mins: "Adding context and memory to agents"
Part 2 in our 6-part series on building with agent-framework,
covering sessions, chat history, dynamic memory, RAG, context management techniques.
https://www.youtube.com/watch?v=BMzI9cEaGBM

25.02.2026 18:20 👍 0 🔁 0 💬 0 📌 0

I taught the concept of "middleware" in yesterday's talk on agents, and realized I did a poor job teaching it. I didn't appreciate how many developers were new to the general concept of framework middleware. It's common in Python web frameworks, but maybe not for other languages?

25.02.2026 17:29 👍 0 🔁 0 💬 2 📌 0

Slide describing an agent

Yesterday I gave the first talk in our 6-part series about building agents with agent-framework. Catch up with...

Recording:
https://www.youtube.com/watch?v=I4vCp9cpsiI

Slides:
https://aka.ms/pythonagents/slides/building

Annotated write-up […]

[Original post on fosstodon.org]

25.02.2026 15:15 👍 0 🔁 0 💬 0 📌 0

Python + Agents series banner

My colleague and I are running a 6-part livestream series over the next 2 weeks, in both English + español, showing how to use Microsoft's new agent-framework Python package.

It starts today, with an intro to agents: tool calling, MCP tools, and middleware.

https://aka.ms/PythonAgents/m

24.02.2026 17:31 👍 0 🔁 0 💬 0 📌 0

RE: https://fosstodon.org/@pamelafox/116087011168369135

The agent-framework livestream series starts tomorrow!
Hope to see you in the YouTube live chat or Discord office hours after.

24.02.2026 00:43 👍 1 🔁 0 💬 0 📌 0

Pamela as speaker for Posette

I'll be speaking at Posette in June!
https://posetteconf.com/2026/

It's a free virtual conference about PostgreSQL, organized by Microsoft colleagues.
My talk will be about Python + MCP + Postgres. See you in the streams!
#PosetteConf

23.02.2026 20:40 👍 0 🔁 1 💬 0 📌 0

My 6 year old claims that she doesn't have time to make friends at school, because she needs recess time for wandering around daydreaming, and she prefers reading during community time. So... introvert? Is it okay to not want to have friends? She seems happy, so I guess so?

23.02.2026 18:44 👍 0 🔁 0 💬 0 📌 0

Principal Developer Advocate | Microsoft Careers Work hands-on to improve and refine our libraries and the developer experience as a whole. Craft compelling technical narratives that show how our database platform enhances workflows of database administrators and productivity of application developers, especially when building Generative AI agents. Organize and deliver live presentations, workshops, and webinars to drive awareness and practical adoption. Create technical content (tutorials, sample apps, blog posts, videos) that demonstrates real-world uses and best practices for the company's platform or APIs. Maintain and contribute to developer-facing documentation, SDKs, command-line tools, and code examples to improve onboarding and reduce friction. Act as the primary liaison between the developer community and engineering team; surface feature requests and usability issues. Advocate for developer-first design inside the team by insisting on clarity, tooling, and workflows that reduce cognitive load and friction. Measure and report on developer metrics (adoption, retention, active usage, contributions) and craft narratives to show business impact. Bachelor's Degree AND experience in product/service/program management or software development OR equivalent experience.
Experience in database management; Postgres database expertise; Postgres performance analysis and optimization expertise; GenAI/ML expertise, including hands-on AI app development and deployment; Front-end frameworks and component libraries experience; Generative AI agent frameworks expertise; API design and consumption; Building and publishing SDKs and client libraries; Sample app architecture and full-stack demo construction; Debugging tools and troubleshooting production issues; Git workflows and open-source contribution patterns; Containerization (Docker) and local orchestration; Azure cloud platform fundamentals; CI/CD pipelines and release automation; Infrastructure as Code (Terraform); Instrumentation and monitoring basics (logs, metrics, traces); Command-line interface (CLI) tooling and scripting; Package management and registry publishing (npm, PyPI); Security basics (auth flows, tokens, OAuth, API keys); Testing strategies for example code (unit/integration)

The PostgreSQL team at Microsoft is hiring for a developer advocate, remote-friendly:
https://apply.careers.microsoft.com/careers/job/1970393556753261
I've worked with that team a bunch, and it's a good group of intelligent and friendly people.

23.02.2026 17:03 👍 1 🔁 1 💬 0 📌 0

@dain lol they are so bulbous!

20.02.2026 19:23 👍 1 🔁 0 💬 0 📌 0

Chart of what I removed

TreeMap of my HD space

I ran out of HD space on my Mac last night, so I downloaded GrandPerspective. I discovered 86 GB of files I could easily delete - old giant repos, cached SLMs, playwright browser binaries, diagnostic dumps, excessively large git LFS objects, etc.

20.02.2026 19:09 👍 0 🔁 0 💬 1 📌 0

Screenshot of codespeak spec

CodeSpeak is a new "language" from the Kotlin creator, where you write a spec describing the program, and it generates the program.
https://codespeak.dev/

I can see this working for utilities/pure functions, but I have a hard time imagining it for a full-stack app.

18.02.2026 23:58 👍 0 🔁 1 💬 0 📌 0

PyCon US 2026

I got a tutorial accepted for PyCon US 2026!
"Build your first MCP server in Python"

Hope to see lots of you in Long Beach in May 🌊
https://us.pycon.org/2026/

18.02.2026 18:40 👍 1 🔁 0 💬 0 📌 0

“I Have Been Here Too Long”: Letters from the Children Detained at ICE’s Dilley Facility
https://www.propublica.org/article/ice-dilley-children-letters

17.02.2026 21:20 👍 0 🔁 0 💬 0 📌 0

Python + AI Weekly Office Hours: Recordings & Resources · microsoft-foundry · Discussion #280 Each week, we hold weekly office hours about all things Python + AI in the Foundry Discord. Join the Discord here: http://aka.ms/aipython/oh This thread will list the recordings of each office hour...

Running Python + AI Office Hours right now in the Microsoft Foundry Discord. Join us!
https://aka.ms/pythonai/oh/links

17.02.2026 19:01 👍 1 🔁 0 💬 0 📌 0

AGENTS.md AGENTS.md is a simple, open format for guiding coding agents. Think of it as a README for agents.

This is why I do not simply ask an LLM to write AGENTS.md files. Instead, I wait to see where a coding agent struggles and *then* add to AGENTS.md. To gain even more confidence, I revert the code changes, re-run the prompt, and see if the output has improved.

17.02.2026 18:22 👍 0 🔁 0 💬 0 📌 0

Diagram of evaluation pipeline from paper

A new research paper evaluating AGENTS.md files finds that they often fail to improve task success rates:
https://arxiv.org/pdf/2602.11988

Discussion on HN:
https://news.ycombinator.com/item?id=47034087

17.02.2026 18:22 👍 0 🔁 0 💬 1 📌 0

Evaluating AGENTS.md: are they helpful for coding agents? | Hacker News

A new research paper evaluating AGENTS.md files finds that they often fail to improve task success rates:
https://arxiv.org/pdf/2602.11988

Discussion on HN:
https://news.ycombinator.com/item?id=47034087

17.02.2026 18:18 👍 0 🔁 0 💬 0 📌 0

Python + Agentes: Creando agentes y flujos de IA con Agent Framework | Microsoft Reactor Attend Reactor Event Series for on-going opportunities to learn, connect, and build. Expand your skillset.

¿Hablas Español?
Mi colega Gwyneth Peña-Siguenza ofrece una serie en español. Regístrate aquí:
https://aka.ms/PythonAgentes/m

17.02.2026 17:04 👍 0 🔁 0 💬 0 📌 0

Python + Agents: Building AI agents and workflows with Agent Framework | Microsoft Reactor Attend Reactor Event Series for on-going opportunities to learn, connect, and build. Expand your skillset.

Starting in one week!
Our 6-part Python + Agents livestream series will show how to build AI agents with Microsoft Agent Framework:

🛠️ Tools
🧠 Memory & RAG
🔍 Eval & observability
🔁 Workflows
🙋 HITL

🗓 Feb 24 – Mar 5, 10:30AM PT
🔗 https://aka.ms/PythonAgents/x

17.02.2026 16:53 👍 0 🔁 0 💬 1 📌 1

Harald in front of MCP apps

Want to get started with MCP apps? Check out these examples from Harald Kirschner:

https://github.com/digitarald/mcp-apps-playground

(They're TypeScript, maybe I'll port them to Python someday, but you get the idea)

The flame graph is 🔥!

12.02.2026 07:01 👍 2 🔁 1 💬 0 📌 0

@diazona oh interesting, good to know why one might stick with the python version

lol im not cool yet either dont worry

10.02.2026 23:16 👍 0 🔁 0 💬 1 📌 0
