ep 04 field notes
Show Us Your Agent Skills / EP 04 / guest dossier
CHRIS FONNESBECK · PyMC LABS · BAYESIAN MODELING · REVIEW LOOP · EX YANKEES · PHILLIES

For Chris, agents make experimentation almost limitless, and that is the bargain: the same ease that lets him try every model variant can let him skip the research and lose the judgment that makes the work trustworthy. So he runs agent work through a review loop of two skills he wrote for Pi, a stripped-down agent harness wired to its own source. The review plans skill flags problems in an agent's plan, the review implementation skill flags problems in the finished code, and nothing moves on while a red or yellow flag stands.

EP 04 · CHRIS FONNESBECK · the review-loop segment, live on stream

UNTIL THE FLAGS ARE GONE

the plan-review-implementation-review loop, run against a real PyMC task

Chris wrote two Pi skills, review plans and review implementation, and runs them as a loop: ask an agent for a plan, audit the plan, implement only once it is clean, then audit the code. He iterates each stage until no red or yellow flags remain.
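The "iterate until no flags remain" rule can be sketched as a small gate function. This is a minimal illustration, not Chris's skill: `Flag`, `gate`, `review`, `revise`, and the `max_rounds` cap are all hypothetical names and choices introduced here, and the reviewer and reviser are stand-ins for whatever agent calls do that work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flag:
    severity: str  # assumed labels: "red", "yellow", or anything else for non-blocking
    note: str


BLOCKING = {"red", "yellow"}


def gate(
    artifact: str,
    review: Callable[[str], List[Flag]],
    revise: Callable[[str, List[Flag]], str],
    max_rounds: int = 5,  # assumption: a safety cap so the loop cannot run forever
) -> str:
    """Review, revise, and repeat until no red or yellow flag stands."""
    for _ in range(max_rounds):
        blocking = [f for f in review(artifact) if f.severity in BLOCKING]
        if not blocking:
            return artifact  # clean: the work may move to the next stage
        artifact = revise(artifact, blocking)
    raise RuntimeError("red or yellow flags still standing after max_rounds")


# Toy stand-ins: the "reviewer" flags any unresolved TODO, the "reviser" resolves it.
def toy_review(text: str) -> List[Flag]:
    return [Flag("red", "unresolved TODO")] if "TODO" in text else []


def toy_revise(text: str, flags: List[Flag]) -> str:
    return text.replace("TODO", "done")


clean_plan = gate("plan: TODO add tests", toy_review, toy_revise)
```

The same function would serve both gates: pass it a plan with a plan reviewer, then the code with a code reviewer.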

The review plans skill does not leave its notes in chat. It writes findings to a local markdown file and calls out the red and yellow flags, so the review is a durable object he can reread and hand back to the implementer.
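A rough sketch of what "findings in a local markdown file" could look like. The report layout, the function names `render_review` and `write_review`, and the sample findings are assumptions made for illustration; the source only says the file is markdown and marks red and yellow flags.

```python
from pathlib import Path
from typing import Iterable, Tuple


def render_review(target: str, findings: Iterable[Tuple[str, str]]) -> str:
    """Render (severity, note) pairs as a markdown report grouped by flag colour."""
    findings = list(findings)
    lines = [f"# Review of {target}", ""]
    for severity in ("red", "yellow"):
        notes = [note for sev, note in findings if sev == severity]
        lines.append(f"## {severity.capitalize()} flags")
        if notes:
            lines.extend(f"- {note}" for note in notes)
        else:
            lines.append("- none")
        lines.append("")
    return "\n".join(lines)


def write_review(path: str, target: str, findings: Iterable[Tuple[str, str]]) -> Path:
    """Write the report to disk so it outlives the chat session."""
    out = Path(path)
    out.write_text(render_review(target, findings), encoding="utf-8")
    return out


# Hypothetical findings, invented for the example.
report = render_review(
    "plan.md",
    [("red", "log-density ignores negative values"), ("yellow", "no test for the support")],
)
```

Because the report is a file rather than a chat message, the implementing agent can be pointed at it directly.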

He made the skill because he kept asking agents for the same audit in the same way. The worked example is a half-flat distribution added to PyMC's distributions repo: a constant, improper density from zero to infinity, a real change with real APIs and tests to check.
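Read literally, a half-flat distribution has a density that is constant on the non-negative reals and zero below, so it cannot be normalized and there is nothing to sample from directly. A minimal plain-Python sketch of the log-density under that reading follows; the function name is hypothetical, including zero in the support is an assumption, and none of this is PyMC's API.

```python
import math


def half_flat_logp(x: float) -> float:
    """Unnormalized log-density: 0 on the assumed support [0, inf), -inf below it."""
    return 0.0 if x >= 0 else -math.inf


# Every supported value gets the same log-density; anything negative is ruled out.
inside = [half_flat_logp(v) for v in (0.0, 1.0, 1e9)]
outside = half_flat_logp(-0.5)
```

A test for the real change would check the same two facts: a constant log-density across the support and negative infinity outside it.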

[Image: Chris's Zed workspace running review plans over a generated PyMC implementation plan]
Pointed at the generated plan, review plans reports its findings and writes them to markdown: "it will find red flags, yellow flags if they exist." [01:04:35]

"It really frees my brain to focus on creative tasks."

Agents take the boilerplate off his desk and let him try model variants he would once have skipped: "be a lot more exhaustive and use your imagination in ways that you couldn't before." 00:49:12

"Sometimes they're a little bit too seductive, almost like social media or your phone."

Lean on a strong agent before doing the thinking and "you can paint yourself into a corner because you haven't done the work. You haven't done the research yourself, and it's too easy to ask the agent for it." 00:51:12

THE FIRST ARTIFACT IS A PLAN

ask for a plan, gate it, implement, gate the code

Start with a concrete change
The job is a specific code change, not an abstract exercise: "say I want to implement a probability distribution that doesn't yet exist inside of PyMC distributions." His pick is the half-flat distribution. 01:02:55

Ask for a plan, not edits
The first agent produces a plan file before touching code. "You start by asking your agent to generate a plan to do that." 01:03:27

Run review plans over it
The skill audits the plan, presents findings, and writes them to a local markdown file with red and yellow flags marked. The half-flat plan came back mostly clean. 01:04:35

Hand the reviewed plan to an implementer
The markdown review becomes input to the build stage: "ask the original agent to implement it," or switch to another instance when that fits. 01:05:06

Route review and build by taste
He picks models on the fly, splitting roles across them: DeepSeek for the critical read, Kimi to write the code. "Kimi is a really good coding agent. DeepSeek is a little bit better at data science tasks." 01:07:24

Review the implementation too
A clean plan does not excuse the code. The review implementation skill runs the same audit over the finished work: "do exactly the same thing with review implementation." 01:05:53
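The stage order and the role split in these steps can be sketched as a small routing table. Everything here is illustrative: `STAGES`, `run_stages`, and the stub agents are hypothetical, the strings "deepseek" and "kimi" are plain dictionary keys borrowed from the quote rather than real model identifiers, and each stage runs once instead of looping until the flags clear.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]  # hypothetical interface: prompt in, text out

# Stage order from the steps above, paired with the role that handles each stage.
STAGES: List[Tuple[str, str]] = [
    ("plan", "builder"),
    ("review plan", "reviewer"),
    ("implement", "builder"),
    ("review implementation", "reviewer"),
]


def run_stages(
    task: str, routing: Dict[str, str], agents: Dict[str, Agent]
) -> List[Tuple[str, str, str]]:
    """Run each stage once, handing every stage's output to the next one."""
    log = []
    payload = task
    for stage, role in STAGES:
        model = routing[role]  # which model plays this role today
        payload = agents[model](f"{stage}: {payload}")
        log.append((stage, model, payload))
    return log


# Stub agents that only tag their input, so the routing is visible in the log.
agents: Dict[str, Agent] = {
    "deepseek": lambda prompt: f"[deepseek] {prompt}",
    "kimi": lambda prompt: f"[kimi] {prompt}",
}
routing = {"reviewer": "deepseek", "builder": "kimi"}
log = run_stages("add a half-flat distribution", routing, agents)
```

Keeping the role-to-model mapping in one dictionary means swapping the reviewer or the builder is a one-line change, which suits picking models on the fly.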
[Image: Chris's Zed workspace with several agent sessions docked on the left and a PyMC file open]
Several agent sessions, each in a different project, run at once in the panel down the left of Zed. He treats it as "more of an AI agent multiplexer" than an editor and flips between sessions as work finishes. [01:01:52]
[Image: Chris invoking the review implementation skill over finished code in Zed]
After the code is written, review implementation audits it against the plan and the project's standards before he accepts the work. [01:05:45]

"I'm interested in a bespoke AI experience that is tailored to the way that I work."

He has stopped downloading public skills and now writes his own: "more and more I'm avoiding skills that other people are publishing on skills.sh and essentially just writing them myself," because loading every published skill costs context and the skills themselves go stale. 00:55:42

ASK PI, SAVE TO CUTIE-PI

the skills he asked Pi to write for him, kept current in cutie-pi
the plan gate

review plans

Audits a plan file before any code is written, writes red and yellow flags to a local markdown report, and feeds that report back to the implementation agent.

the code gate

review implementation

Runs the same audit over the finished code. He loops it until no red or yellow flags remain, so passing plan review does not let the implementation through unchecked.

clarify first

Socratic Review

His local take on Matt Pocock's grill me: the agent asks questions until the plan is clear, "to actually have a back and forth with it to clarify uncertainties" before it builds.

from claude code

/simplify

A command he used constantly in Claude Code, rebuilt in Pi. A generated model can work without being efficient, so he slims it down "to make sure that it's something that you're proud of."

patches a knowledge gap

PyMC modeling skill

The internet is full of PyMC 3 code, and models trained on it lag behind new releases. A skill carries current PyMC 6 guidance: "there's really nothing yet on PyMC 6 or more recent features."