ep 03 field notes
Show Us Your Agent Skills / EP 03 / guest dossier
NICOLAY GEROLD · SOURCEGRAPH · AMP · THREAD-POSTMORTEM · DEBUG VIEWS · INSTRUCTION BUDGET

NICOLAY GEROLD

Nico builds AMP, Sourcegraph's coding agent for teams and enterprises. To make AMP debug its own bugs, he wires the evidence into the agent's own flow: logs it can pull, Tmux sessions it can drive, a focus inspector it can read. That way it finds the root cause before it touches the code. Generated code can run and look correct while quietly missing a constraint, so he would rather the model prove the bug than guess at it. When a failure keeps recurring, he builds a tool the agent can operate instead of writing another instruction.

EP 03 · NICOLAY GEROLD · debug views, the focus bug, and thread-postmortem, live on stream

GET THE AGENT THE EVIDENCE

logs, Tmux, and debug views inside the agent's own flow

AMP is a distributed system: an engine, a proxying server, sandboxes, and the local CLI. To debug a single problem, the agent has to aggregate logs across all of them. The local codebase stores the running logs, and the root AGENTS.md tells the agent where they live. With that, it can reproduce an issue, read the logs, and judge whether the reproduction actually worked.
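Aggregating logs across the engine, server, sandboxes, and CLI mostly comes down to putting them on one clock. Below is a minimal sketch of that step. It assumes each component writes append-only lines that start with an ISO-8601 timestamp. The line format, component keys, and sample messages are hypothetical and do not reflect AMP's actual log layout.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator

Entry = tuple[datetime, str, str]  # (timestamp, component, message)

def parse(component: str, lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries for lines shaped like '<ISO-8601 timestamp> <message>'."""
    for line in lines:
        stamp, _, message = line.partition(" ")
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no leading timestamp (stack-trace lines, banners): skip it
        yield ts, component, message

def merge_logs(sources: dict[str, list[str]]) -> list[Entry]:
    """Interleave per-component logs into a single timeline ordered by timestamp."""
    streams = [parse(name, lines) for name, lines in sources.items()]
    # heapq.merge expects each stream to be sorted already, which append-only logs are.
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))

# Hypothetical log excerpts from the four components named above.
sources = {
    "engine": ["2024-05-01T10:00:00.100 thread started",
               "2024-05-01T10:00:00.400 tool call failed"],
    "server": ["2024-05-01T10:00:00.250 upstream sandbox returned 502",
               "Traceback (most recent call last):"],
    "sandbox": [],
    "cli": ["2024-05-01T10:00:00.050 key pressed"],
}
timeline = merge_logs(sources)
for ts, component, message in timeline:
    print(ts.time(), f"[{component}]", message)
```

A single timeline lets the agent see that the CLI key press came first and the proxy error came before the engine's failure. In a case like this, that ordering is the difference between blaming the engine and blaming the sandbox.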

When a failure keeps recurring, Nico gives the agent a way to see it rather than another paragraph of instructions. Storybooks, debug panes, a focus inspector: "it's often better to actually give it tools or ways where it can actually really debug it and figure out what's going on." That removes a whole class of mistake (guessing at the root cause) and leaves the human to review design and placement.

"It realized, the focus tree isn't being set correctly." The agent reproduced the focus bug in Tmux, then read it off the inspector Nico added to the UI. [01:53:44]

THE SKILLS HE SHOWED

a capability plus the local setup the agent would otherwise rediscover every run
skill

G Cloud skill

Tells the agent how to talk to Google Cloud and how AMP's services are laid out: service names, Cloud Run, Kubernetes, and where the logs live. The agent no longer has to rediscover the system on every run.

skill

Tmux skill

His favorite. Drives a dev build of AMP's CLI inside Tmux: sends text, fires keyboard shortcuts, reads logs, and iterates until it reproduces a bug.

skill

thread-postmortem

Introspects a thread that went sideways, traces each misstep to the instruction behind it, and proposes edits biased toward deletion.
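The Tmux skill's loop is: start a dev build, type, fire shortcuts, read the screen, try again. It maps onto a handful of real tmux subcommands: new-session, send-keys, capture-pane, and kill-session. Below is a rough Python sketch of that loop. It is not Nico's skill itself. The TmuxSession and reproduce names, the dev-build command, and the symptom string are all hypothetical. The tmux call is injectable, so the loop can be exercised without a terminal.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]

def run_tmux(args: Sequence[str]) -> str:
    """Run a tmux subcommand and return its stdout (raises if tmux exits non-zero)."""
    result = subprocess.run(["tmux", *args], capture_output=True, text=True, check=True)
    return result.stdout

class TmuxSession:
    """Drive a CLI inside a detached tmux session: type, press keys, read the screen."""

    def __init__(self, name: str, command: str, run: Runner = run_tmux):
        self.name, self.command, self.run = name, command, run

    def start(self) -> None:
        # -d: detached; -x/-y: fixed pane size so screen captures compare across runs.
        self.run(["new-session", "-d", "-s", self.name, "-x", "120", "-y", "40", self.command])

    def type_text(self, text: str) -> None:
        # -l sends the string literally instead of interpreting it as key names.
        self.run(["send-keys", "-t", self.name, "-l", text])

    def press(self, *keys: str) -> None:
        # Key names use tmux syntax, e.g. "Enter", "Escape", "C-t" for Ctrl+T.
        self.run(["send-keys", "-t", self.name, *keys])

    def screen(self) -> str:
        # -p prints the visible pane contents to stdout instead of a paste buffer.
        return self.run(["capture-pane", "-p", "-t", self.name])

    def stop(self) -> None:
        self.run(["kill-session", "-t", self.name])

def reproduce(session: TmuxSession, steps: Callable[[TmuxSession], None],
              symptom: str, attempts: int = 3, settle: float = 0.5) -> bool:
    """Replay the steps until the symptom text shows up on screen, or give up."""
    for _ in range(attempts):
        steps(session)
        time.sleep(settle)  # give the TUI time to redraw before reading the screen
        if symptom in session.screen():
            return True
    return False
```

Against a real build, you would construct something like TmuxSession("amp-dev", "./amp-dev"), then call start(), a reproduce(...) call, and stop(). The "./amp-dev" command and any shortcut you pass to press() are placeholders for whatever your own CLI uses.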

"Validate that it works, but also validate that the assumption the model is making about what the problem is, if you give it a bug, is also correct."

The wrong root cause is the bigger failure: a downstream symptom can masquerade as the bug, so the model needs a way to see the cause before it writes code. 01:54:59

REPRODUCE, INSPECT, VERIFY

the amp threads continue focus bug, start to finish
Reproduce it in Tmux: Nico's Tmux skill, "one of my favorite skills," spins up a dev build of AMP's CLI and sends text and keyboard shortcuts. With it, the agent reproduces the thread picker that wouldn't take focus. 01:52:34
It repros but can't see focus: "It can reproduce it. Yes, this was the first step." The agent triggers the bug reliably, then stalls: it can't locate where keyboard focus is actually going. 01:53:07
Add a debug view to the UI: "What I did is I added a focus inspector to the UI where it can look at the focus tree." Now the agent reads focus state directly instead of reasoning about it blind. 01:53:21
Diff the focus tree: The agent compares the focus tree before and after the keyboard shortcut and finds it isn't being set correctly. 01:53:44
Push and re-verify in Tmux: "Push the fix, try it again in Tmux, whether it's correct now." The fix gets confirmed in the same loop that found the bug. 01:53:57

DEBUG THE INSTRUCTIONS, NOT THE CODE

thread-postmortem, the skill he runs when a thread goes sideways

When a thread goes wrong, Nico turns the failed conversation into evidence. The skill reads the thread, finds each moment of friction, and traces it to its source: the system prompt, a tool definition, a loaded skill, or an AGENTS.md file. It sorts each one into a failure type (steering, undo, repetition, confusion, wrong tool, missing context, under-agentic, over-agentic) and proposes a concrete fix.

The choice that makes it useful is the bias toward deletion. "I tell it to always default to removing because agents always like to add new stuff instead of removing things." It is written for AMP, so it assumes the read_thread tool and ~/.config/amp/AGENTS.md. Swap in your own tool names and paths and it carries to any harness; read it first, then make it yours.
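The postmortem's output is essentially a list of findings. Each one is pinned to a failure type and a source, and deletion is the default fix. Below is one way to sketch that data model. The eight failure types and four sources come from the description above. The Finding and Proposal shapes, and the rule that only a missing instruction earns an addition, are assumptions for illustration and not the skill's actual text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class FailureType(Enum):
    STEERING = "steering"
    UNDO = "undo"
    REPETITION = "repetition"
    CONFUSION = "confusion"
    WRONG_TOOL = "wrong tool"
    MISSING_CONTEXT = "missing context"
    UNDER_AGENTIC = "under-agentic"
    OVER_AGENTIC = "over-agentic"

class Source(Enum):
    SYSTEM_PROMPT = "system prompt"
    TOOL_DEFINITION = "tool definition"
    SKILL = "skill"
    AGENTS_MD = "AGENTS.md"

@dataclass
class Finding:
    turn: int                    # where in the thread the friction showed up
    failure: FailureType
    source: Source
    instruction: Optional[str]   # the line blamed for it; None if nothing covers it

@dataclass
class Proposal:
    action: str                  # "remove" or "add"
    source: Source
    text: str

def propose(f: Finding) -> Proposal:
    """Default to deleting the blamed instruction; only add when nothing exists to delete."""
    if f.instruction is None:
        # Genuinely missing context: the one case where deletion cannot help.
        return Proposal("add", f.source, f"document what the agent lacked at turn {f.turn}")
    return Proposal("remove", f.source, f.instruction)

# Hypothetical findings from a thread that went sideways.
findings = [
    Finding(12, FailureType.WRONG_TOOL, Source.AGENTS_MD, "Use grep to search the codebase."),
    Finding(30, FailureType.MISSING_CONTEXT, Source.AGENTS_MD, None),
]
proposals = [propose(f) for f in findings]
for p in proposals:
    print(p.action, "|", p.source.value, "|", p.text)
```

Because "add" sits behind an explicit check, the model has to show there is nothing to delete before it reaches for a new instruction. That is the spirit of the deletion bias above.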

"A surprising amount of failures can be found because it's a wrong instruction." Outdated docs and stale instructions in the codebase had been steering the agent wrong. [01:59:40]

"The main failure pattern is because you have too many instructions."

So he keeps AMP's system prompts lean and leaves instruction budget for the user; every line has to earn its place. 02:04:37