ep 04 field notes
Show Us Your Agent Skills / EP 04 / guest dossier
HAMEL HUSAIN PARLANCE LABS · AI EVALS SKILL-SCEPTICISM EX AIRBNB · GITHUB

HAMEL HUSAIN

Hamel reads a shared agent skill before he trusts it: open the prompt, check whether the author still ships it, look for constraints that aren't just prose. Most public skills don't survive that read. The good ones carry real expertise anyway, and they only earn it once you've read them, his own bundle included.

EP 04 · HAMEL HUSAIN · the skill-scepticism segment, live on stream

FUCK YOUR SKILLS

the skill-scepticism argument, his own eval bundle on blast first

Hamel is the eval guy: a course, thousands of hours of office hours and Q&A, 4,500+ people taught to measure LLM systems. He distilled the recurring problems into a published bundle of eval skills, then aimed his own critique at it first.

That bundle couldn't hold the real material. Multimodal evals, PII, golden datasets, human-calibrated judges: the nuance a student actually needs flattens into a markdown checklist.

So the answer he landed on isn't a skill at all. It's a course chatbot and an MCP server that search the source directly, so an agent gets the higher-fidelity answer instead of a summary. The rule that falls out cuts both ways: read it, check who maintains it, look for real constraints, his own skills included.

Hamel's title slide reading Fuck Your Skills with the subtitle And mine too
His own bundle is the example, not the exception: "you must be doing it right because you have Hamel's eval skill." [00:22:05]

"I ended up hating my own skill."

He pulled the bundle and decided shipping it had steered people wrong: "publishing these skills may be leading lots of people in the wrong direction." 00:24:53

JUDGE IT LIKE CODE

the loop he runs before a shared skill enters his harness
Open the source, not the previewRead the SKILL.md, the prompts, the linked scripts and tools, not the rendered markdown. "Look at the prompt and look at the skill." 00:21:08
Check who maintains itWho wrote it, do you trust them, and are they still using it. "Look at the commit history, the age of the skill." 00:27:49
Treat one commit as a smellNot a conviction, a comedy skill can need one. "If it's not being iterated on, you have to tune up your skepticism a little bit." 00:29:22
Prefer constraints over proseA single prompt in one file is weak evidence. Code, tools, and scripts show someone found a constraint worth packaging. "The more constraints your skill imposes, it's some signal that's good." 00:28:31
Point the agent at richer sourcesWhen the knowledge is deep, a blog post you already wrote beats squeezing it into a skill file. "I often end up pointing my agent at just my own blog post." 00:38:46
Install only the adapted versionFork the idea, rewrite the instructions for your context, keep the taste. "The sharing part is definitely very, very good." 00:35:01
Hamel showing skills.sh data, sorting public skills by downloads and counting commits
skills.sh is where he runs the numbers: sort the most-downloaded skills, open the file, ask what it actually constrains. "You have to see, what is this?" [00:31:07]

"A lot of websites are still pre-AI."

His favorite move: let the agent run a painful browser task once while it introspects the site's internal API, then save the routes, requests, and cookies as a skill. "It's my absolute favorite thing in the whole world." 00:43:54
Hamel describing a browser extension that inspects routes and network traffic to build a reusable skill
The skill gets the dev console and listens to all the network traffic, so later runs hit the internal API instead of clicking through the UI. "The skill has access to the dev console." [00:43:26]

FROM CAUTIONARY TO FAVORITE

the skills he opened on stream, from the bundle he regrets to the one he loves
his, cautionary

evals-skills

His published bundle: eval audit, RAG evaluation, error analysis, LLM-judge calibration. The object lesson for the whole talk. Read it, then make it yours, because it's a starting point, not the finish line.

replacement

course chatbot + MCP

Searches his evals course sources directly and answers nuanced questions a markdown file would flatten. The MCP hands an agent the same access without anyone opening the web app.

his favorite

Maven browser skills

Run a Maven task once while the agent learns the site's internal routes, then fill a lightning lesson or build a course page from saved API calls instead of thousands of clicks.

prompt-only

frontend-design

Anthropic's popular front-end skill. He used it, then stopped: "I just got tired of my websites looking the same." Age, commit history, and same-looking output all belong in the trust call.

docs map

GitHub Actions skill

Three commits, and when he opened the file it was basically a sitemap for the GitHub Actions docs. "That's all this skill is." If you already use the gh CLI, you may not need it.