The Atomic Unit of Work Just Changed

The model commoditized. The frontier moved to the unit of work running on top of it, and that unit can now reason.

May 27, 2026

Last week I argued that the model is no longer the frontier. The first academic conference on agentic systems convened, and almost none of the papers touched the model itself. The hard part had moved to everything that makes a model useful, trustworthy, and safe to run.

That post ended on a question it didn’t answer. If the frontier moved off the model, where exactly did it land, and what is the thing we’re now building on top? An agent is too big to be the answer. A prompt is too small. The useful unit sits in between, and naming it is what every team trying to run real work on this eventually has to do.

I think the unit is a skill. Not in the soft “upskilling” sense. A skill in the precise sense: a named, versioned, governed piece of procedural knowledge that a frontier model executes with judgment.

The field already has a word for the layer below it. A tool is the atomic unit of computation: a function or an API call, deterministic, executing and returning the same way every time. That layer didn’t change. What changed is that a unit appeared above it, the atomic unit of process, and unlike a function it reasons. That unit is the skill, and for the first time it’s a thing you can hold, count, and hand to a model.

A tool is the atomic unit of computation. A skill is the atomic unit of process, and unlike a function, it reasons.

What a unit looks like in practice

I run about 70 of them. Across four machines I have 70 Claude Code skills, 23 subagents, 21 slash commands, and 8 autonomous agents that operate without me in the loop. That sounds like a lot of plumbing. It’s mostly not plumbing. Most of it is procedure.

One reads a company’s website and tells me where the substance ends and the slideware begins, then names the three rivals worth weighing it against. Another throws a hard question at four rival frontier models at once and writes back the answer none of them gave on its own. A third one read this essay before you did. It pulled every factual claim into a list, played hostile reviewer against each one, and handed back a verdict on which to keep, soften, or cut. The soft ones got fixed before they reached you.

Strip one open and there is less than you would expect. A skill is a folder with a Markdown file in it. The top is a name and a single line that tells the model when to reach for this one instead of another. Everything below is the job in plain English: what good looks like, the traps, the one rule it must never break. The red-team skill is about a page of that. Roughly:

---
name: red-team
description: Run before publishing any fact-making draft. Pull every
  claim into a list, attack each as a hostile reviewer, return a
  keep / soften / cut verdict per claim.
---
You are a skeptical senior reviewer. For each claim, trace it to its
source, then decide whether it survives a hostile read. Name the exact
sentence that is weak, and why...

No special language, no framework, no engineer required to change it. That is close to the entire artifact, and it is the part that surprises people: the procedure is now a document you can read, edit, and argue with, the same way you would mark up a policy memo. A skill is a Word doc that happens to run.
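To make "a name, a single line, then the job in plain English" concrete, here is a minimal sketch of how a harness might split such a file into its parts. It assumes the layout shown in the red-team example (a header of `key: value` lines, a `---` separator, then the body); `Skill` and `parse_skill` are hypothetical names for illustration, not any real loader's API, and the sketch raises if the separator is missing.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill file into header fields and the plain-English body."""
    lines = text.strip().splitlines()
    # Tolerate an optional opening '---' before the header.
    if lines and lines[0].strip() == "---":
        lines = lines[1:]
    # The first remaining '---' line separates header from body.
    split_at = next(i for i, line in enumerate(lines) if line.strip() == "---")
    header, body = lines[:split_at], lines[split_at + 1:]

    fields: dict[str, str] = {}
    key = None
    for line in header:
        if line[:1].isspace() and key is not None:
            # Indented line: continuation of the previous field's value.
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    return Skill(fields["name"], fields["description"], "\n".join(body).strip())


EXAMPLE = """\
name: red-team
description: Run before publishing any fact-making draft. Pull every
  claim into a list, attack each as a hostile reviewer, return a
  keep / soften / cut verdict per claim.
---
You are a skeptical senior reviewer.
"""

skill = parse_skill(EXAMPLE)
print(skill.name)  # red-team
```

Nothing in the parser knows what a red-team review is. The header only helps the model pick the file, and everything that matters is in the body.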

That third skill is the interesting kind, because I could never have written its rules. I didn’t enumerate “if a claim names a competitor, soften it” or “if a number has no source, flag it.” I wrote down what a sharp adversarial reviewer does, and the model plays that reviewer against whatever I hand it. That is the shift. The unit carries the intent; the model supplies the judgment at runtime.

Why this breaks from how software has always worked

Here is where I want to be careful, because the easy version of this claim is false and any engineer reading will throw the post across the room.

Software has always branched. If-else, state machines, BPMN gateways, rules engines, robotic process automation with decision nodes, a fraud model scoring a transaction. Branching is not new, and “agents can make decisions” is not the breakthrough.

The real difference is narrower and sturdier. Every branch in traditional software had to be enumerated in advance. A person specified the decision logic, case by case, and anything outside the specified cases fell through to an error or escalated to a human. Even a machine-learning classifier, which feels like judgment, decides inside a space someone defined and trained for. When reality served up a case nobody anticipated, the system stopped.

A frontier model at the node changes the shape of that. The decision logic no longer has to be fully written down ahead of time. The model handles inputs the author never specified, by reasoning about them in context. The branch space is open instead of closed. The set of situations the system can handle gracefully is no longer the finite list you remembered to write.

Every branch in old software was written down in advance. A frontier model handles the case nobody wrote down.

This is not “the model does whatever it wants.” The harness around it does real work: rules constrain what’s allowed, hooks fire deterministically at fixed points, the skill itself carries guardrails. The structure is rigid where it needs to be. The judgment is open where rigidity used to force a halt. That combination, a deterministic harness wrapped around an open-branch reasoner, is different from the software we’ve shipped for forty years, and it’s why the old “automate the workflow” instinct undersells what’s happening.
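A minimal sketch of that combination, under loud assumptions: `run_node`, `ALLOWED_VERDICTS`, and the `escalate` fallback are hypothetical names, and `reasoner` stands in for a real model call, which is not shown. The hook and the rule are ordinary deterministic code, and the only open part is what the reasoner returns.

```python
from typing import Callable

# Rule: the only outputs the harness will accept from the reasoner.
ALLOWED_VERDICTS = {"keep", "soften", "cut"}


def run_node(claim: str,
             reasoner: Callable[[str], str],
             pre_hook: Callable[[str], None],
             fallback: str = "escalate") -> str:
    """Run one reasoning node inside a deterministic harness."""
    # Hook: fires at a fixed point, every time, whatever the model does.
    pre_hook(claim)
    # Open judgment: the reasoner sees an input nobody enumerated in advance.
    verdict = reasoner(claim).strip().lower()
    # Rule: anything outside the allowed set is not acted on.
    if verdict not in ALLOWED_VERDICTS:
        return fallback
    return verdict
```

The reasoner is free to decide how to judge a claim it has never seen. It is not free to return something the harness will act on outside the three allowed verdicts.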

The field landed on the same unit, all at once

I’d been living with this and treating it as a personal quirk. Then, inside a three-week window in May, three separate research groups published work that all circled the same object from different sides.

SkillOpt, out of Microsoft, treats a skill document like a set of weights you can train: run it, reflect on what broke, make bounded edits, gate the change behind a validation step. Optimize the unit you have.
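The "bounded edits behind a validation gate" idea is easy to sketch by hand. What follows is not the paper's algorithm, only an illustration of the gate: `gated_edit`, the similarity threshold, and the `score` callback are all assumptions introduced here, and `score` stands in for whatever validation run you trust.

```python
import difflib
from typing import Callable


def gated_edit(current: str,
               candidate: str,
               score: Callable[[str], float],
               min_similarity: float = 0.8) -> str:
    """Accept a proposed skill edit only if it is small and it scores better."""
    similarity = difflib.SequenceMatcher(None, current, candidate).ratio()
    if similarity < min_similarity:
        return current  # edit is too large to count as bounded: reject
    if score(candidate) <= score(current):
        return current  # no improvement on validation: reject
    return candidate
```

A rewrite that replaces most of the document never reaches the validation step, and a small edit that fails to improve the score leaves the skill unchanged.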

SkillsVote governs a whole library of units across its life. Collect them, recommend the right one, attribute an outcome to the specific skill that produced it, and evolve only the ones that actually helped. Govern the library.

SkillOS splits the agent in two: a trainable curator that manages the skills, and a frozen executor that runs them. Make curation a first-class component of the system, not an afterthought.

Three labs, three angles, one conclusion: the next gains live in the skill layer, not in the weights. The simultaneity is the signal. When independent teams name the same problem in the same month, it stops being a hunch and becomes the shape of the field.

The week those papers landed, I hit the identical wall by hand. My corrections had been piling into skill files with no discipline, every fix making the library a little harder to reason about. So I built two things: an evolution loop that traces a failure to its actual cause before editing anything, and a curator that proposes which skills to merge or retire and never executes on its own. No reinforcement learning, no training loop. The same instinct the papers formalized, reached from the practitioner side because the pain forced it.

Verbs, procedures, and the org as a graph

A skill is a verb. The next move is to compose verbs into procedures, and that is arriving now. The workflow capability landing in Claude Code (community-documented around v2.1.147, not a formally announced product, so hold it loosely) wires skills into deterministic control flow: phases, conditionals, parallel steps, loops, with model reasoning at each node instead of a hardcoded branch. One builder called it turning standard operating procedures into executable graphs. The right instinct, even this early.
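Because the Claude Code capability is only community-documented, the following does not attempt to show its syntax. It is a plain-Python sketch of the shape described: phases, a conditional, a parallel step, and a loop, with a skill call at each node. `SkillFn`, `run_workflow`, and the three skill arguments are hypothetical, and each skill is assumed to be a function from text to text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical: a skill is anything that takes text and returns text.
SkillFn = Callable[[str], str]


def run_workflow(draft: str,
                 extract: SkillFn,
                 review: SkillFn,
                 revise: SkillFn,
                 max_rounds: int = 3) -> str:
    """Compose three skills into a fixed control flow."""
    for _ in range(max_rounds):                      # loop, bounded
        claims = extract(draft).splitlines()         # phase: one claim per line
        with ThreadPoolExecutor() as pool:           # parallel step
            verdicts = list(pool.map(review, claims))
        if all(v == "keep" for v in verdicts):       # conditional
            return draft
        draft = revise(draft)                        # phase: fix and try again
    return draft
```

The control flow is fixed and readable. What happens inside `extract`, `review`, and `revise` is where the model reasons.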

The picture is bigger than my homelab. Daniel Miessler argued in 2024 that a company is just a graph of algorithms: every business process decomposes into nested procedures, all the way down. For most of corporate history those procedures lived in two forms that never matched: rigid software that couldn’t deviate, or a written SOP that described what people should do while they quietly did something else.

A company is a graph of algorithms. Now every node thinks.

The unit I’ve been describing collapses that gap. A procedure encoded as composable skills is both the description and the execution. The SOP stops being a document nobody opens and becomes the thing that runs. The process-management world sees it coming: the field is reframing from predictable, predefined paths toward event-driven, AI-orchestrated systems, and incumbent banks are reengineering processes designed for a pre-AI era. This is not a startup story. It’s the operating model changing underneath established companies.

The catch is structural. A workflow is only as trustworthy as the skills it composes. Stack reasoning units into a procedure and every weakness in one unit propagates. The moment you compose, the governance of the underlying library stops being hygiene and becomes what the whole procedure rests on.
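A back-of-the-envelope way to see the propagation, under two simplifying assumptions that real workflows will not meet exactly: each skill succeeds independently, and the procedure succeeds only if every step does. The numbers are illustrative, not measurements, and `chain_reliability` is a name introduced here.

```python
import math


def chain_reliability(step_reliabilities: list[float]) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return math.prod(step_reliabilities)


# Five skills that are each right 95% of the time, chained in sequence.
print(round(chain_reliability([0.95] * 5), 3))  # 0.774
```

Five units that each look dependable on their own yield a procedure that misses roughly one run in four under these assumptions.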

The job this creates for a leader

So the managerial question is not “what can our AI do.” Capability is the part that commoditizes, the same way model access did. The durable question is whether you own the library of how your organization actually works, as composable units you can govern.

That ownership has a shape. It needs a curator, someone accountable for the library the way a maintainer is accountable for a codebase. It needs the retire verb to be as cheap as the add verb, because a library where you can only add is a library you’ll be unable to reason about in six months. And it needs an attribution loop, a way to tie an outcome back to the unit that caused it, so you evolve what helped and cut what didn’t.
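Those three requirements fit in a small data structure. This is a sketch under assumptions, not a description of my own tooling or of any product: `SkillLibrary`, `SkillRecord`, and the thresholds in `retire_candidates` are hypothetical, and "helped" is whatever outcome signal you are able to attribute.

```python
from dataclasses import dataclass, field


@dataclass
class SkillRecord:
    name: str
    owner: str          # the person accountable for this unit
    helped: int = 0
    hurt: int = 0
    retired: bool = False


@dataclass
class SkillLibrary:
    records: dict[str, SkillRecord] = field(default_factory=dict)

    def add(self, name: str, owner: str) -> None:
        self.records[name] = SkillRecord(name, owner)

    def retire(self, name: str) -> None:
        # Retiring is one call, the same cost as adding.
        self.records[name].retired = True

    def attribute(self, name: str, helped: bool) -> None:
        # Tie an outcome back to the unit that produced it.
        record = self.records[name]
        if helped:
            record.helped += 1
        else:
            record.hurt += 1

    def active(self) -> list[SkillRecord]:
        return [r for r in self.records.values() if not r.retired]

    def retire_candidates(self, min_uses: int = 5,
                          max_help_rate: float = 0.5) -> list[str]:
        # Proposes only; a human curator decides whether to call retire().
        candidates = []
        for r in self.active():
            uses = r.helped + r.hurt
            if uses >= min_uses and r.helped / uses < max_help_rate:
                candidates.append(r.name)
        return candidates
```

The method that flags a skill for retirement never retires anything itself. It hands a list to the accountable owner, who makes the call.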

A library where you can only add is one you can’t reason about in six months.

The sharpest framing of the stakes I saw this month came from a builder describing the new scarcity: the constraint is no longer labor efficiency, it’s output per unit of attention. The scarce input in this era isn’t compute or headcount. It’s the density of human judgment encoded into the system. A curated library is how you bank that judgment so it compounds instead of evaporating when someone leaves.

The honest version

I’m not going to pretend the path is clean, and I’ve written the unglamorous parts elsewhere so I won’t relitigate them here. The returns aren’t showing up at scale yet for most deployments, the most common failure is coordination and handoff rather than raw capability, and encoding a process tends to expose the politics that the process’s ambiguity was quietly protecting. If you want the failure modes in detail, I laid out why most agent projects miss in “the agent archaeology checklist,” and the accountability patterns that hold up in regulated work in “AI as exoskeleton, not coworker.” The honest read is that the unit exists and the discipline around it is not yet mature. Both things are true at once.

Where this leaves you

If you run an AI program, change what you measure. Stop tracking the ceiling of what your tools can do and start tracking what fraction of your skill and workflow library you can still explain, and whether retiring a capability is as cheap as adding one. If only the add verb is cheap, you are accumulating drift, not advantage.

The model got cheap. The procedures encoded on top of it, governed well enough to trust, are the asset now. That is the frontier the conference papers were circling, and it’s the one your organization actually has to build.

Run Data Run
