You Don't Have to Write the Code

Anthropic watched 400,000 sessions with a coding agent and found that what predicts success isn't your job or your syntax. It's whether you understand the work. That changes who gets to build, and what your expertise is worth.

Jun 17, 2026

This week Anthropic published an analysis of about 400,000 real Claude Code sessions, run between last October and this April. It set out to answer a plain question: when someone sits down with a coding agent, what actually predicts whether they succeed? There is a comfortable answer and a more useful one, and almost everyone is repeating the comfortable one.

The comfortable answer, the one the headlines took, is that anyone can build software now. That part is true, and it is the least interesting thing in the report.

The finding underneath is the one I'd hand to anyone, whether they run a team or just their own work. What predicted success was not your job title, and it was not whether you could write code. It was whether you understood the problem you were trying to solve. Anthropic's own framing: success comes from how well a person understands the work, not whether they're trained in coding.

That is not a story about AI flattening the gap between your best people and your average ones. It is the opposite. The thing you spent fifteen years getting good at just became the thing that decides whether the most expensive tool you're buying actually pays off. I have been making that case for a while. This is the first time anyone has put 400,000 sessions behind it.

What the tool actually divided up

Start with the cleanest number, because it doubles as a mental model you can carry into a Monday meeting.

In a typical session, the human made about 70 percent of the planning decisions and the agent made about 80 percent of the execution decisions. People decided what to build. The agent decided how to build it. That split held across every kind of work they measured, from writing code to running systems to analyzing data.

So the labor didn't disappear. It separated. The agent took the part a lot of people thought was the moat, the ability to actually produce the syntax, and handed back the part that was always the hard part: knowing what to ask for, and whether the answer is any good.

Here is what makes me trust the number. A practitioner reached the same shape from the other side. Addy Osmani, writing in January, found that the developers succeeding with these tools "spend 70 percent of their time on problem definition and verification strategy, 30 percent on execution." Two independent observations, one counting decisions across a population and one describing how working engineers spend their time, land on the same line through the work. The tool made the typing cheap. It did nothing for the knowing.

This is the thing I've been calling a change in the atomic unit of work. The unit a person owns moved up the stack, from producing the thing to directing it. Now there's telemetry under the claim.

Your job title barely moved the needle

This is the result Anthropic led with, and it earns the lead.

They scored a verified success rate, which is stricter than it sounds: a session counts as a win only if the model judged it successful and there was a hard signal to back that up, such as a real commit, a passing test, or a user saying yes. By that bar, on code-producing work, software occupations succeeded 34 percent of the time and everyone else succeeded 29 percent. A marketer and a staff engineer finished a coding task at almost the same rate.
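To make that bar concrete, here is a minimal Python sketch of the rule as described above. The Session type, its field names, and the sample sessions are hypothetical stand-ins for illustration; they are not Anthropic's schema or data, and the report's actual scoring pipeline may differ.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical fields, named for illustration only.
    model_judged_success: bool
    has_commit: bool = False
    tests_passed: bool = False
    user_confirmed: bool = False


def is_verified_success(s: Session) -> bool:
    """A win needs the model's judgment AND at least one hard signal."""
    hard_signal = s.has_commit or s.tests_passed or s.user_confirmed
    return s.model_judged_success and hard_signal


def verified_success_rate(sessions: list[Session]) -> float:
    """Share of sessions that clear the verified bar (0.0 if there are none)."""
    if not sessions:
        return 0.0
    return sum(is_verified_success(s) for s in sessions) / len(sessions)


# Made-up sample: two of these four sessions clear the bar.
sessions = [
    Session(True, has_commit=True),       # judged success + commit: counts
    Session(True),                        # judged success, no hard signal: does not count
    Session(False, tests_passed=True),    # hard signal, but model judged it a failure
    Session(True, user_confirmed=True),   # judged success + user said yes: counts
]
print(verified_success_rate(sessions))  # 0.5
```

The point of requiring both conditions is that a fluent "looks done" from the model is not enough on its own; something outside the model has to agree.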

The instinct is to read that as "engineers are finished." It is the wrong read, and the same study hands you the right one. What collapsed was the premium on being a programmer. What held was the premium on understanding the problem. Measure expertise correctly and it mattered enormously: novice sessions succeeded 15 percent of the time, and people who actually knew their domain landed between 28 and 33 percent. Roughly double.
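The contrast is easier to see as ratios. This is a quick back-of-the-envelope check in Python using only the percentages quoted above; the variable names are illustrative, and the rounding is mine.

```python
# Verified success rates as quoted in this post, in percent.
software_occupations = 34
other_occupations = 29
novice = 15
domain_expert_low, domain_expert_high = 28, 33

# How much better one group does than the other, as a multiple.
occupation_ratio = software_occupations / other_occupations
expertise_ratio_low = domain_expert_low / novice
expertise_ratio_high = domain_expert_high / novice

print(f"occupation gap: {occupation_ratio:.2f}x")
print(f"expertise gap:  {expertise_ratio_low:.2f}x to {expertise_ratio_high:.2f}x")
# occupation gap: 1.17x
# expertise gap:  1.87x to 2.20x
```

Occupation buys about a 17 percent relative edge. Task-specific expertise buys something close to a doubling.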

The reconciliation is the part that clarifies it, because it is the bet I've been making. Expertise here is not your title or your résumé. It is task-specific. Anthropic's own example: a senior engineer asking their first question about the Rust language is a beginner at Rust. The skill that predicts whether you get working software out of an agent was never "I am an engineer." It is "I understand this particular problem well enough to direct the work and catch it when it goes wrong."

That is the whole argument for the builder-leader, measured now across a population instead of asserted from a single desk. You do not have to become an engineer. You do not have to write the code. You direct it, inside a domain you already command, and the command is the thing that pays.

The expert tell is recovery

If I had to keep one number from the whole report, it would be this one.

Experts didn't just prompt better on a good day, though they did do more with each instruction: about 12 agent actions per prompt versus 5 for novices, and roughly five times the output. The real difference showed up when things went sideways. When a novice hit trouble, they walked away with nothing written about 19 percent of the time. For everyone with more domain knowledge, that abandonment rate was 5 to 7 percent.

Sit with what that means. The expert hit the same wall the novice did. Then they routed around it, reframed the problem, and caught the fluent, confident, completely wrong answer before it shipped. The novice hit the wall and quit.

So the value your senior people add in an agentic workflow is not that they prompt cleaner on a Tuesday. It is that they fail better on a Thursday. They have the judgment to know when the plausible output is wrong, and the agent does not have that judgment and cannot get it from a model update. The bottleneck moved from typing to deciding. Pratima Arora at Smartsheet put it plainly this spring: the hours haven't changed, but the density of work has.

The scarcest thing in the building is now the thing your best people already carry. That is good news, and most of the coverage will skip right past it.

The work moved up the stack

Two more numbers close the loop, and they are the ones that tell you where this is going.

Between October and April, the share of sessions spent fixing broken code fell from a third to under a fifth, 33 percent down to 19. Over the same stretch, the estimated value of the work people brought rose about 27 percent, with the biggest jump in building something new, up 43 percent.

Put those together and the trajectory is clear. People are not using the agent to fix more bugs. They are using it to attempt harder, more valuable, more end-to-end work, and bringing more judgment to bear when they do. The tool got cheaper per task, so the tasks got more ambitious.

This is the oldest pattern in economics wearing new clothes. Make a thing cheaper to produce and you do not produce less of it. You produce far more, and you need more judgment to steer all of it. That is the shape of this moment, and it cuts against the fear that the work is shrinking. The work isn't shrinking. It changed: what we attempt got bigger, and how we get there moved from doing to directing.

Two things held their price, not one

There is a second thread here, and it is the one that turns a study into a strategy.

Anthropic measured what the human brings to the session, domain command, and found it decisive. That is one of two things this whole shift never made cheaper. The other is what you build around the model.

The model is the commodity, and the system you build around it is the moat: the rules and skills and memory that turn a smart, forgetful chat box into something that gets better at your specific work over time. Birgitta Böckeler, writing on Martin Fowler's site, named the gap the model itself can never close. A coding agent, she wrote, has "no social accountability, no aesthetic disgust at a 300-line function, no intuition that 'we don't do it that way here,' and no organisational memory." Those are not features you buy a newer version of. They come from the person and the system around the model, or they don't come at all.

So the picture is narrower and more useful than "expertise wins." Two things resisted the repricing: what you bring to the model, which is domain command, and what you build around it, which is the system that holds your organization's judgment. The model in the middle, the part everyone is still shopping for on price, is the cheap layer between them. Anthropic just measured one of those two human pieces at a scale none of us could reach alone. The other one you build.

And both are leadership skills, not engineering ones. Setting intent, designing the handoffs, evaluating output against what good actually looks like. Those are the same instincts that got your best people to senior in the first place, pointed now at a system of agents instead of a team of people. No new species of human required. That is the part I keep coming back to, and it is why I think the people who internalize this will out-build the ones still arguing about whether the juniors are coming for the seniors.

One honest note

You cannot see this yet in the economy-wide numbers. A large Danish study of AI chatbots across occupations found no significant effect on earnings or recorded hours. I take it seriously, and I read it as a statement about timing, not a refutation. Session behavior changes before payroll does. The 400,000 sessions are the leading edge; the wage data lags. Hold the whole argument a little more loosely for it, and don't over-rotate on any single quarter's story.

What I'd do with this Monday

If you run a team, here is the part that converts. If you don't, read it as where to put your own time.

Put the agent in the hands of your domain experts directly, not only your engineers. The advantage is highest exactly where deep knowledge meets the tool. The clinical-operations lead who has watched the trial workflow break in a dozen specific ways will get more out of an agent than a generalist engineer who hasn't, because the agent can produce the code but it cannot supply the judgment about what the code is for.

Hire and promote for understanding the problem and recovering well, not for who writes the cleanest syntax. The data says occupation barely predicts success and recovery strongly does. That is now a measurable thing to select for.

And grow your next builder on purpose. The way across this gap was always building, six months of directing real work inside a real domain, not sitting through a demo. The leaders who win the next few years won't be the ones with the best coders. They'll be the ones who turned their domain experts into builders before anyone told them it was allowed.

The agent decides how. You still decide what, and you still decide who learns to decide what next. Four hundred thousand sessions just told you those are the two jobs worth keeping. Together they are not a smaller job than the one before. They're the better one.

Related: "The Specialist Is Now You" covers what one expert plus an agent can now do alone, and "The Atomic Unit of Work Just Changed" covers how the unit a person owns moved up the stack.

The fuller argument for treating judgment, intent, and domain command as the durable skills is the Builder Leader field guide (builder-leader.com).

Run Data Run
