She Already Built It

I set out to add federated learning to my research agent. She'd already designed and built it in February. This is what compounding autonomous research actually looks like.

Jun 05, 2026

She Already Built It

I set out to add federated learning to my research agent. She'd already designed and built it in February. This is what compounding autonomous research actually looks like.

A few weeks ago I talked to a federated-analytics company. Good pitch, smart people, a real category. I left the conversation able to repeat the talking points and unable to feel the shape of the thing, which for me is the same as not understanding it. I learn by building. So the plan wrote itself: bolt a simulated federated-learning experiment onto ARIA, my autonomous research agent, use the retinal-imaging work as the worked example, run it on a dataset I already had on hand.

I opened Claude Code and described what I wanted to build. Before it wrote a line, it went through what was already sitting in ARIA's corpus, the way you'd check the fridge before a grocery run.

And it stopped me.

She'd already built it. Three months ago. Without being asked.

Who ARIA is

For anyone who hasn't met her: ARIA is an autonomous research agent I built. She runs continuously. She generates her own hypotheses, scores them against novelty and tractability and impact, designs experiments to test the ones that survive, runs them, critiques her own results, and when a run breaks she diagnoses the failure and repairs it. No human in the loop on the inside of that cycle. I set the direction and the guardrails. She does the research.

One instance of her spent fifty days on retinal-imaging research with a multi-site health-imaging partner. Thousands of commits, a real result at the end, the whole thing documented in a paper I've been writing up.

The paper is a photograph of a system running. What it can't show is what's left in the corpus after the shutter clicks.

This is the first time I'm showing what she does between the headlines. The fifty-day sprint is the part that makes a good chart. It isn't the interesting part. The interesting part is everything still sitting in the corpus that nobody has gone back to look at.

What I found

I went looking for a blank slate to build my federated-learning toy on. Here's what was actually there, in order.

A pool entry dated February 7. Generated by her own idea-generation action in an overnight session, not steered by me, not seeded from a prompt I'd forgotten writing. The title she gave it: a federated-learning framework for multi-site deployment. She scored it 9.2 out of 10 and flagged it promotion-ready. Twenty-five minutes later, in the next session, she refined it.

It wasn't a stub. It carried differential-privacy budgets, personalization layers so each site keeps a local head while sharing the backbone, 8-bit compression for clinics on thin bandwidth, asynchronous updates tolerant of the kind of connectivity you get outside a data center, and a three-phase rollout that went from a ten-site simulation to a fifty-site deployment. She had reasoned about population-specific model adaptation and the health-data regulations of the deployment region. I want to be clear that I did not write any of that. She did, on a Saturday night in February, while I was asleep.

Two weeks later, February 20, an autonomous build session committed a 573-line implementation at 11:43pm. Real federated averaging. Non-IID client simulation, because real clinics don't have identically distributed patients. The canonical 2017 paper cited in her own docstring. Then she kept iterating it across later sessions, surveying, running, debugging, fixing her own template bugs.

I was about to spend a weekend building exactly this. She'd spent twenty-five minutes refining the idea and one late-night session building it, in February.

That's the gap. Three months between when the work was done and when I went looking for it, completely unaware it existed.

Why this is the part the paper can't show

The paper documents operation. The commits, the self-healing, the cross-site result. It records fifty real days of work. But a paper that ends at its operational window has a structural blind spot, and the blind spot is the whole reason to build something like this.

It can't show compounding.

Compounding is when the output of the work becomes the input to the next work. My federated proof-of-concept does not start at zero. It starts at her February design, plus the 573 lines, plus the entire surrounding corpus: the pool of scored ideas, the critiques she wrote of her own results, the branches she tried and abandoned, the templates she built and reused. The value was never the one artifact I tripped over. The value is that the artifact was waiting, and so are a hundred others I haven't gone looking for yet.

A tool is something you use. An asset is something that appreciates while you're not watching.

For eighteen months the builder's pattern I've written about here has been: see the gap, write the first code yourself, prove it works, hand it off. Every step on that list assumes a human starts the clock. What I found in the corpus is a version where the clock was already running before I noticed there was a clock.

What I actually did next

I did not throw out her work and build mine to prove a point.

I read hers, and it reframed the project. The design I'd have built over a weekend was the obvious one, the textbook FedAvg loop. Hers already had the privacy and heterogeneity pieces I'd have bolted on during a second pass, if I'd gotten to a second pass. So the job changed. It stopped being "build federated learning" and became "assemble what's already here, then push past where she stopped."

That's a smaller, sharper task. It also drops the pretense that I started this.

I'm not the first mover on my own project anymore. I'm the second.

And it sent me back to that federated-analytics company with better questions. Not as a buyer nodding through a pitch. As someone who'd already seen the shape of the problem from the inside, because the thing I built had drawn me a map I didn't know I owned.

I build things

If you've read me before, you know the line. I build things. It's the one piece of identity I've never been willing to trade for a title.

Here's the turn. I build things, and one of the things I built now builds things, and it's running ahead of me.

That isn't a loss of control and it isn't a party trick. It's the entire point. The reason to build an autonomous research agent was never to watch it grind for fifty days and write a paper about the grind. The reason is to go looking for something on an ordinary afternoon and find the groundwork already laid, dated three months back, scored, built, waiting.

I have the whole corpus to build on now. I've barely looked.

Sources and prior work

Inside ARIA: Teaching a Machine to Think Like a Scientist: where ARIA first showed up here, building the ideation engine. This post is its sequel.
The long-form paper on her fifty-day run (forthcoming, stay tuned).
I'm Justin Johnson, I Build Things: the builder-identity post this one calls back to.
Compound Velocity: The 20-Hour AI Research Lab: the compounding thesis, earlier and from a different angle.
McMahan et al., 2017, Communication-Efficient Learning of Deep Networks from Decentralized Data. The FedAvg paper ARIA cited in her own docstring.

Run Data Run

Discussion about this post

Ready for more?