Borrowed Iron

A borrowed eight-GPU node, a global-health mission, and a crew of six: me, a founder, and four interns.

Jun 28, 2026

Earlier this week, NVIDIA gave a global-health startup I work with a node of eight H100 GPUs to use for a couple of months. The grant came through NVIDIA Inception, their program for startups. No invoice. They looked at the mission, believed in it, and handed over the iron.

The day the box came online, I posted the first entry of a build log about it on our technical blog. In the few days since, the pace has been fast enough that the story already needs an update. So here it is properly, from the top, for the people who lead the work rather than write the code.

Here is the part I love most, before anything else. The crew putting that machine to work is six people. Me. Nick, who runs the company. And four interns, the last of them starting tomorrow. Three are in college; the fourth just finished high school and starts undergrad soon.

A node of eight H100s, a foundation model to train, and a two-month clock. The crew is two adults and four interns.

The mission

SocialEyes is bigger than the AI, and that is what pulled me in. The mission is to converge a wide range of AI healthcare skills onto the patient, right at the point of care. Today's workflow is screen, then refer: a frontline worker spots something, and the patient travels to a specialist who may be hundreds of miles and many months away. That model dates to the 1890s, and it simply does not work in low- and middle-income countries. There are roughly 200,000 eye specialists on Earth and well over a billion people with diabetes, high blood pressure, and the slow diseases that take sight and life. The math does not work. SocialEyes is trying to fix the math.

The way in is the eye. It is the cheapest window we have into the rest of the body, the one place you can photograph blood vessels and nerve tissue directly, with no needle, using a camera that fits in a clinic. We lead with timely screening of the chronic diseases that increasingly dominate global health: diabetes, hypertension, and the conditions that take sight before anyone catches them. From the same image we can also pull biomarkers and risk factors, and in some cases a specialist-level read of certain retinal conditions. The retina is just one of the tissues the SocialEyes architecture can handle. Yet from the retina alone, signs of dozens of diseases, both in the eye and elsewhere in the body, can ultimately be detected.

That is what makes it worth two borrowed months. The work is grounded where it is needed most: low-resource clinics across low- and middle-income countries, anchored in Nepal. The first deployments are at clinics and other fixed sites, not yet in the hands of community health workers in the field. That comes later, with a dedicated device called MARVIN. For now: a camera, a trained model, and a clinic a long way from the nearest specialist. That is the product. The retinal AI is how it gets there.

The training data is grounded in public research datasets, which is a deliberate choice. It keeps the method something we can talk about openly, even when specific results stay in the lab. Which is convenient, because that is exactly what I am doing.

How I ended up in the room

SocialEyes is Nick's mission. I build the AI for it, and have been part of the team since the start of the year.

They found me through the open work. The experiments on a DGX Spark on my desk, the writing on the blog, the habit of building the real thing and showing it. Nick reached out, I started helping, I dug the mission, and now we are doing some of the coolest work I have done.

That origin is the whole way I think about building. Build the real thing, share it in public, and the right people show up. A working system and an honest write-up change the conversation from "should we" to "how do we do this faster." This collaboration is what that looks like when it happens.

The four interns are the same story from the other side. Students who wanted to do real work, not a slideshow about AI. So they are doing real work. They have stood up their first shared code repository, they watch the GPU dashboards, they own a lane of the operations. Three are in college and one is about to start. Most people their age get coffee. These four are helping run a supercomputer.

The borrowed iron

Here is the through-line for anyone who has followed the work. The small box came first.

A DGX Spark, NVIDIA's desktop AI machine, sits on my desk with 128GB of memory. It is a real research rig at a tiny fraction of data-center cost. Most of the hard questions, which tools to use, which model, how to make everything behave, got beaten out on that small box first, where a mistake costs minutes instead of metered GPU-hours. The little box is the cheap rig that de-risks the expensive one.

The borrowed node of eight H100s, running in the cloud, gets to stand on all of that. Eight H100s, about as much compute as you can put in a single machine, is enough hardware to train a foundation model from scratch and run dozens of experiments beside it. That is the thing the small box could never do. The lineage is the point: de-risk on the desktop, then deploy at scale in the cloud. And the clock is running. Two months, then the iron goes back.

The DGX Spark on the desk taught us the method for a year. The borrowed node is where we finally get to run it.

The pace

What follows is the short version of a long few days. We brought the node up from a web console to a first running job, stood up a private reasoning model on the box so the team's tools ran on our own hardware, and split its storage into permanent and scratch on a machine that erases itself if you misfile a single thing. We gave the whole team secure access through one entry point, with nobody holding the keys to wipe it, and staged terabytes of retinal imagery across four kinds of eye scan, color fundus photography (CFP), infrared (IR), fundus autofluorescence (FAF), and OCT angiography (OCTA), without losing a day to a careless mistake. Before spending a GPU-hour, we read the literature and locked the recipe. That alone caught a starting setting roughly sixteen times too aggressive, before it could burn a run. Then we trained. The first foundation-model run came back with an uncomfortable verdict: color photos alone topped out no better than an off-the-shelf vision model. So we did not argue with the data. We pivoted to a richer, multimodal approach on a second kind of scan that had been sitting unused on the box, and launched a fresh run inside the same week.

Around the training, we built an evaluation suite to score every model on the tasks that count, ran ablation sweeps overnight with the box chaining its own experiments while we slept, and red-teamed a result that looked too good, catching three stacked bugs before any of them shipped as a finding. I put ARIA, the autonomous research engine I built, on the node and watched it run its first experiment on its own, then re-contracted it for a world where compute is suddenly cheap and gave it a cheat-detector and a wall of baselines so nothing scores well by accident. All of it ran across three AI coding agents in three lanes at once (operations, modeling, and the research engine), kept from colliding by a shared board with clear ownership. We even survived a disaster: a sync tool deleted the shared workspace, and we had it all back the same afternoon from versioned history and a nightly backup, then locked the setting so it cannot repeat. And the whole time, the interns stood up their first shared code repository and ran a real lane of the operations, GPU dashboards included.

None of that pace is about working longer hours. It is the harness. The stack of AI tools I have spent two years building and writing about: the coding agents, the autonomous research engine, the skills that turn a one-line instruction into a finished job. The harness does the grunt work. The five of us do the deciding. Point it at a borrowed node and a mission worth the effort, and all of that is a few days, not a few quarters. It also means you can afford to be wrong fast and right next, the way we were when the first model fell flat and the second one took its place inside the same week.

The pace is not longer hours. It is the harness doing the grunt work while the five of us do the deciding.

AIXplore, the lab

I have mentioned the other blog in passing before but never properly introduced it. AIXplore is the lab. Where Run Data Run tells the story, AIXplore shows the work: the engineering, in detail, for the people who do sit in the code. How you give a team access to a borrowed machine without handing out the keys to wipe it. How you stage terabytes of scans without losing a day. How you serve a model locally when the documentation lags the code by a month. Same work as here, one altitude down.

It got a full rewrite this week, too. While ARIA ran SocialEyes experiments on the node in the background, I rebuilt the entire platform. Every new post carries interactive widgets you can poke at, side quests that branch off into the deep cuts without derailing the main read, and, best of all, a reproduction prompt on every piece. Copy it, point your own Claude Code at it, and build the thing yourself. Whether you have shipped models for years or you are trying to start this weekend, there is a door in.

The whole SocialEyes build is going up there as it happens, in a series called Borrowed Iron: standing up the node, the access plumbing, the data staging, the autonomous engine, the training runs. If this post made you want the real detail, that is where it lives. And almost none of it is specific to retinas. The lessons transfer to anyone doing serious work on rented GPUs.

Come along

I am on a break between jobs this summer, a real sabbatical, and this is what I am doing with it. I will get to a beach at some point. I will also publish a book before the summer is out. And the rest of it goes to a borrowed supercomputer, a founder, and four interns, building healthcare for people a long way from a hospital. This is how a sabbatical should go. I figure I will sleep once I start the new job in August.

So that is the setup. A mission I believe in. A partner who backed it with serious hardware. An unlikely little crew, two of us and four interns, moving faster than the crew size has any right to. And a standing invitation to watch us make the most of two borrowed months in public.

New parts land as the work happens, here and on AIXplore both. Come along.

Justin writes Run Data Run for the people who lead the work, and AIXplore (ai.rundatarun.io) for the people who build it. The Borrowed Iron series runs on AIXplore as the SocialEyes build happens. If this was useful, the easiest way to support it is to subscribe and forward it to one person who would want it.
