The Future of Scientific Computing
Part 3 of 3: Nine Days of Petaflop Delegation
Part 1 showed you how delegation works. Human strategy. AI implementation. Powerful hardware.
Part 2 showed you what becomes possible. Four experiments. Nine days. Parallel development. Digital lab notebook.
Part 3 explores why this matters.
The Full Picture
Nine days of intense collaboration:
383 messages exchanged
58 documented sessions
About 20 hours of hands-on work
7 production systems built
Zero all-nighters
Complete documentation
Two cutting-edge tools:
NVIDIA DGX Spark (desktop petaflop, Blackwell GB10, ARM64)
Claude Code (AI pair programming with unlimited context)
The question: What happens when you give researchers these capabilities?
The answer: Research velocity transforms. Not just faster execution. Different ways of working. New questions become feasible. Innovation accelerates.
The Two Tools: Why Both Matter
This isn’t just about AI assistance. And it’s not just about powerful hardware.
It’s about the combination.
DGX Spark: Petaflop Power at Your Desk
Desktop petaflop isn’t hyperbole. NVIDIA GB10 Grace Blackwell Superchip. 128 GB of unified memory. A 20-core ARM Grace CPU coupled to the Blackwell GPU over NVLink-C2C. Roughly a petaflop of FP4 AI compute on a desk. This isn’t last-generation hardware. It’s the newest architecture, barely six months old.
What it enables:
Unlimited experimentation. No cloud API costs. No rate limits. Want to run 100 hyperparameter variations? Go ahead. Cost per experiment after hardware: $0. That changes how you think about risk.
Overnight compute. Set up training before bed. Sleep for eight hours. Wake up to results. Human time (strategic decisions) completely decoupled from compute time (training runs). You’re productive 24/7 without working 24/7.
Data privacy. Medical data. Proprietary research. Everything stays on-premises. Never leaves your network. For healthcare and regulated industries, this isn’t convenience. It’s compliance.
Learning by doing. Want to understand transformer architecture? Train one from scratch. Novel idea about fine-tuning? Test it today. Made a mistake? Rerun at zero cost. Education through experimentation becomes frictionless.
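To make the unlimited-experimentation and overnight-compute points above concrete, here is a minimal sketch of what a local hyperparameter sweep can look like. The train_model function, the search space, and the results layout are hypothetical placeholders, not this project’s actual code; the point is that every run below costs nothing beyond electricity and wall-clock time.

```python
# Hypothetical sketch: a local hyperparameter sweep with zero marginal cost.
# `train_model` stands in for your own training entry point.
import itertools
import json
from pathlib import Path

def train_model(lr: float, batch_size: int, dropout: float) -> dict:
    """Placeholder for the real training + evaluation step."""
    return {"val_loss": 0.0}  # replace with actual metrics

search_space = {
    "lr": [1e-5, 3e-5, 1e-4],
    "batch_size": [16, 32],
    "dropout": [0.0, 0.1],
}

results_dir = Path("sweep_results")
results_dir.mkdir(exist_ok=True)

# Every combination runs on local hardware; the only budget is wall-clock time,
# so the workflow is: kick this off before bed, read results over coffee.
for i, values in enumerate(itertools.product(*search_space.values())):
    config = dict(zip(search_space.keys(), values))
    metrics = train_model(**config)
    (results_dir / f"run_{i:03d}.json").write_text(
        json.dumps({"config": config, "metrics": metrics}, indent=2)
    )
```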
The Blessing and Curse of Cutting-Edge Hardware
Blackwell + ARM64 is bleeding edge. Very few people have this combination. That’s both exciting and challenging.
The blessing: Raw performance. Latest features. Future-proof investment. Access to capabilities that didn’t exist six months ago.
The curse: Immature tooling. Limited support. Edge cases everywhere.
Real example: GPU inference stability
Training worked flawlessly from day one. Experiments ran overnight with zero crashes. But inference? Random failures. Models would load, run for a while, then crash unexpectedly.
The issue: ARM64 + Blackwell + PyTorch 2.6 is a very new combination. We initially had CPU-only PyTorch installed. Once we discovered this, NVIDIA actually reached out and helped. They pointed us to their official NGC containers with the latest PyTorch builds optimized for ARM64 + CUDA.
Switching to the NVIDIA-supported container solved everything. GPU inference became rock solid. This is the value of early adoption: you get direct support from the vendors who are as invested in making it work as you are.
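For anyone reproducing this setup, a quick sanity check would have caught the CPU-only install on day one. The snippet below is a generic diagnostic, not the script we actually used: it just confirms the installed PyTorch is a CUDA build and that a kernel really launches on the GPU.

```python
# Quick diagnostic: confirm PyTorch is a CUDA build and can reach the GPU.
# Run inside the NGC container (or any environment) before launching inference.
import torch

print(f"PyTorch version : {torch.__version__}")
print(f"CUDA available  : {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA runtime    : {torch.version.cuda}")
    print(f"Device          : {torch.cuda.get_device_name(0)}")
    # A tiny matmul verifies kernels actually launch on this architecture.
    x = torch.randn(1024, 1024, device="cuda")
    print(f"GPU matmul OK   : {(x @ x).sum().item():.2f}")
else:
    print("CPU-only build detected; GPU inference will not work.")
```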
Framework landscape:
Some tools don’t support ARM64 yet. vLLM has no ARM64 builds. TensorRT-LLM requires complex ARM64 setup. Some libraries are x86-only. Stack Overflow has few ARM64 + Blackwell posts. GitHub issues? You’re often the first to hit the bug.
Why this is actually fun:
Early adopter territory. You discover things. You contribute solutions back. You’re at the frontier. And constraints force creativity. When the easy path doesn’t work, you find novel solutions. Like discovering that well-designed heuristics can match ML accuracy at 95,000x the speed.
Kudos to NVIDIA: They’re actively supporting early adopters. When we hit issues, they responded with solutions. The NGC container ecosystem is mature and well-maintained. That support makes bleeding-edge adoption viable.
Claude Code: Expertise On-Demand
Not a code completion tool. Not autocomplete for function names. A thought partner that discusses approaches, proposes architectures, debugs systematically, and maintains context across weeks.
What makes Claude exceptional:
Deep ML expertise. Understands MLOps best practices. Knows AI engineering patterns. Can discuss trade-offs between architectures. Recognizes when you’re headed for a problem. This isn’t just code generation. It’s consultation.
Unlimited context through documentation. 58 sessions documented. Every decision preserved. Cross-references past solutions. “We solved this on Day 2” with exact details. Never forgets why you chose one approach over another.
Everything can be done with code. Infrastructure setup. Model training. Documentation generation. Monitoring dashboards. All programmatic. All automatable. Claude handles the implementation while you focus on what to build.
Rapid iteration pipeline. Approach doesn’t work? Pivot immediately. Claude regenerates code in minutes. New idea? Test before lunch. No sunk cost fallacy. Experimentation becomes cheap.
Learning amplification. You bring domain expertise. Claude brings implementation speed. Both get better over time. By day nine, our communication was much more efficient than day one. Compound effect.
[IMAGE: Flow diagram showing the collaboration cycle: Human provides strategy/domain knowledge → Claude implements and documents → DGX executes and trains → Results inform next iteration → cycle repeats, with arrows showing bidirectional learning/feedback between all three components, clean technical illustration style]
Patterns from 383 Messages
Message breakdown:
Planning & Strategy: 96 messages (25%)
Implementation: 153 messages (40%)
Debugging: 77 messages (20%)
Documentation: 57 messages (15%)
Key insight: AI doesn’t do 100% of any category. Always back-and-forth. Always collaborative.
What Surprised Me
I think bigger. I expected AI to make me more productive at current tasks. Reality: I consider larger problems. Take on more ambitious projects. Think at higher abstraction levels.
Why: Cognitive load reduced. Not worried about syntax. Not debugging boilerplate. Brain freed for strategy. By day nine, I’m casually prototyping Socratic training (a novel teaching method). Traditional timeline: two-month research project. Our timeline: 2.5 hours to a validated pipeline. It failed miserably, but that’s the point. Try fast, learn fast.
The shift: From “Can I build this?” to “What should I explore next?”
The DGX feels like a superpower. Traditional ML: Submit job to cluster. Wait in queue. Hope it doesn’t crash. Check back tomorrow. With DGX: Run experiment now. See results in real-time. Iterate immediately. No waiting.
The feeling: Personal supercomputer. Instant feedback. Unlimited experimentation. Addictive.
Medical AI experiments 1-7 ran overnight while I slept. Wake up. Analyze over coffee. Kick off next experiment before work. Human rhythm. Machine rhythm. Different timescales. Both productive.
Failures become cheap. Traditional: Bad approach costs two weeks. Sunk cost fallacy. Keep pushing. With DGX + Claude: Bad approach costs two hours. Pivot immediately. Try something else.
Why this matters: More exploration. More creativity. Less attachment to failing approaches. Better science through rapid hypothesis testing.
Documentation is better. Traditional: Write code first, document later (if ever). With Claude: Documentation and code co-evolve. Session files captured concurrently. Blog posts practically write themselves. Context never lost.
Result: 58 comprehensive session files. Complete audit trail. Reproducible experiments. Publication-ready methodology. Zero extra time investment.
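The session files themselves are nothing exotic: structured markdown, written as the work happens. As an illustration only (the real notebook’s format and layout may differ), a helper along these lines is enough to keep an append-only digital lab notebook:

```python
# Hypothetical sketch: append a timestamped entry to a markdown lab notebook.
# Directory layout and entry fields are illustrative, not the project's format.
from datetime import datetime
from pathlib import Path

NOTEBOOK_DIR = Path("lab_notebook")

def log_session(title: str, decisions: list[str], next_steps: list[str]) -> Path:
    """Write one session entry as its own markdown file and return its path."""
    NOTEBOOK_DIR.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M")
    path = NOTEBOOK_DIR / f"session_{stamp}.md"
    lines = [f"# {title}", f"_Logged {stamp}_", "", "## Decisions"]
    lines += [f"- {d}" for d in decisions]
    lines += ["", "## Next steps"] + [f"- {s}" for s in next_steps]
    path.write_text("\n".join(lines) + "\n")
    return path

# Example usage (hypothetical entry):
# log_session("GPU inference debugging",
#             ["Switched to NGC PyTorch container"],
#             ["Re-run inference benchmark overnight"])
```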
What This Means for Scientific Computing
For Individual Researchers
Before: Infrastructure setup takes weeks of IT tickets. Experiments are manual and error-prone. Documentation is an afterthought. Reproducibility is hard to achieve.
After: Infrastructure in days with AI assistance. Rapid experiment iteration with complete documentation automatic. Reproducibility built-in because everything is logged.
The shift: From spending 70% of time on infrastructure and 30% on research to spending 20% on infrastructure and 80% on research. Time allocation fundamentally changes.
Real impact: That side project you’ve been thinking about for months? Try it this weekend. 20 hours to validated prototype. After-hours innovation becomes feasible without sacrificing work-life balance.
For Research Groups
Knowledge capture becomes automatic. Every decision documented with rationale. New team members onboard faster. Institutional knowledge preserved. Shared AI context across team members. Consistent documentation standards.
Collaboration overhead decreases. Digital lab notebooks shareable instantly. Complete experiment history available. Cross-project learning enabled. Communication becomes more efficient.
Velocity compounds. Each project builds better infrastructure. Each experiment improves tooling. Learning accumulates across the team. What took weeks in month one takes days in month three.
For the Field
Democratization of ML infrastructure. Lower technical barriers. Smaller teams competitive with larger organizations. More diverse participation. Geography matters less.
Quality baseline rises. Better documentation standard. Statistical rigor easier to maintain. Reproducibility higher by default. Code quality improves.
Innovation accelerates. More experiments per researcher. Faster hypothesis testing. Cheaper exploration of novel ideas. Faster time to publication.
Economics shift. Hardware is one-time cost. Compute scales locally. No per-API costs. Infrastructure becomes reusable asset. ROI measured in years, not months.
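A back-of-the-envelope comparison shows why the economics shift. All numbers below are illustrative placeholders, not actual prices from this project; substitute your own hardware cost and cloud rates.

```python
# Back-of-the-envelope break-even: local hardware vs. renting cloud GPU hours.
# All numbers are hypothetical placeholders.
hardware_cost = 4_000.00        # one-time purchase (illustrative)
cloud_gpu_per_hour = 2.50       # comparable cloud GPU instance (illustrative)
overnight_hours_per_week = 40   # e.g. 8 hours/night, 5 nights/week

break_even_hours = hardware_cost / cloud_gpu_per_hour
break_even_weeks = break_even_hours / overnight_hours_per_week

print(f"Break-even after {break_even_hours:.0f} GPU-hours "
      f"(~{break_even_weeks:.0f} weeks of overnight runs)")
```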
What Comes Next
Week Two: OncoForge
Week one proved the infrastructure. Week two tests what it enables.
OncoForge is a build-in-public experiment running on x.com/OncoForge. The goal: train a real, small, multimodal foundation model for oncology. Not a toy example. Not a demo. A production-capable vision-language model for cancer research and clinical applications.
What makes this ambitious: Medical multimodal models are hard. They need to understand both images (histopathology slides, radiological scans, molecular visualizations) and text (clinical notes, research papers, treatment protocols). They need domain expertise (oncology knowledge). They need to be accurate (medical applications demand high standards). And they need to be small enough to deploy (not every hospital has massive GPU clusters).
Why build in public: Most AI research happens behind closed doors. Papers appear months after completion. OncoForge inverts this. Every decision documented. Every experiment shared. Every failure and success visible in real-time. The goal is to show how modern AI research actually works. The messy parts. The pivots. The breakthroughs.
What the first week’s infrastructure enables: This experiment would have been impossible without it. Training multimodal models requires massive compute (the DGX provides it locally). It requires rapid iteration (Claude handles implementation velocity). It requires meticulous documentation (the digital lab notebook captures everything). And it requires a sustainable pace (nights and weekends without burnout).
Current status: Model architecture designed. Training pipeline validated. First baseline models in progress. The foundation from week one (monitoring dashboards, W&B integration, Docker infrastructure, session documentation) means we’re starting from a position of strength, not from scratch.
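The W&B integration is what makes those overnight baseline runs observable from anywhere. As a rough illustration (project name, config values, and metrics here are hypothetical, not OncoForge’s actual setup), the tracking loop can be as small as this:

```python
# Minimal sketch of experiment tracking with Weights & Biases.
# Project name, config values, and metric names are hypothetical placeholders.
import random
import wandb

def train_one_epoch() -> float:
    """Placeholder for the real training step; returns a loss value."""
    return random.random()

def evaluate() -> dict:
    """Placeholder for the real evaluation step; returns metric values."""
    return {"val/accuracy": random.random()}

run = wandb.init(
    project="oncoforge-baselines",  # hypothetical project name
    config={"lr": 3e-5, "batch_size": 16, "epochs": 3},
)

# Each overnight run streams metrics to the dashboard, so results are
# reviewable from a phone over coffee before the next experiment starts.
for epoch in range(run.config.epochs):
    loss = train_one_epoch()
    wandb.log({"epoch": epoch, "train/loss": loss, **evaluate()})

run.finish()
```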
Why this matters: If we can train a specialized medical AI model in public, on local hardware, with AI assistance, while maintaining quality and documentation standards, it proves the approach works for real research. Not just infrastructure experiments. Not just system tests. Actual novel AI research with clinical potential.
The hypothesis: Small, specialized foundation models trained with high-quality domain data can match or exceed larger general models on specific tasks. Oncology is the test case. Success here opens pathways for other specialized medical domains.
This is what compound velocity looks like. Week one built the platform. Week two builds something that matters.
The Trajectory
Week one proved the system works. Infrastructure solid. Workflows efficient. Documentation comprehensive. Quality maintained at velocity.
Week two moves to harder problems. Novel architectures. More complex experiments. Building on the foundation we established. Each week enables next week’s research.
This is the compounding effect of good infrastructure. Velocity doesn’t plateau. It accelerates.
The Honest Take
This isn’t AI replacing scientists. It’s not “push button, get research.”
What you still need:
Domain expertise to make good decisions. Strategic thinking to choose what to build. Quality judgment to know when to ship and when to iterate. Creativity to try novel approaches. Ethical oversight, especially for medical applications.
What AI amplifies:
Implementation speed. Context maintenance. Documentation quality. Iteration velocity. Learning rate.
The balance: Human provides wisdom. AI provides velocity. Hardware provides power. Together: wisdom at velocity at scale.
Not everyone will work this way. Some researchers prefer full control over every detail. Some teams have regulatory requirements that limit AI assistance. Some domains need purely novel approaches where AI’s pattern matching provides less value.
That’s fine. This is one approach. One that worked remarkably well for this project. Your mileage will vary.
The Real Achievement
Nine days. 20 hours of hands-on work. Multiple production systems. Statistical validation. Complete documentation. Zero burnout.
But the real achievement isn’t the speed. It’s sustainability.
This pace isn’t a sprint. It’s a new steady state. After-hours innovation without sacrificing sleep or sanity. Day job maintained. Quality never compromised. Technical debt stayed at zero.
The insight: Sustainable research velocity, with quality maintained, is achievable right now. Today. With tools that exist.
Not someday. Not after the next breakthrough. Not when the hardware gets cheaper or the AI gets smarter.
Now.
What this unlocks: More researchers can build sophisticated ML systems. More small teams can compete with well-funded labs. More ideas can be tested quickly. More innovation happens faster.
This is the democratization of AI research we’ve been talking about. Not through simplification or abstraction. Through intelligent augmentation. Through giving researchers superpowers without taking away their expertise.
The future of scientific computing: Not humans replaced by AI. Humans augmented by AI. Researchers with powerful local compute. Scientists who think strategically while AI handles implementation. Wisdom at velocity.
That future started nine days ago on my desk.
Where it goes next is limited only by what we choose to explore.
This concludes the three-part series on building an AI research lab in nine days. Part 1 explained delegation vs automation. Part 2 showed the experiments and digital lab notebook. Part 3 explored the collaboration patterns and future implications.
For infrastructure details, see Three Days, One Petaflop, and an AI. For RAG deployment, see From Petaflop to Production.


