Three Days, One Petaflop, and an AI Partner: Building a Research Lab
Building a GPU research lab with Claude Code, and documenting both what I built and how we built it
Last weekend, I told you how Claude Code had become my thought partner. I mentioned we were building something interesting together.
I now have an NVIDIA DGX workstation in my home office. Three days ago, it arrived. Today, it’s running production AI experiments with custom model routing, automated quality benchmarks, and fine-tuning workflows. And I’m documenting the entire process in two completely different ways, for two completely different audiences.
Let me explain what happened, and why this matters beyond my specific setup.
How I Got Here
Two months ago, I decided I needed my own GPU infrastructure. Not cloud credits with usage limits. Not shared resources with queue times. Hardware I control, running models I choose, on experiments I design. So I bought a DGX.
Here’s the thing. I run AI platforms professionally. I understand systems architecture, networking, the broad strokes of infrastructure. But the low-level details of configuring a research-grade GPU server? Docker orchestration with CUDA dependencies? Kernel parameters for optimal inference performance? That’s the difference between understanding something conceptually and implementing it every day.
This is exactly where Claude became essential. Not as a search engine. Not as a code autocomplete tool. As a thought partner who knows Linux administration cold and can help me implement while explaining the tradeoffs.
Day One: Foundation
We started with the basics. Ubuntu server setup, CUDA drivers, networking configuration. But instead of following random Stack Overflow posts, I had a conversation:
“I need two inference engines running the same model so I can compare performance. What’s the cleanest way to isolate them?”
Claude proposed an approach. We discussed the tradeoffs. I made the architectural decisions based on my use case. Claude handled the implementation details I’d have spent hours researching.
By end of day one:
Two inference engines operational (Ollama and llama.cpp)
Both running Gemma for fair comparison
Startup scripts, health checks, and monitoring in place (health checks sketched below)
Custom shell environment with 50+ ML-focused aliases
One day. Clean implementation. Fully documented.
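For a flavor of what those health checks look like, here's a minimal sketch, assuming Ollama on its default port (11434) and a llama.cpp server on 8080 with its /health endpoint. The actual scripts on the DGX do more, but the shape is the same.

```python
# Minimal health check for two local inference engines.
# Assumes Ollama's HTTP API on its default port (11434) and a
# llama.cpp server on port 8080 -- adjust to your own setup.
import urllib.request

ENGINES = {
    "ollama": "http://localhost:11434/api/tags",   # lists available models
    "llama.cpp": "http://localhost:8080/health",   # server health endpoint
}

def check(url: str) -> bool:
    """Return True if the engine answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    for name, url in ENGINES.items():
        print(f"{name:10s} {'up' if check(url) else 'DOWN'}")
```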
Day Two: The Intelligent Gateway
This is where it got interesting. I wanted to route inference requests intelligently between engines based on request characteristics. “Use the fast engine for simple queries, the accurate engine for complex ones.”
Standard approach: train a classifier. Use embeddings. Build an ML routing system.
Claude suggested something different: “What if we use simple heuristics based on query length and complexity markers? Might be worth testing before building something heavier.”
We implemented both. Ran benchmarks. The heuristics matched the ML approach's accuracy with 95,000x lower latency. Sometimes the simple solution is actually the right solution, but you need someone to suggest it without ego.
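To make that concrete, here's a toy sketch of the kind of heuristic routing I mean. The thresholds and marker words below are illustrative placeholders, not the ones we actually benchmarked.

```python
# Toy heuristic router: pick an engine from query length and a few
# "complexity markers" -- no ML involved. Thresholds and marker words
# are placeholders for illustration only.
COMPLEX_MARKERS = ("explain", "compare", "step by step", "why", "analyze")

def pick_engine(query: str) -> str:
    """Return 'fast' for simple queries, 'accurate' for complex ones."""
    q = query.lower()
    long_query = len(q.split()) > 40
    has_marker = any(marker in q for marker in COMPLEX_MARKERS)
    return "accurate" if (long_query or has_marker) else "fast"

assert pick_engine("What's 2+2?") == "fast"
assert pick_engine("Explain the tradeoffs between RAG and fine-tuning") == "accurate"
```

The point isn't these exact rules. It's that a couple of string checks per request cost effectively nothing, which is where the orders-of-magnitude latency gap against an ML classifier comes from.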
By end of day two:
Intelligent request router deployed
Performance benchmarks completed (Ollama 94 tok/s, llama.cpp 104 tok/s; measurement sketch after this list)
Gateway selecting optimal engine per request
Complete API for external applications
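If you want to reproduce that kind of tokens-per-second number yourself, here's a rough sketch against Ollama's /api/generate endpoint with streaming disabled, using the token count and generation time Ollama reports in its response. The model name and prompt are placeholders, and the llama.cpp side can be measured the same way against its own HTTP endpoint.

```python
# Rough tokens/sec measurement against a local Ollama instance.
# Model name and prompt are placeholders -- swap in your own.
import json
import urllib.request

def ollama_tok_per_sec(model: str, prompt: str) -> float:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.loads(resp.read())
    # eval_count = generated tokens, eval_duration = generation time in ns
    return body["eval_count"] / (body["eval_duration"] / 1e9)

if __name__ == "__main__":
    print(f"{ollama_tok_per_sec('gemma2', 'Summarize what a DGX is.'):.1f} tok/s")
```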
Day Three: Real Research
Day three is when this stopped being infrastructure work and became actual research.
I wanted to evaluate model quality across different architectures. Not synthetic benchmarks. Real evaluation on tasks that matter for my work. So we built a quality evaluation framework.
Results were surprising. Gemma 9B: 97% quality score, 0.65s response time. GLM-4.5-Air: 44% quality score, 16.79s response time. Those aren’t small differences. Those are “one model is fundamentally better for this use case” differences.
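The real framework has its own rubric, but a hypothetical harness looks roughly like this: a task list, a timer, and a scorer (here a keyword-overlap stand-in) aggregated per model.

```python
# Hypothetical quality-eval harness: run a task list through a model,
# time each answer, and aggregate a score. The keyword-overlap scorer
# is a stand-in for the real rubric.
import time
from typing import Callable

TASKS = [
    {"prompt": "Name the four DNA bases.",
     "expected_keywords": {"adenine", "thymine", "guanine", "cytosine"}},
    # ... more tasks that matter for your own work ...
]

def evaluate(generate: Callable[[str], str]) -> dict:
    scores, latencies = [], []
    for task in TASKS:
        start = time.perf_counter()
        answer = generate(task["prompt"]).lower()
        latencies.append(time.perf_counter() - start)
        hits = sum(kw in answer for kw in task["expected_keywords"])
        scores.append(hits / len(task["expected_keywords"]))
    return {
        "quality": sum(scores) / len(scores),
        "avg_latency_s": sum(latencies) / len(latencies),
    }

# Usage: evaluate(lambda prompt: call_your_model(prompt)) for each engine,
# then compare the aggregated quality and latency numbers.
```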
Then we moved to fine-tuning. Started with PubMedQA (biomedical question-answering dataset). The goal: take a general model and make it genuinely better at scientific reasoning. That’s running now.
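As a sketch of the data-prep side, assuming the public pubmed_qa dataset on the Hugging Face Hub (its pqa_labeled config, with field names as they appear there), the fine-tuning stack itself isn't shown here:

```python
# Sketch: turn PubMedQA's labeled split into instruction/response pairs
# for supervised fine-tuning. Requires the Hugging Face `datasets` package.
from datasets import load_dataset

def to_pairs():
    ds = load_dataset("pubmed_qa", "pqa_labeled", split="train")
    for row in ds:
        context = " ".join(row["context"]["contexts"])
        yield {
            "instruction": f"{row['question']}\n\nContext: {context}",
            "response": row["long_answer"],
        }

if __name__ == "__main__":
    pairs = list(to_pairs())
    print(f"{len(pairs)} training pairs, e.g.:\n{pairs[0]['instruction'][:200]}...")
```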
Three days from delivery to publishable research results.
The Documentation System
Here’s where this gets interesting, and why I’m writing this post.
As Claude and I worked, I realized we were creating two completely different stories from the same work.
The Technical Story
Every session on the DGX, Claude is writing detailed technical documentation. Not just command history. Structured notes with objectives, implementation details, performance metrics, code samples, challenges and solutions. These sessions live in organized directories on the DGX, automatically categorized.
From these sessions, I’m publishing technical deep dives at AIXplore. Posts like “When Simple Heuristics Beat ML by 95,000x” or “Supercharge Your Shell with 50+ ML Productivity Aliases”. The “what I built and how it works” content for ML engineers.
The Collaboration Story
But I’m also capturing the complete conversations with Claude. Every prompt. Every decision point. Every surprise. Claude Code logs everything to JSONL files. I built a system (with Claude’s help, naturally) to automatically export and format these conversations as readable markdown.
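A stripped-down version of that exporter looks something like this. The field names are assumptions about the log shape rather than Claude Code's documented schema, so treat it as a starting point.

```python
# Sketch: flatten a JSONL conversation log into readable markdown.
# The "role" and "content" field names are assumptions about the log
# shape, not Claude Code's documented schema -- adapt to what you find.
import json
from pathlib import Path

def jsonl_to_markdown(src: Path, dst: Path) -> None:
    lines = ["# Conversation transcript\n"]
    for raw in src.read_text().splitlines():
        if not raw.strip():
            continue
        entry = json.loads(raw)
        role = entry.get("role", "unknown").capitalize()
        content = entry.get("content", "")
        if isinstance(content, list):  # some logs nest content blocks
            parts = []
            for block in content:
                parts.append(str(block.get("text", "")) if isinstance(block, dict) else str(block))
            content = "\n".join(parts)
        lines.append(f"**{role}:**\n\n{content}\n")
    dst.write_text("\n".join(lines))

if __name__ == "__main__":
    jsonl_to_markdown(Path("session.jsonl"), Path("session.md"))
```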
From these transcripts, I’m launching a second blog series: “Your AI Thought Partner.” These posts show how I worked with Claude. Where I provided domain knowledge. Where Claude suggested better approaches. What prompting patterns worked. What the ROI looks like. The “how we worked together” content for anyone trying to figure out effective AI collaboration.
The Meta System
And then we built the system to manage this workflow:
Automatic conversation extraction from Claude Code logs
Dual-blog tracking system (which sources have which blog types)
Templates for both audiences
Social media generation for both streams
Idea management (40+ technical topics, 10+ collaboration topics already queued)
The system currently tracks 20 technical sessions and 30+ conversation transcripts. Each one can potentially become content for both blogs: different audiences, different value propositions.
Claude helped me build a publishing system for documenting how I work with Claude. If that’s not meta enough, I don’t know what is.
Why This Matters Beyond My Setup
This isn’t just about DGX infrastructure or my specific use case.
If you’re technical: This shows what’s possible when you partner with AI on infrastructure you don’t implement daily. I know systems architecture. Claude knows the implementation details cold. Together, we built in three days what would have taken me weeks (maybe months) alone. The technical blog shows you the “what.” The collaboration blog shows you the “how.”
If you run teams or make technology decisions: This is what AI collaboration actually looks like in practice. Not replacing expertise. Extending capability. The collaboration blog is specifically designed to show you the process, the decision-making, and the ROI without requiring you to understand CUDA drivers or Docker networking.
If you’re AI-curious: You’re watching this happen in real-time. I’m documenting both what I’m building (technical blog) and how I’m building it with AI (collaboration blog). You can follow either stream, or both, and see this approach evolve.
The key insight: one petaflop of compute isn’t valuable because of the hardware. It’s valuable because I can experiment freely, iterate quickly, and document everything with a partner who’s expert at the implementation details I don’t do every day.
What’s Coming
I’m publishing 3-5 posts per week across both blogs. Some weeks heavier on technical content. Some weeks more focused on the collaboration process.
Technical blog (AIXplore) upcoming topics:
Quality evaluation frameworks for LLMs
Fine-tuning workflows on PubMedQA (in progress)
Multi-model routing strategies
GPU optimization for inference vs. training workloads
Collaboration blog (launching this week here) upcoming topics:
How I wrangled one petaflop with a Linux expert partner
Prompting patterns that consistently work vs. ones that fail
When to guide Claude vs. when to let it lead
Actual ROI: my first 30 days with Claude Code (time saved, cost avoided, capabilities gained)
Both blogs document the same work from different angles. The technical blog assumes you want to implement similar systems. The collaboration blog assumes you want to understand (or improve) how you work with AI.
The Invitation
If you’re technical and want detailed implementations, code samples, and reproducible methods, follow the AIXplore technical blog. Everything is published there with full context.
If you’re interested in effective AI collaboration, prompting strategies, and ROI, the collaboration blog launches this week (also at AIXplore, different series). Real prompts. Actual decision points. Honest assessments.
You can follow one stream or both. Either way, you’re seeing the same research from different perspectives.
Where This Goes
Last week I told you Claude had become my thought partner. This week I’m showing you what we built in three days and how I’m documenting the entire process.
The DGX is running experiments right now. The documentation system is capturing everything automatically. And I’m publishing both the technical implementation and the collaboration process in parallel.
Let’s see what research comes out of this setup. I’ll be documenting the whole thing, in two ways, for two different audiences who both care about AI but need completely different information.
Three days to build the lab. Now we get to use it.
Links Referenced:
AIXplore Technical Blog: https://publish.obsidian.md/aixplore
“When Simple Heuristics Beat ML by 95,000x”: https://publish.obsidian.md/aixplore/Practical+Applications/dgx-lab-intelligent-gateway-heuristics-vs-ml-day-1
“Supercharge Your Shell with 50+ ML Productivity Aliases”: https://publish.obsidian.md/aixplore/Practical+Applications/dgx-lab-supercharged-bashrc-ml-workflows-day-2
Previous Substack post: https://rundatarun.io/p/the-quiet-week-claude-became-your


