Discovering the term "context engineering" reframed the entire problem of building reliable AI systems for me. What started as a blog post from Cognition's Walden Yan has rapidly evolved into a fundamental shift in how we think about LLM applications. The term gained momentum when Shopify CEO Tobi Lütke publicly endorsed it over "prompt engineering," sparking discussions across the AI community.
This emerging pattern is reshaping AI engineering by moving us beyond simple prompt optimization toward building dynamic systems that manage the entire information ecosystem around an LLM. It's the difference between giving a brilliant assistant a single instruction and providing them with the right office, tools, reference materials, and ongoing guidance to excel at their job.
The Origin Story: From Prompt Engineering to Something More
When I first wrote about advanced prompt engineering for oncology data science, I thought I had cracked the code. Carefully crafted prompts, structured thinking patterns, chain-of-thought reasoning: these techniques dramatically improved my results. But something was still missing. The prompts worked beautifully in isolation but struggled when integrated into larger, more complex workflows.
Walden Yan's groundbreaking post cut through the noise with a simple but powerful distinction:
"Prompt engineering was coined as a term for the effort needing to write your task in the ideal format for a LLM chatbot. Context engineering is the next level of this. It is about doing this automatically in a dynamic system."
The post detailed hard-won lessons from building Devin, Cognition's AI software engineer. Yan revealed that the most tempting architectural patterns — particularly multi-agent systems — often led to fragile, unreliable applications. Instead, he advocated for principles that prioritize context continuity and decision coherence.
The response was immediate. Harrison Chase from LangChain noted: "We think LangGraph is really great for enabling completely custom context engineering - but we want to make it even better." The framework creators were acknowledging what practitioners had been discovering: the real challenge wasn't writing better prompts, but orchestrating entire contexts.
What Context Engineering Really Means
The RAM Analogy That Clicked With Me
Lance Martin's explanation provides perhaps the clearest mental model for understanding context engineering:
"Context enters an LLM in several ways, including prompts (e.g., user instructions), retrieval (e.g., documents), and tool calls (e.g., APIs). Just like RAM, the LLM context window has limited 'communication bandwidth' to handle these various sources of context. And just as an operating system curates what fits into a CPU's RAM, we can think about 'context engineering' as packaging and managing the context needed for an LLM to perform a task."
This isn't just a clever analogy; it fundamentally changes how we approach building LLM applications. We're not prompt writers; we're operating system designers for AI.
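To make the operating-system analogy concrete, here is a minimal sketch of what "packing context into limited RAM" can look like in code. The ContextItem fields, the priority scheme, and the four-characters-per-token heuristic are all illustrative assumptions, not any real library's API.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    source: str    # e.g. "user_prompt", "retrieved_doc", "tool_output"
    text: str
    priority: int  # lower number = more important

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, good enough for budgeting

def pack_context(items: list[ContextItem], budget_tokens: int) -> str:
    """Fill the context window in priority order, skipping what doesn't fit."""
    packed, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost > budget_tokens:
            continue  # the "OS" keeps this item out of RAM for now
        packed.append(f"[{item.source}]\n{item.text}")
        used += cost
    return "\n\n".join(packed)
```

The point isn't the specific heuristics; it's that admission into the window becomes an explicit, inspectable decision rather than an accident of concatenation.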
The Core Components
1. Dynamic Context Management
Context engineering moves beyond static prompts to adaptive systems that respond to the evolving needs of a conversation or task. This is where my exploration of DSPy for programming language models at scale became relevant. DSPy treats prompts as programs that can be optimized, but context engineering takes this further: it's about managing the entire environment in which those programs run.
As Dex Horthy emphasizes in his 12-factor agents framework, owning your context window is non-negotiable. You need to control what information flows in, when it arrives, and how it's structured.
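In practice, owning the context window can be as simple as insisting that a single function is the only place where the model's input gets assembled. The sketch below assumes the common chat-completion message format; build_messages and its parameters are hypothetical names, not part of any particular framework.

```python
def build_messages(task: str, guidelines: str, history: list[dict],
                   tool_results: list[str], max_history: int = 6) -> list[dict]:
    """The single choke point where everything the model sees is decided."""
    messages = [{"role": "system", "content": guidelines}]
    # Only the most recent turns enter verbatim; older turns should arrive
    # as a summary produced elsewhere.
    messages.extend(history[-max_history:])
    for result in tool_results:
        messages.append({"role": "user", "content": f"Tool output:\n{result}"})
    messages.append({"role": "user", "content": task})
    return messages
```

Because every input flows through one place, you can log it, test it, and change the packing policy without touching the rest of the system.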
2. Information Architecture
The structure of information matters as much as its content. Yan's principles highlight how actions carry implicit decisions, and conflicting decisions lead to bad results. Think of it like organizing an executive briefing: the order of information, what's highlighted, and what's in the appendix all shape the decisions that follow.
3. Tool and State Coordination
Perhaps the most challenging aspect is ensuring consistency across multiple steps. When an LLM makes decisions based on partial context, those decisions can conflict with later steps that have different context. It's like having different departments in a company make decisions without talking to each other; chaos ensues.
Lessons from the Trenches: What Actually Works
The Multi-Agent Trap
Walden Yan's post crystallized what many had discovered through painful experience. His two key principles deserve repeating:
Principle 1: Share context, and share full agent traces, not just individual messages
Principle 2: Actions carry implicit decisions, and conflicting decisions carry bad results
The temptation to split complex tasks among multiple specialized agents is strong. It feels clean, modular, and aligned with good software engineering practices. But as Yan demonstrates with his Flappy Bird example, when subagents lack the full context, they make assumptions that conflict with each other.
Imagine asking one team to design a car's exterior and another to design the engine, without them ever talking. You might end up with a beautiful sports car body housing an economy engine: technically functional, but missing the point entirely.
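One way to honor Principle 1 in code is to hand every subtask the full running trace rather than an isolated instruction. The sketch below is deliberately simplified; run_subtask is a hypothetical stand-in for whatever model call you actually make.

```python
def run_agent(task: str, subtasks: list[str], run_subtask) -> list[str]:
    """Each step sees the full trace, so its implicit decisions can stay
    consistent with everything decided before it."""
    trace = [f"Overall task: {task}"]
    outputs = []
    for subtask in subtasks:
        context = "\n".join(trace)  # the whole trace, not just this subtask
        result = run_subtask(context, subtask)
        trace.append(f"Subtask: {subtask}\nResult: {result}")
        outputs.append(result)
    return outputs
```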
The Goldilocks Problem
Andrej Karpathy's observation captures the delicate balance required:
"Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial."
This isn't just about token limits; it's about cognitive load. Even within a model's context window, information competes for attention. The art lies in providing exactly what's needed, when it's needed, in the clearest possible form.
I learned this firsthand when building oncology workflows. Initially, I tried to include every possible piece of patient history, every lab result, every research paper citation. The model's performance actually degraded. It was only when I started curating the context down to recent labs, relevant history, and specific guidelines that the system became truly useful.
Practical Patterns That Emerged
1. The Context Window as Sacred Space
Dex Horthy's "12-factor agents" framework puts it bluntly: own your context window. Frameworks that abstract this away often fail in production because they make assumptions about what information matters.
Think of your context window like prime real estate in a city center. Every piece of information needs to earn its place. Is this historical data still relevant? Does this tool output add clarity or confusion? Would a summary serve better than the full document?
The most successful implementations treat context curation as a first-class concern, not an afterthought. They build explicit systems for the following (a minimal sketch follows this list):
Filtering out redundant information
Summarizing verbose outputs
Prioritizing recent and relevant data
Maintaining coherent narrative flow
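Here is a minimal sketch of such a curation pipeline: deduplicate, rank by relevance, summarize, assemble. The truncation-based summarize() is a placeholder for whatever summarizer (typically another LLM call) you would actually use, and the function names are illustrative.

```python
import re

def deduplicate(chunks: list[str]) -> list[str]:
    """Filter out near-verbatim duplicates by normalizing whitespace and case."""
    seen, unique = set(), []
    for chunk in chunks:
        key = re.sub(r"\s+", " ", chunk).strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique

def summarize(chunk: str, max_chars: int = 400) -> str:
    """Placeholder summarizer: in practice this is usually an LLM call."""
    return chunk if len(chunk) <= max_chars else chunk[:max_chars] + " ..."

def curate(scored_chunks: list[tuple[float, str]], top_k: int = 5) -> str:
    """scored_chunks: (relevance_score, text) pairs, e.g. from retrieval."""
    ranked = [text for _, text in
              sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)]
    return "\n\n".join(summarize(c) for c in deduplicate(ranked)[:top_k])
```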
2. The Compression Challenge
For long-running agents, context management becomes even more critical. Yan describes Cognition's approach:
"We introduce a new LLM model whose key purpose is to compress a history of actions & conversation into key details, events, and decisions."
This isn't simple summarization; it's intelligent curation. The compression model must understand what information will be crucial for future decisions:
Key decisions made (and their rationale)
Constraints discovered during execution
Failed approaches (to avoid repetition)
Current state and pending goals
It's like having an executive assistant who knows exactly what to include in a briefing based on upcoming meetings. They don't just summarize; they anticipate what information will be needed.
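Here is a hedged sketch of what that compression step can look like. This is not Cognition's implementation, just the general shape of the pattern; call_compression_model stands in for whatever LLM client you use.

```python
COMPRESSION_PROMPT = """Compress the agent trace below into:
1. Key decisions made (and their rationale)
2. Constraints discovered during execution
3. Approaches that failed (so they are not repeated)
4. Current state and pending goals

Trace:
{trace}
"""

def compress_history(trace: list[str], call_compression_model) -> str:
    """call_compression_model is a hypothetical stand-in for your LLM client."""
    return call_compression_model(COMPRESSION_PROMPT.format(trace="\n".join(trace)))

def build_step_context(summary: str, recent_steps: list[str], keep: int = 3) -> str:
    # Older history enters only as the compressed summary; recent steps stay verbatim.
    return summary + "\n\nRecent steps:\n" + "\n".join(recent_steps[-keep:])
```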
3. The Debugging Revolution
Harrison Chase emphasizes how LangGraph was built specifically to enable context engineering:
"One of the downsides of agent abstractions (which most other agent frameworks emphasize) is that they restrict context engineering. There may be places where you cannot change exactly what goes into the LLM, or exactly what steps are run beforehand."
When you control the context, debugging transforms from guesswork to engineering. You can see exactly what information the model had when it made a decision. You can replay scenarios with modified context. You can identify where irrelevant information crept in or crucial details got lost.
This visibility is transformative. In my oncology work, being able to trace exactly what clinical guidelines were in context when a recommendation was made turned the system from a black box into a transparent assistant.
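A simple way to get that visibility is to snapshot the exact messages sent at each decision and make them replayable. The sketch below logs snapshots as JSON lines; run_model and the file layout are assumptions, not a prescribed format.

```python
import json
import time

def call_with_snapshot(run_model, messages: list[dict], log_path: str) -> str:
    """Record exactly what the model saw before asking it to decide."""
    with open(log_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "messages": messages}) + "\n")
    return run_model(messages)

def replay(run_model, log_path: str, edit=lambda m: m) -> list[str]:
    """Re-run every logged decision, optionally editing the context first."""
    outputs = []
    with open(log_path) as f:
        for line in f:
            messages = edit(json.loads(line)["messages"])
            outputs.append(run_model(messages))
    return outputs
```

Replaying a failure with one document removed, or one guideline reordered, quickly tells you whether the context or the model was at fault.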
The Future is Already Here
What's Working Now
The shift from "prompt engineering" to "context engineering" reflects a maturing field. We're seeing success in production systems that:
Treat context windows as managed resources, not unlimited spaces
Build explicit context architectures rather than relying on emergent behavior
Use specialized models for context compression and management
Implement context versioning for debugging and optimization
Organizations successfully implementing these patterns report dramatic improvements in reliability, consistency, and user trust. The systems feel less like unpredictable AI and more like well-designed software.
Tools that embrace these principles include:
LangGraph for controllable agent architectures
HumanLayer for human-in-the-loop context management
Custom context compression pipelines (as described by Cognition)
What's Coming Next
The evolution toward autonomous context management is accelerating. We're seeing early experiments with:
Context-aware routing that dynamically adjusts what information is included based on the task (sketched after this list)
Learned compression models that understand domain-specific importance
Hierarchical context systems that manage different time scales of memory
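As a toy illustration of context-aware routing, the sketch below picks which context sources to load based on a cheap classification of the task. The route names, keyword heuristics, and source labels are invented for illustration only.

```python
ROUTES = {
    "code_review": ["diff", "style_guide", "related_tests"],
    "clinical_summary": ["recent_labs", "relevant_history", "guidelines"],
    "general": ["conversation_summary"],
}

def classify(task: str) -> str:
    lowered = task.lower()
    if "review" in lowered or "diff" in lowered:
        return "code_review"
    if "patient" in lowered:
        return "clinical_summary"
    return "general"

def route_context(task: str, sources: dict[str, str]) -> str:
    selected = ROUTES[classify(task)]
    return "\n\n".join(sources[name] for name in selected if name in sources)
```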
Yet as Giovanni Foglietta observes, even as models improve, the principles remain:
"What consistently improved GenAI output wasn't a better prompt, it was a better setup. The clearer the model's environment, the better the result. What matters is context, not cleverness."
Your Context Engineering Journey Starts Here
Key Takeaways for Practitioners
Start with visibility — You can't engineer what you can't see. Understand exactly what context your LLM receives at each step.
Embrace the complexity — Context engineering is hard because it's solving a hard problem. Simple solutions often fail at scale.
Own your context — Don't let abstractions hide the most important part of your system. Every piece of information matters.
Iterate relentlessly — What works for one use case might fail for another. Build systems that let you experiment quickly.
The transition from prompt engineering to context engineering represents a fundamental shift in how we build AI systems. We're no longer just crafting clever instructions; we're building dynamic information architectures that help LLMs understand and navigate complex tasks.
My journey from carefully crafted oncology prompts to programmatic approaches with DSPy to embracing context engineering has taught me that the most elegant prompt is worthless without the right context to support it.
As you apply these principles, remember that context engineering is still evolving. The patterns we've discussed are starting points, not final answers. The field needs practitioners who will push boundaries, share failures, and collectively advance our understanding.
Resources and Further Reading
Foundational Posts:
Community Discussions:
Related Explorations:
Tools and Frameworks:
Voice AI & Voice Agents Primer - For context engineering in voice applications
Academic and Technical Deep Dives: