Every organization I work with is simultaneously overestimating and underestimating what large language models can do. These aren’t different organizations; they’re the same ones, often in the same conversation. They overestimate the model’s ability to read their minds and underestimate what it can produce when properly directed. The result is a strange limbo: ambitious rollouts that generate aggressively mediocre output, followed by a quiet retreat to the way things were done before.
I see this constantly. A prompt that reads “write a whitepaper on Topic X using information from Confluence,” with nothing about audience, nothing about technical accuracy, no market context, no success criteria. The person who wrote that prompt genuinely believes they’ve delegated the work. They haven’t. They’ve delegated the production while skipping the part that actually matters: the specification. And the gap between those two things is where most AI initiatives go to quietly die.
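To see what's missing, imagine forcing that prompt into an explicit specification, even a rough one. Here's a minimal sketch in Python; every field name and example value is mine, purely illustrative, but each field answers a question the model cannot answer for you:

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    """The delegation behind the prompt. Field names are illustrative,
    not a standard; each one is a question only you can answer."""
    deliverable: str             # what, concretely, gets produced
    audience: str                # who reads it, and what they already know
    context_sources: list[str]   # which internal knowledge actually applies
    success_criteria: list[str]  # what "good" means, stated testably
    review_gate: str             # who or what validates before it ships

whitepaper = TaskSpec(
    deliverable="Whitepaper on Topic X, roughly 3,000 words",
    audience="Technical evaluators who already know the product category",
    context_sources=["Confluence: positioning pages", "recent win/loss notes"],
    success_criteria=[
        "Every technical claim traceable to a named internal source",
        "Argues our specific differentiation, not the category's generic benefits",
    ],
    review_gate="Domain expert sign-off before publication",
)
```

Writing that down takes twenty minutes. Skipping it is how you get the generic whitepaper.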
For decades, the value of knowledge work lived in the production. The analyst who could build the model. The writer who could draft the brief. The engineer who could ship the feature. The doing was the hard part, and the thinking-about-what-to-do was treated as a preamble, something you got through quickly on the way to the real work. AI has inverted this entirely. When the marginal cost of production approaches zero, the only remaining source of value is knowing exactly what to produce, for whom, to what standard, and how you’ll know it’s right. The specification is the work now.
This is not an abstraction. It’s a concrete shift in where organizational competence needs to live, and most leadership teams haven’t caught up to it yet.
The platform purchase reflex
For twenty-plus years, the enterprise instinct has been “just buy a tool.” A platform will solve the productivity gap. A vendor will close the efficiency problem. This reflex isn’t irrational; it worked reasonably well when the tools were deterministic. You bought a CRM, you configured it, it did what it said on the box. But language models aren’t deterministic in the same way, and configuring them isn’t a one-time setup task. The specification has to be continuous, contextual, and specific to each workflow it touches.
What makes the platform purchase especially seductive right now is that the tool is extraordinarily capable, and that capability masks how much work you’ve left on the table. When your AI assistant produces something that looks polished in thirty seconds, it’s easy to mistake speed for quality. The surface is convincing. The structure is there. It reads like a professional document! But it reads like everyone’s professional document, because nothing in the process told the model what makes yours different.
You can see the frontier labs trying to address this. Anthropic is shipping enterprise features. OpenAI’s building out organizational memory. They understand the gap, but none of them can truly do the work of breaking down your specific workflows, identifying the context each one requires, and building the validation gates that produce consistent results. They can give you infrastructure; they cannot give you specification. That part is irreducibly yours.
The regression to the mean
Without precise specification (without context about who this is for, what institutional knowledge applies, what good looks like in this particular case), everything you produce regresses to the model’s mean. And the model’s mean is, by definition, generic. It’s slop. It’s LLM-isms and filler. It reads like everyone else’s regression, because it is.
This is the part I find genuinely interesting: the same leaders who would never accept a new hire producing undifferentiated work will accept it from an AI system, because the output arrived fast and looked clean. The surface quality creates a permission structure for mediocrity. And boy does it compound. Once teams learn that “good enough” output comes free, the muscle for specifying what “actually good” looks like starts to atrophy, and you end up with an organization that is producing more and saying less.
What makes this harder is that the specification challenge looks different for every team. What “good output” means for a software engineering team is not what it means for marketing, for research, for operations. The prompts are different, the context requirements are different, the validation criteria are different. A software team can write tests; a marketing team needs a different kind of gate entirely. You can’t solve this with a single platform purchase, no matter how sophisticated. The specification work is granular, team-level, sometimes role-level. It requires understanding and breaking down the actual workflows.
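To make that concrete, here’s a hedged sketch of what a marketing team’s equivalent of a test suite might look like. Everything specific in it (the banned phrases, the required sections) is a placeholder for whatever your team’s actual standards are; the structure (explicit, automated, team-owned) is the point:

```python
# A hypothetical marketing-team gate: the equivalent of a software
# team's test suite, applied to prose. The checks are stand-ins.

BANNED_PHRASES = ["delve into", "in today's fast-paced world", "game-changer"]
REQUIRED_SECTIONS = ["Who this is for", "What we claim", "Proof points"]

def gate_marketing_draft(draft: str) -> list[str]:
    """Return a list of failures; an empty list means the draft passes."""
    failures = []
    lowered = draft.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            failures.append(f"Generic filler detected: {phrase!r}")
    for section in REQUIRED_SECTIONS:
        if section.lower() not in lowered:
            failures.append(f"Missing required section: {section!r}")
    return failures
```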
Own the layer that matters
So where does that leave the investment question? As with any automation effort, you have to put in significant upfront work to understand your final product and reverse engineer from there. Define the output. Define what makes it good. Define the context required to get there, and then build backward.
But here’s the tension (and I’ll be honest, I don’t think anyone has fully resolved it): you also can’t over-invest in whatever infrastructure is popular this quarter, because the next round of models is likely to make much of it obsolete.
The agent framework you’re building today may be irrelevant in eighteen months. The retrieval pipeline you’re optimizing may be unnecessary once context windows double again. Infrastructure built around the current generation’s limitations has a short shelf life.
So what do you do? What can you do? You invest in the things that are very unlikely to change, regardless of which model or platform you’re running. Your organization’s context layer: the proprietary knowledge, processes, standards, and institutional memory that make your work yours and not someone else’s. And then the governance infrastructure around it: the permissions, access rules, and validation gates that control how AI systems interact with that context.
I tell clients to think about it this way: treat your agent infrastructure like employees. You wouldn’t give a new hire unrestricted access to every system and document in the organization and say “figure it out” (I hope!). You’d scope their access, define what they can read versus what they can modify, set expectations for what good output looks like, and review their work before it ships. RBAC, read/write permissions, clear boundaries, human checkpoints. The same discipline applies, and it applies for the same reasons. The risk of unsupervised access isn’t hypothetical or even new; it’s the same risk you’ve been managing with people for decades, just operating on much faster loops.
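Here’s a minimal sketch of what that scoping might look like in code, assuming a hypothetical in-house permission layer (every name and call here is invented for illustration; the discipline, not the API, is what transfers):

```python
from dataclasses import dataclass
from enum import Enum, auto

class Access(Enum):
    NONE = auto()
    READ = auto()
    READ_WRITE = auto()

@dataclass(frozen=True)
class AgentRole:
    """Scoped like a new hire: explicit grants, default deny, human review."""
    name: str
    grants: dict[str, Access]           # resource -> access level
    requires_human_review: bool = True  # checkpoint before anything ships

def can(role: AgentRole, resource: str, needed: Access) -> bool:
    # Default deny: no grant means no access, same as onboarding a person.
    return role.grants.get(resource, Access.NONE).value >= needed.value

drafting_agent = AgentRole(
    name="whitepaper-drafter",
    grants={
        "confluence/positioning": Access.READ,
        "drafts/whitepapers": Access.READ_WRITE,
        # note: no grant at all for finance or HR resources
    },
)

assert can(drafting_agent, "confluence/positioning", Access.READ)
assert not can(drafting_agent, "finance/forecasts", Access.READ)
```

Nothing in that sketch is novel. It’s the same default-deny, least-privilege posture you already apply to people, just expressed somewhere your agent tooling can enforce it.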
This matters right now because the frontier labs want to own this layer for you. They want your context, your workflows, your organizational knowledge flowing through their systems, mediated by their infrastructure, locked into their ecosystem. And the instinct to let them (that twenty-year muscle memory of “just buy a tool and let the vendor handle it”) grows more costly with every new model release. Each generation is more capable, which means each generation makes the specification gap more expensive to ignore and the context layer more valuable to control.
The organizations that will differentiate are not the ones waiting for an off-the-shelf solution to arrive fully formed. They’re the ones doing the work that no vendor can do for them: defining what good looks like, team by team and workflow by workflow; building a context layer they own and control; and recognizing that the ability to specify, precisely and testably, what you want produced is no longer a preamble to the real work, it is the real work.