What is the difference between an AI wrapper and a real AI product?

Andrew Stroup argues the useful distinction is not whether a company calls a foundation model API, because nearly every AI company does. It is whether the company's value lives in the prompt or in the system around the prompt. A wrapper's value lives in the system prompt and a few domain instructions, which the model underneath absorbs and commoditizes over time. A real product's value lives in everything between the model calls: the integrations, the data pipeline, the reliability layer, the institutional knowledge, and the workflow logic that the model alone cannot provide.

Why are AI wrapper startups failing?

Two structural forces. First, cost compression: Stanford HAI documented a 280x cost reduction for GPT-3.5-level performance over 24 months, so any business whose margin is the spread between what it charges and what the API costs sees that spread compress every quarter. Second, capability absorption: every foundation model release handles more domain knowledge natively, so prompt engineering tuned for one model is worth less the day the next model ships. An independent teardown of 200 venture-funded AI startups found that roughly three-quarters were essentially repackaged model APIs, and wrappers face 65% churn within 90 days, nearly double the SaaS average.

Are Cursor, Harvey, and Sierra AI wrappers?

Technically yes: all three are built on off-the-shelf foundation models, but their value does not live in the model call. Cursor crossed $2B in ARR because it indexes your entire codebase, understands file relationships, and orchestrates multi-file edits, not because of its prompt. Harvey ($195M in legal) and Sierra ($150M in enterprise CX) follow the same pattern. Stroup uses them to show that 'wrapper or not' is the wrong question. The right question is whether value lives in the prompt or in the system around it.

Can an AI wrapper ever be a good business?

Yes, as an exception. An exceptional operator who deeply understands a vertical, has the relationships to land early customers, and moves fast on go-to-market can use a wrapper as a wedge for hard-won domain expertise. But that motion typically caps at a narrower total addressable market, which makes it a strong niche business or modest exit rather than a venture-scale platform. Stroup's caution is that raising venture capital to build a thin prompt layer serving a narrow vertical creates a structural mismatch between the cap table and the ceiling.

If Your AI Startup Is a Prompt, There Is a Problem

What is the test for whether your AI startup is a feature or a company?

Stroup's diagnostic: if you deleted your system prompt and gave the customer the same foundation model with a blank text box, what would they lose? If the honest answer is the formatting and some domain-specific instructions, you are a feature, not a company. If the answer involves the entire workflow, the integrations, the data pipeline, the reliability layer, and the institutional knowledge baked into the system, you have something worth defending.

What eight years of building taught me about the difference between a prompt and a product.

I spent eight years building Leverage AI. The defensible value was never in the software layer. It was in the customer relationships with procurement teams at industrial manufacturers, the regulated workflows around purchase orders that touch financial controls and audit trails, and the institutional trust that came from flying into facilities in second-tier cities to learn the specific shape of each customer’s supply chain before writing a line of code. The software layer on top changed regularly as models improved and capabilities expanded. The foundation did not. It was built on years of customer trust and institutional knowledge that no model update could replicate or replace.

The prompt inside the software was a component of a much larger system, and the system was the product. I watched that distinction play out over years before the current AI startup wave made it visible at enormous scale.

I wrote earlier this year about where defensible value moved when code got cheap, about who is positioned to capture it, and about where the underlying skill was actually built. This post is about the structural trap that a large share of the current wave is falling into instead.

The pattern

What most “AI for X” companies do is straightforward: take a foundation model API, prepend thousands of tokens of vertical-specific instructions and documents, and present the output as domain expertise. The pitch varies by industry, but the architecture is the same. It is a foundation model with a long system prompt.

An independent teardown of 200 venture-funded AI startups, inspecting their network traffic and JavaScript bundles, found that roughly three-quarters were essentially repackaged calls to OpenAI or Anthropic with a custom interface on top. SimpleClosure’s 2025 shutdown report confirmed the first major wave of closures, with wrappers facing the sharpest correction. The data underneath is stark: 65% churn within 90 days, nearly double the SaaS average, with only 3 to 5% of AI startups surpassing $10K in monthly revenue. The pattern is visible at scale, and two structural problems explain why.

Why it breaks

The first problem is cost compression. Stanford HAI documented a 280x cost reduction for GPT-3.5-level performance over 24 months. Epoch AI found median price declines of 200x per year after 2024, with some capability milestones dropping 900x annually. Those declines reach your customers and competitors as fast as they reach you, so the price you can charge falls along with your costs. If your business model depends on the spread between what you charge customers and what the API costs you, that spread is compressing every quarter. You are arbitraging a price curve that moves against you, and the curve is accelerating. A wrapper business profitable at today’s API pricing may not be profitable at next quarter’s pricing, and the repricing happens whether you are ready for it or not.
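To make the 280x figure concrete, the short calculation below converts it into an implied per-month and per-quarter rate. It assumes a smooth, constant exponential decline across the 24 months, which is a simplification for illustration and not something the Stanford HAI figure itself claims. The function name is invented for this sketch.

```python
def implied_decline_factor(total_factor: float,
                           total_months: float,
                           window_months: float) -> float:
    """Factor by which cost falls over `window_months`, assuming a constant
    exponential decline that compounds to `total_factor` over `total_months`."""
    return total_factor ** (window_months / total_months)


# 280x over 24 months is the figure quoted above; the smooth curve is an assumption.
monthly = implied_decline_factor(280, 24, 1)
quarterly = implied_decline_factor(280, 24, 3)

print(f"cost falls about {monthly:.2f}x per month")
print(f"cost falls about {quarterly:.2f}x per quarter")
print(f"share of the original API bill left after one quarter: {1 / quarterly:.0%}")
```

Under that assumption the underlying cost roughly halves every quarter, which is the pace a margin built on API arbitrage would have to outrun.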

The second problem compounds the first. Every foundation model release erodes the value of domain-specific prompt engineering because the new model handles more of that domain knowledge natively. The prompts you spent six months tuning for GPT-4 are worth less the day GPT-5 ships. Context windows expanded 250x in fifteen months, which means the expensive context management you built is becoming a standard feature of the model itself. Google VP Darren Mowry said in February 2026 that LLM wrappers and AI aggregators have their “check engine light on.” Wing VC’s Jake Flomenberg posed the question I think every founder in this space should answer honestly: “If OpenAI launches a model 10x better tomorrow, does this company still have a reason to exist?” For most companies in this wave, in my view, the honest answer is uncomfortable.

The honest distinction

Some wrappers won big, and pretending otherwise would weaken this argument. Cursor crossed $2B in ARR using off-the-shelf LLMs, Harvey hit $195M in legal, and Sierra reached $150M in enterprise customer experience. All three are technically wrappers.

What separates them from the majority is that their value does not live in the model call. It lives in everything between model calls. Cursor does not succeed because of its prompt. It succeeds because it indexes your entire codebase, understands file relationships and dependencies, orchestrates multi-file edits across isolated workspaces, and maintains enough context about your project’s architecture to make the model’s output useful in your specific situation. Harvey and Sierra follow the same structural pattern in their respective verticals.

“Wrapper” has become imprecise as a category. The useful distinction, in my view, is between companies whose value lives in the prompt and companies whose value lives in the system around the prompt. Andrej Karpathy calls the latter “context engineering.” The label matters less than the structural difference it describes. That difference is becoming more visible to customers, who are increasingly asking what exactly they are paying for when the model underneath keeps getting cheaper and more capable on its own.

At Leverage, the prompt was maybe 5% of what made the product work for a given customer. The other 95% was the integration into their specific ERP, the mapping of their approval chain, the compliance rules that varied by facility, and the workflow logic that only became visible after sitting inside their procurement office for two weeks. That system was the product. The prompt was one component of it. Pure wrapper companies made the opposite bet, treating the prompt as the product and the system as an afterthought. They are discovering that the ratio matters.

The test

The diagnostic I keep coming back to is simple: if you deleted your system prompt and gave the customer the same foundation model with a blank text box, what would they lose?

If the honest answer is “the formatting and some domain-specific instructions,” you are a feature, not a company. If the answer involves the entire workflow, the integrations, the data pipeline, the reliability layer, and the institutional knowledge baked into the system, you have something worth defending. The test is crude on purpose. Complexity creates ambiguity, and ambiguity is where founders hide from the honest answer.

In my experience, the companies that pass this test tend to share a few traits. They accumulate data through product usage that a competitor with the same API key cannot replicate. Their integration depth creates switching costs measured in organizational change, not contract cancellation. They engineer task-completion reliability above 95% at the system level because the model alone cannot guarantee it. And they route across models, which means they benefit from commoditization instead of being threatened by it.
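One way to read the reliability point: a system can wrap an unreliable model call in validation and retries so that the task-level success rate exceeds what a single call delivers. The sketch below is an illustration under stated assumptions, not a description of how any company named here does it. `flaky_model` and the validator are hypothetical stand-ins, the 80% per-call success rate is an invented number, and the formula assumes attempts fail independently and that the validator catches every bad output.

```python
import random
from typing import Callable, Optional


def system_success_rate(per_call_success: float, max_attempts: int) -> float:
    """Probability that at least one of `max_attempts` independent calls succeeds."""
    return 1 - (1 - per_call_success) ** max_attempts


def run_with_retries(call_model: Callable[[], str],
                     validate: Callable[[str], bool],
                     max_attempts: int) -> Optional[str]:
    """Return the first output that passes validation, or None if every attempt fails."""
    for _ in range(max_attempts):
        output = call_model()
        if validate(output):
            return output
    return None


# Hypothetical stand-in for a model call that is usable 80% of the time.
rng = random.Random(0)


def flaky_model() -> str:
    return "ok" if rng.random() < 0.8 else "malformed"


print(f"one attempt:  {system_success_rate(0.8, 1):.2%}")
print(f"two attempts: {system_success_rate(0.8, 2):.2%}")
print(run_with_retries(flaky_model, lambda out: out == "ok", max_attempts=3))
```

Under those assumptions, a second validated attempt lifts an 80% call to about 96% at the task level. The validator and the retry policy live in the system, not in the prompt, which is why the model alone cannot offer the guarantee.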

Where I could be wrong

There is an exception worth stating honestly.

An exceptional operator who deeply understands a specific vertical, has the relationships to land early customers, and moves fast enough on go-to-market could make the wrapper approach work as a wedge, even if the long-term product is thin. The domain context they encode into the prompt is genuinely hard-won knowledge: industry-specific edge cases, regulatory nuance, workflow patterns a generalist would not know to include. If they ship fast enough, the wrapper becomes a distribution mechanism for their expertise instead of a standalone product. The business is real because the operator’s knowledge is real, and that is an important distinction from the majority of wrappers where neither the system nor the operator brings differentiated expertise.

The counterweight is that this motion likely caps at a narrower total addressable market precisely because of the niche focus. You built something that works for 200 companies in one vertical, and the prompt expertise does not transfer without rebuilding. The ceiling is a strong niche business or a modest exit, not a platform company.

The implication most founders in this position are not confronting is that a niche-operator wrapper with a capped TAM is probably not a venture-backable business. That is not a failure. It is a legitimate outcome and a real business, but it requires a different capitalization strategy, different growth expectations, and an honest conversation about what you are actually building. The founder who raises $10M in venture capital to build a prompt layer serving 200 companies in one vertical has created a structural mismatch between their cap table and their ceiling. The business might work. The return math for their investors probably does not.
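The mismatch can be sketched with back-of-the-envelope arithmetic. Only the $10M raise and the 200-customer ceiling come from the paragraph above. The contract value, exit multiple, and investor ownership are placeholder assumptions chosen for illustration, not data about any real company.

```python
def investor_return_multiple(customers: int,
                             annual_contract_value: float,
                             exit_revenue_multiple: float,
                             investor_ownership: float,
                             capital_raised: float) -> float:
    """Investor proceeds at exit divided by the capital they put in."""
    revenue_ceiling = customers * annual_contract_value
    exit_value = revenue_ceiling * exit_revenue_multiple
    return exit_value * investor_ownership / capital_raised


multiple = investor_return_multiple(
    customers=200,                 # ceiling named in the text
    annual_contract_value=50_000,  # assumption: hypothetical price per customer
    exit_revenue_multiple=5.0,     # assumption: hypothetical exit multiple
    investor_ownership=0.20,       # assumption: hypothetical stake at exit
    capital_raised=10_000_000,     # raise named in the text
)
print(f"investors get back about {multiple:.1f}x their money")
```

With these made-up inputs the business reaches $10M in annual revenue and a $50M exit, yet the investors only about get their money back. Changing the assumptions moves the number, but the capped customer count keeps it bounded.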

That is a choice every founder in this wave will have to make, and I think making it honestly is better than discovering the mismatch three years into a venture-scale burn rate.

The inversion

The build cost on the wrapper layer collapsed. That was the thesis from the first post in this series. The people positioned to capture value are operators who already understand the moat layers underneath, which was the argument of the second. The skill that matters is built by embedding in real operating environments across multiple industries, which was the third.

This post is the structural inverse: the companies that mistook the prompt for the product are discovering that the cheapest layer of the stack is also the least defensible. The irony is that the layer most of these startups invested in most heavily is the one that depreciates fastest, and the window to convert a wrapper into a system closes a little more with every model release.

The question is not whether you are using someone else’s model. Almost everyone is. The question is what your system does that the model alone cannot. If the honest answer is “format the output and prepend some instructions,” you are a feature. And features get absorbed.

