The Math Everyone Gets Wrong

You've done the calculations. Token costs fell 280x in two years. Your inference model should cost $0.02 per request. Your unit economics look solid.

Then your bill arrived: up 320%.

This isn't a paradox. It's a subsidy cliff you weren't told existed.

OpenAI spends $1.35 for every dollar it earns — and those losses are driven not by R&D or headcount, but by the cost of serving billions of inference requests per day. The current race-to-zero on inference pricing is funded by venture capital and hyperscaler cross-subsidies. As capital discipline tightens and the $110B OpenAI round gets deployed into compute, unit economics will need to improve. Expect 12–24 months before meaningful price normalization, but plan for it now.

The gap between what frontier labs charge and what it costs them to serve a token is not ambiguous. A developer running a moderately deep coding session — 50 messages, average context accumulating to about 100,000 tokens — costs approximately $15.30 in input tokens at Claude Sonnet 4.6 rates. Add output tokens and the total easily reaches $25–$40 for a single session. A developer running five such sessions per day generates $125–$200 in daily API value. They pay $20 per month.
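The session arithmetic above can be reproduced with a rough sketch. The input price of $3.00 per million tokens is an assumption for illustration (the text does not quote a rate), and the function name is hypothetical. It also assumes every message re-sends a context of about 100,000 tokens, which is one plausible reading of the scenario.

```python
def session_input_cost(messages: int, avg_context_tokens: int,
                       usd_per_million_input: float) -> float:
    """Input-token cost of a session in which every message re-sends the context."""
    total_input_tokens = messages * avg_context_tokens
    return total_input_tokens / 1_000_000 * usd_per_million_input


# Assumed rate: $3.00 per million input tokens (illustrative, not quoted in the text).
cost = session_input_cost(messages=50, avg_context_tokens=100_000,
                          usd_per_million_input=3.00)
print(f"Input cost per session: ${cost:.2f}")

# Daily API value for five sessions, using the $25-$40 per-session range from the text.
low, high = 5 * 25, 5 * 40
print(f"Daily API value: ${low}-${high}")
```

Under these assumptions the input cost lands near the figure quoted above. The exact number depends on how the context grows across the session and on the rate actually charged.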

The subsidy is deliberate and visible. When you're private and burning venture capital, you can subsidize inference. You can run models at a loss. You can offer $20/month unlimited plans that cost you more than $100/month to serve.

When the Subsidy Ends

This isn't stable. The average enterprise AI budget has grown from $1.2 million per year in 2024 to $7 million in 2026. Some Fortune 500 companies are reporting monthly AI inference bills in the tens of millions of dollars. At scale, the subsidy becomes unsustainable. OpenAI's inference costs reached $8.4B in 2025 and are projected to rise to $14.1B in 2026.

The historical pattern is clear. Cloud services, streaming, ride-sharing, food delivery: every category that launched with subsidized pricing eventually rationalized toward sustainable economics. The shift is already visible. On February 9, 2026, OpenAI began running ads in ChatGPT for users on its Free and Go subscription tiers. The company that defined the consumer AI experience decided that subscriptions alone could not cover costs and introduced advertising within four years of launch.

Industry analysts estimate that current API pricing may need to increase 3-10x to reach sustainable economics. Some models might need to go even higher. A comprehensive inference request that costs you $0.01 today might cost $0.05 or $0.10 tomorrow.

What This Means for Your Business

Your AI ROI calculations assume today's token pricing. They shouldn't.

Plan for API price normalization in your 2027 budget. Current API pricing is subsidized. Budget conservatively: assume API price increases of 30-50% over the next 18 months as AI vendors move toward sustainable unit economics. Enterprises that have not stress-tested their AI business cases against higher inference costs face material budget surprises.
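One way to stress-test a business case is to re-run its margin under several price multipliers. The sketch below is illustrative only: `BusinessCase`, its fields, and the dollar figures are hypothetical. The multipliers come from the 30-50% planning range and the 3-10x analyst range mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class BusinessCase:
    monthly_inference_usd: float    # current API spend (hypothetical figure)
    monthly_other_costs_usd: float  # costs that do not scale with token price
    monthly_value_usd: float        # revenue or savings attributed to the workflow


def margin_at(case: BusinessCase, price_multiplier: float) -> float:
    """Monthly margin if API prices are multiplied by price_multiplier."""
    cost = case.monthly_inference_usd * price_multiplier + case.monthly_other_costs_usd
    return case.monthly_value_usd - cost


def break_even_multiplier(case: BusinessCase) -> float:
    """Price multiplier at which the workflow's margin reaches zero."""
    return (case.monthly_value_usd - case.monthly_other_costs_usd) / case.monthly_inference_usd


case = BusinessCase(monthly_inference_usd=10_000,
                    monthly_other_costs_usd=15_000,
                    monthly_value_usd=40_000)

# 1.3x and 1.5x: the 30-50% planning range; 3x and 10x: the analyst range.
for multiplier in (1.0, 1.3, 1.5, 3.0, 10.0):
    print(f"{multiplier:>4}x -> margin ${margin_at(case, multiplier):,.0f}")

print(f"Break-even at {break_even_multiplier(case):.1f}x current pricing")
```

A workflow whose break-even multiplier sits inside the range you are planning for is the one most likely to produce a budget surprise.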

Every agentic workflow locked into frontier model APIs at current pricing is built on a subsidized foundation. Upward price normalization is a when, not an if. Designing for model-agnosticism today is the most important architectural decision you can make.

The Three Moves

First: implement model routing. Routing 80% of routine inference traffic to cost-optimized models while reserving frontier models for complex tasks reduces inference spend by 60-80% with minimal quality impact. This is the single highest-ROI AI cost optimization available in 2026.
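The savings claim can be sanity-checked with a blended-cost calculation. This is a simplified sketch: it assumes routed and unrouted requests use similar token volumes, and the price ratios are illustrative values rather than quotes from any vendor.

```python
def blended_savings(routed_share: float, cheap_to_frontier_price_ratio: float) -> float:
    """Fraction of spend saved versus sending all traffic to the frontier model.

    Assumes every request consumes roughly the same number of tokens.
    """
    if not 0.0 <= routed_share <= 1.0:
        raise ValueError("routed_share must be between 0 and 1")
    if cheap_to_frontier_price_ratio < 0.0:
        raise ValueError("price ratio must be non-negative")
    # Cost of the mix, expressed as a fraction of the all-frontier cost.
    blended_cost = (routed_share * cheap_to_frontier_price_ratio
                    + (1.0 - routed_share))
    return 1.0 - blended_cost


# Illustrative ratios: the cheap model costs 25%, 10%, or 2% of the frontier price.
for ratio in (0.25, 0.10, 0.02):
    print(f"price ratio {ratio:.2f} -> savings {blended_savings(0.8, ratio):.0%}")
```

With 80% of traffic routed, savings reach 60% when the cheaper model costs a quarter of the frontier price, and they approach but never exceed 80% as the cheaper model's price goes toward zero. If routine requests are also shorter than complex ones, the real share of spend moved will be smaller than the share of traffic.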

Second: architect for optionality. Don't optimize for one vendor's pricing. Build workflows that can swap providers, or move to self-hosted open-source models, when the economics shift. Open-source models from Meta, Mistral, and DeepSeek have reached quality levels competitive with proprietary alternatives, so enterprises no longer need to pay API premiums for frontier-model access.
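In code, optionality usually means the workflow depends on a small interface rather than on a vendor SDK. The sketch below is one possible shape, not a prescribed design: `ChatProvider`, `EchoProvider`, and `Workflow` are hypothetical names, and `EchoProvider` is a stand-in for a real hosted or self-hosted client.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatProvider(Protocol):
    """Minimal interface the workflow depends on (hypothetical)."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in for a real client; a real adapter would call an API or local model."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Workflow:
    """Maps task types to providers so a swap is a routing change, not a rewrite."""

    def __init__(self, providers: dict[str, ChatProvider], routes: dict[str, str]):
        self._providers = providers
        self._routes = dict(routes)  # task type -> provider key

    def set_route(self, task_type: str, provider_key: str) -> None:
        if provider_key not in self._providers:
            raise KeyError(f"unknown provider: {provider_key}")
        self._routes[task_type] = provider_key

    def run(self, task_type: str, prompt: str) -> str:
        provider = self._providers[self._routes[task_type]]
        return provider.complete(prompt)


providers = {
    "frontier": EchoProvider("frontier-api"),
    "local": EchoProvider("self-hosted"),
}
workflow = Workflow(providers, routes={"summarize": "local", "plan": "frontier"})
print(workflow.run("plan", "Draft a migration plan"))

# If frontier pricing changes, planning tasks can be re-routed without touching callers.
workflow.set_route("plan", "local")
print(workflow.run("plan", "Draft a migration plan"))
```

The routing table is the only thing that changes when the economics shift. Prompts, evaluation sets, and output parsing still need to be tested against each provider before a swap is safe.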

Third: measure outcomes, not tokens. Shift your AI metrics from technical to financial. Board-level AI reporting should track Cost per Resolved Ticket, Revenue per AI Workflow, and Human-Equivalent Hourly Rate — not token counts and latency percentiles.
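The three metrics can be computed from a handful of monthly totals. The definitions below are assumptions, since the article names the metrics without defining them, and the sample figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MonthlyAIReport:
    ai_spend_usd: float            # total AI spend for the month
    tickets_resolved: int          # tickets closed by AI without escalation
    attributed_revenue_usd: float  # revenue attributed to AI workflows
    active_workflows: int          # AI workflows in production
    human_hours_equivalent: float  # hours a person would have needed for the same work

    def cost_per_resolved_ticket(self) -> float:
        # Assumed definition: spend divided by resolved tickets.
        return self.ai_spend_usd / self.tickets_resolved

    def revenue_per_workflow(self) -> float:
        # Assumed definition: attributed revenue divided by workflow count.
        return self.attributed_revenue_usd / self.active_workflows

    def human_equivalent_hourly_rate(self) -> float:
        # Assumed definition: spend divided by the human hours the work replaces.
        return self.ai_spend_usd / self.human_hours_equivalent


report = MonthlyAIReport(ai_spend_usd=12_000, tickets_resolved=8_000,
                         attributed_revenue_usd=90_000, active_workflows=6,
                         human_hours_equivalent=1_500)

print(f"Cost per resolved ticket: ${report.cost_per_resolved_ticket():.2f}")
print(f"Revenue per AI workflow: ${report.revenue_per_workflow():,.0f}")
print(f"Human-equivalent hourly rate: ${report.human_equivalent_hourly_rate():.2f}")
```

Because each metric has spend in the numerator or is directly comparable to it, re-running the report with a higher assumed API price shows which workflows stay viable after normalization.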

The cost of intelligence is falling. The cost of deploying it is about to spike. The gap between these two curves is where your 2027 budget breaks.