The Cost Subsidization of LLM Use
When you call an LLM API, you are almost certainly not paying what it costs to serve the request. The gap between inference price and inference cost is one of the more consequential facts in the current AI economy, and it is deliberately maintained.
The Cost Gap
In 2025, OpenAI generated approximately $3.7 billion in revenue and lost an estimated $5 billion, spending $1.35 for every dollar earned. One bottom-up estimate put the true inference cost for a frontier model at roughly $6.37 per million tokens at a time when GPT-4o-mini listed at $0.60. That implies a subsidy rate above 90% for some models.
The gap is not uniform. Budget-tier models can approach marginal cost. Frontier models carry the heaviest implicit subsidy. DeepSeek forced a reset: its V4-Pro model, after a permanent 75% price cut in May 2026, listed at $0.0035 per million tokens using a novel sparse attention architecture. Google’s Gemini Flash-Lite set the new industry floor at $0.25/M input tokens.
A 2025 paper mapping price dynamics identified a structural break in May 2024: the shift from technology-driven price decline (better hardware, more efficient models) to competition-driven price decline, where providers began undercutting each other below cost.
Who Pays
The subsidy has multiple layers.
Microsoft provides OpenAI with compute at below-market rates as part of their partnership. Even with that discount, OpenAI still operates at a loss. SoftBank committed $40 billion to the Stargate Project, building dedicated AI infrastructure outside Microsoft’s ecosystem. These are capital subsidies, not efficiency gains.
Hyperscalers (Google, Microsoft, Amazon) can sustain losses longer than pure-play labs. Google can price Gemini models below cost and still profit from AI search integration. A standalone lab cannot cross-subsidize this way.
The logic resembles early Uber or AWS: price below cost to establish developer lock-in, build the ecosystem, then rationalize margins once switching costs are high enough.
The Developer Trap
For builders, subsidized pricing creates a structural risk that is not always visible at the time of building.
At scaling-stage AI companies, inference costs average 23% of total revenue. That token tax locks gross margins 30 points below mature SaaS norms: AI gross margins run 50-60% vs. 70-90% for established software businesses. This is the current state with subsidized prices.
If prices normalize toward cost, the hit compounds. A product built at $0.60/M tokens that reprices to $3.00/M tokens does not need to lose many margin points to become unviable. The danger is that cheap inference gets baked into product assumptions that are hard to undo.
ChatGPT’s own revenue-per-user dynamics illustrate the trajectory. Plus subscribers are projected to fall from 44 million in 2025 to 9 million by 2026 as users shift to cheaper tiers, compressing average revenue per user from ~$23/month to under $12/month. OpenAI is trading unit economics for volume, betting that ecosystem lock-in survives the race to the bottom.
Price Trajectories
The approximately 300x price decline from 2023 to 2026 split into two phases.
Phase 1 (2023-2025): Supply-side efficiency drove prices down. Mixture of Experts architectures reduced active compute per token without proportional reductions in model capability. Hardware got cheaper. Algorithmic improvements like prompt caching and speculative decoding cut actual cost per query. Providers passed most of this through.
Phase 2 (2025-2027): Competition-driven cuts pushed prices below cost. DeepSeek’s open-weight releases created pressure on every closed provider. The commoditization arc (the same one that flattened cloud storage, bandwidth, and CPU cycles) is accelerating.
Gartner projects 90%+ cost reduction for inference on a 1-trillion-parameter model by 2030. Whether that reduction reaches API prices depends on whether the competitive dynamic holds or the market consolidates to a few players who stop undercutting.
What Holds Prices Low
Three forces work against price normalization.
Open-weight models keep a cost ceiling on closed providers. Any provider that prices significantly above the self-hosting cost of an equivalent open model loses enterprise customers with enough token volume to care. The self-hosting breakeven is roughly 50-100M tokens/month for a 70B-class model, below which managed APIs win on total cost including operations overhead.
Capital availability allows frontier labs to continue burning money. As long as SoftBank, Microsoft, and others have strategic reasons to subsidize usage, prices stay low.
Competitive signaling discourages unilateral price increases. The first mover that raises prices absorbs the customer migration. No one moves first.
What could break the equilibrium: market consolidation, investor discipline, energy and GPU supply constraints, or a model capability plateau that reduces the competitive pressure to buy market share.
Try it
Estimate your subsidy exposure (1 hour, any API). Calculate your current monthly token cost. Then reprice the same usage at 5x. How does that change your product’s gross margin? If the answer is “it breaks the business,” you are currently dependent on the subsidy holding, and that is a product risk worth naming explicitly.
Run the self-hosting breakeven (an afternoon, Replicate or Modal). Spin up a 70B model on a rented GPU. Measure cost per million tokens at your actual traffic volume. If it’s cheaper than your API provider, you have crossed the breakeven threshold and the subsidy is no longer relevant to your pricing exposure.
See also
- Mixture of Experts — the primary architectural technique that lets providers reduce active compute per token; a genuine efficiency gain as opposed to a financial subsidy
- Anthropic Prompt Caching — one practical cost-reduction mechanism available to API users today, reducing repeated-prefix costs by ~90% on cache reads
- Writing Code vs Shipping Code - AI Productivity Across Tool Generations — the productivity gains described there assume cheap, abundant inference; if that assumption changes, the economics of AI coding tools change with it
- [private link] — the actual cost structure: GPU amortization, utilization rates, throughput
- [private link] — the competitive dynamics and strategic game between labs and hyperscalers
Sources
- AI Inference Cost Crisis 2026: Why OpenAI Loses $1.35 Per Dollar Earned
- The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference, arXiv:2511.23455
- AI Token Futures Market: Commoditization of Compute, arXiv:2603.21690
- DeepSeek’s steep V4-Pro price cut escalates AI pricing war, InfoWorld
- AI Agent Economics: Token Tax Locks Gross Margins 30 Points Below SaaS Baseline, TechTimes
- Gartner: By 2030, Inference on 1T-Parameter LLM Will Cost 90%+ Less Than in 2025
- SoftBank has fully funded $40 billion investment in OpenAI, CNBC
- AI API Pricing War 2026: Costs Dropped 60-80%, TokenMix
- Inference Unit Economics: The True Cost Per Million Tokens, Introl