Journal 032 · 2026-06-23 · 4 min read

The token bill

Token prices fell 67 percent in a year. Enterprise AI spend went up anyway. Uber's story is the clearest account yet of what happens when you let agents run loose on the budget.

Uber's chief technology officer had a number in April that he did not have a story for. The ride-hailing company had exhausted its entire 2026 AI budget in the first four months of the year. The spend came almost entirely from token consumption on agentic coding tools — primarily Claude Code, the tool Anthropic released last year that writes and modifies code autonomously with minimal human intervention. Individual engineers were billing $500 to $2,000 a month in token costs. The company's response was to cap all employees at $1,500 per month per tool.

Uber president and COO Andrew Macdonald told Fortune in late May that despite the aggressive adoption — and the real productivity signal that adoption represents — he could not draw a clear line between rising AI spend and innovations that serve consumers. "That link is not there yet," he said. Sam Altman, appearing on CNBC the same week, acknowledged that the question of whether AI spending will ever produce returns is "the most fair criticism right now of AI."

Both of these statements were made by people at the center of the AI industry. Neither is a dismissal of AI; both are acknowledgments that something in the current deployment model is not working the way the spreadsheets assumed.

The underlying dynamic is mechanical. Token prices have fallen dramatically — 67 percent year-over-year, from $18.40 to $6.07 per million tokens between Q1 2025 and Q1 2026, based on blended market rates tracked by industry analysts. This looks like deflation. In unit terms, it is. But total enterprise AI spend equals price per unit multiplied by volume consumed, and the second variable is growing faster than the first is falling.
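The arithmetic can be sketched in a few lines. This is a minimal illustration using only the two blended prices quoted above. The function names are hypothetical, and the volume multiples passed in are illustrative inputs, not measured figures.

```python
# Blended price per million tokens, as quoted in the article.
PRICE_Q1_2025 = 18.40
PRICE_Q1_2026 = 6.07


def price_decline(old_price: float, new_price: float) -> float:
    """Fractional drop in unit price between two periods."""
    return (old_price - new_price) / old_price


def breakeven_volume_multiple(old_price: float, new_price: float) -> float:
    """Volume growth at which total spend is exactly unchanged."""
    return old_price / new_price


def spend_multiple(old_price: float, new_price: float, volume_multiple: float) -> float:
    """New total spend relative to old spend: price ratio times volume ratio."""
    return (new_price / old_price) * volume_multiple


if __name__ == "__main__":
    drop = price_decline(PRICE_Q1_2025, PRICE_Q1_2026)
    breakeven = breakeven_volume_multiple(PRICE_Q1_2025, PRICE_Q1_2026)
    print(f"unit price decline: {drop:.1%}")
    print(f"volume growth needed to hold spend flat: {breakeven:.2f}x")
    # Illustrative volume multiples only, chosen to bracket the breakeven point.
    for multiple in (3, 5, 30):
        ratio = spend_multiple(PRICE_Q1_2025, PRICE_Q1_2026, multiple)
        print(f"{multiple}x volume -> {ratio:.2f}x spend")
```

At these prices, any workload whose token volume more than roughly triples ends up costing more in total than it did a year earlier, despite the cheaper unit price.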

The volume problem is specific to agents. A standard chatbot that answers a query in a single turn burns a modest number of tokens per interaction. An agentic system that breaks a task into subtasks, calls tools, checks its own output, iterates, and uses memory across sessions burns five to thirty times more tokens for the same notional task. EY's analysis of agentic AI token costs puts the per-interaction comparison this way: a simple linear AI workflow in 2023 cost approximately $0.04, while a complex orchestrated agentic system in 2026, with tools, reasoning loops, and iterative refinement, runs to around $1.20 per interaction. That is a lower price per token, a roughly thirtyfold higher cost per interaction, and a higher total spend.

The teams most affected are not the ones that deployed agents casually. They are the ones that deployed agents seriously — that embedded agentic coding tools into their engineering workflow, got the productivity signal, encouraged broader adoption, and then saw the billing statement. Adoption velocity and cost control are not naturally aligned when the unit of adoption is a tool that generates variable token volumes per session.

The FinOps community has noticed. In 2025, 31 percent of FinOps practitioners surveyed said they were responsible for managing AI spend. By 2026, that figure is 98 percent. The Linux Foundation announced the Tokenomics Foundation, positioning it as a standards body for AI token cost discipline modeled on what FinOps did for cloud infrastructure spending. Whether a standards body is the right instrument for this problem is debatable; that the problem is real enough to attract institutional attention is not.

The operational failure mode that is creating the worst surprises is what some teams are calling tokenmaxxing: background agentic processes that run continuously, generate output that nobody reviews, and accumulate token spend as a side effect of being left on. Cockroach Labs' engineering blog published a useful taxonomy of the common patterns: ambient agents polling state at fixed intervals regardless of whether anything changed, summarization agents generating summaries of summaries, and evaluation loops running more iterations than the task required because there was no exit condition for "good enough." None of these are bugs in the agent; they are design choices that were made without thinking about token cost as a design constraint.
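Two of these patterns have simple guards, sketched below under stated assumptions. `ChangeGatedAgent` and `refine_until_good_enough` are hypothetical names, and `run_agent`, `improve`, and `score` stand in for whatever model calls a real system would make. Nothing here is taken from the taxonomy itself. The sketch also does not address the summaries-of-summaries pattern.

```python
import hashlib
from typing import Callable, Optional, Tuple


def state_fingerprint(state: str) -> str:
    """Cheap local hash of the observed state; costs no tokens."""
    return hashlib.sha256(state.encode("utf-8")).hexdigest()


class ChangeGatedAgent:
    """Calls the (hypothetical) agent only when the observed state has changed."""

    def __init__(self, run_agent: Callable[[str], str]) -> None:
        self._run_agent = run_agent
        self._last_fingerprint: Optional[str] = None
        self.invocations = 0

    def poll(self, state: str) -> Optional[str]:
        fingerprint = state_fingerprint(state)
        if fingerprint == self._last_fingerprint:
            return None  # nothing changed, so skip the model call entirely
        self._last_fingerprint = fingerprint
        self.invocations += 1
        return self._run_agent(state)


def refine_until_good_enough(
    draft: str,
    improve: Callable[[str], str],
    score: Callable[[str], float],
    threshold: float,
    max_iterations: int,
) -> Tuple[str, int]:
    """Iterate until the score clears the threshold or the iteration cap is hit."""
    current = draft
    iterations = 0
    while score(current) < threshold and iterations < max_iterations:
        current = improve(current)
        iterations += 1
    return current, iterations
```

The point of both guards is the same: the decision about whether to spend tokens is made by cheap local code before the model is called, rather than left to the agent's own loop.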

The corrective is not to stop using agents. The corrective is to treat token consumption as a first-class output metric, the same way you treat latency or error rate. If you cannot answer "how many tokens does this workflow burn per run" for every agentic workflow in production, you are operating blind in a way that will eventually surface in a billing statement.
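One way to make that question answerable is a per-workflow ledger, sketched here as a minimal in-memory version. `TokenLedger`, its method names, and the `"triage"` workflow are hypothetical. The token counts are assumed to come from whatever usage figures your model provider reports per call, and the price is an assumed blended rate, not a quoted contract price.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import DefaultDict, List


@dataclass
class TokenLedger:
    """Records total tokens per run, grouped by workflow name."""

    price_per_million: float  # assumed blended price in dollars
    _runs: DefaultDict[str, List[int]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def record_run(self, workflow: str, input_tokens: int, output_tokens: int) -> None:
        """Store the total tokens consumed by one completed run."""
        self._runs[workflow].append(input_tokens + output_tokens)

    def tokens_per_run(self, workflow: str) -> float:
        """Mean tokens per run; raises StatisticsError if nothing was recorded."""
        return mean(self._runs[workflow])

    def cost_per_run(self, workflow: str) -> float:
        """Mean dollar cost per run at the assumed blended price."""
        return self.tokens_per_run(workflow) / 1_000_000 * self.price_per_million


if __name__ == "__main__":
    ledger = TokenLedger(price_per_million=6.07)
    # Illustrative token counts only, not measurements.
    ledger.record_run("triage", input_tokens=40_000, output_tokens=10_000)
    ledger.record_run("triage", input_tokens=120_000, output_tokens=30_000)
    print(ledger.tokens_per_run("triage"), ledger.cost_per_run("triage"))
```

In practice the same figures would be emitted to whatever metrics system already tracks latency and error rate, so that tokens per run appears on the same dashboard and can carry the same kind of alert thresholds.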

There is also a harder question embedded in Macdonald's comment, and Altman's, that deserves to be stated plainly: the value of AI spend at enterprise scale is currently difficult to measure because the productivity gains are diffuse and the costs are concentrated. Engineers ship faster. Whether what they ship is better, or more valuable, or serves the users the company serves, is not a number you can read directly from token consumption. The gap between the two is where most enterprises are sitting right now.

The bill is here. The ROI accounting is still being worked out.

The short of it.

Uber burned through its entire 2026 AI budget in four months, capped employees at $1,500 per month per tool, and its COO admitted he cannot yet draw a line between AI spend and consumer value. Token prices fell 67 percent in a year, but agentic systems burn 5-30x more tokens per task than chatbots, pushing total enterprise AI spend up even as unit costs fall. The FinOps community is scrambling — 98 percent of practitioners are now managing AI spend, up from 31 percent last year. The fix is treating token consumption as a design constraint, not an afterthought; any workflow you cannot cost per-run before deploying will cost you more than you planned.