Phase 3, Act II: The Meter Is Running

Part of the ongoing Big Tech's War on Users series.

Last month I wrote about how GitHub spent eight years burning through the trust it borrowed when Microsoft bought it. The data training opt-out, the quiet absorption into CoreAI, the class system between enterprise customers and the individual developers who built the place. Phase 3: Profit. Documented.

That was March. It's not done.

On June 1st, GitHub Copilot's billing model changes. What was a flat-rate subscription — pay your monthly fee, get your requests — becomes usage-based. Per token. AI Credits drawn down like a prepaid card you didn't know you had, at rates that vary by model, with any overage (assuming you opted into an additional usage budget) billed at standard rates until you either upgrade, hit the cap you set, or discover the line item on your credit card statement sometime in early July.

The data story was about what they do with your code. This one is about what they charge while you write it.

How It Worked Until Now

Copilot pricing has always been deceptively simple. You pick a tier — Free, Pro, Pro+, Business, Enterprise — and you get a monthly allotment of "premium requests." One interaction with a premium model equals roughly one request, regardless of whether that interaction was a single autocomplete or a multi-file refactor with forty back-and-forth exchanges. The complexity of the task didn't matter. The model's computational cost didn't appear on your bill.

That was by design. The flat-rate model trained an entire generation of developers to treat AI assistance as effectively free at the margin. You paid your $10 or $19 and stopped thinking about the per-call cost. That's the exact same pattern that made cloud storage sticky, that made streaming subscriptions feel like value, that made "unlimited" anything appealing — the anxiety of metered usage disappears, and you use the tool differently.

GitHub knew this. They built adoption on it.

What Changes on June 1st

All model usage gets converted to AI Credits, metered per token. Your plan still comes with a monthly included allotment — the equivalent of today's request quota, expressed differently. Once you exceed it, every additional interaction draws down credits at rates that vary by model. Lightweight models are cheap. Frontier models are not.

The specific rates are published. The math isn't complicated. What is complicated is that you've been trained for years not to do it.
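To make the mechanics concrete, here's a toy sketch of the draw-down in Python. Every number in it is a placeholder I made up — the allotment, the per-model rates, the token counts — not GitHub's published figures. The shape of the math is the point.

```python
# Toy model of the AI Credits draw-down described above. All numbers are
# HYPOTHETICAL placeholders, not GitHub's published rates.

INCLUDED_CREDITS = 10.0          # hypothetical monthly included allotment
RATES_PER_1K_TOKENS = {          # hypothetical credits per 1,000 tokens
    "lightweight": 0.05,
    "frontier": 0.15,
}

def draw_down(balance: float, model: str,
              input_tokens: int, output_tokens: int) -> float:
    """Return the remaining credit balance after one metered interaction."""
    cost = (input_tokens + output_tokens) * RATES_PER_1K_TOKENS[model] / 1000
    return balance - cost

balance = INCLUDED_CREDITS
balance = draw_down(balance, "lightweight", 4_000, 1_000)  # barely moves
balance = draw_down(balance, "frontier", 30_000, 2_000)    # one big-context call
print(f"remaining credits: {balance:.2f}")
```

Run the same arithmetic with the real published rates and your own usage history, and you'll know before June 1st which bucket you're in.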

Worth noting: this model isn't new. Anyone who's used the ChatGPT API, Claude's API, or most other AI tooling at any depth has been billed per token for years. Copilot was the outlier — deliberately so. The flat-rate model wasn't an oversight or a technical limitation; it was the hook. Make it feel unlimited, build the muscle memory, get it load-bearing in enough workflows that switching hurts. Then adjust the terms. Classic enshittification: good for users first, good for the business second, and the order is the strategy.

That said: the agentic push probably accelerated the timeline more than anyone planned for. A developer running chat and completions is a predictable cost. A developer running agent mode on a complex feature — multi-file context, dozens of model calls, an hour of autonomous work — is something else entirely, and the flat-rate model has no answer for it. Without that forcing function they might have kept coasting on the goodwill gap for another year or two, quietly benefiting while OpenAI and Anthropic kept raising the cost of living for power users on their platforms. The math stopped working faster than expected. That's a real problem — even if it's partly self-inflicted, given that GitHub spent the last year aggressively marketing and expanding the exact agentic features that broke the economics. They built the thing that broke the thing. It just doesn't change where the destination was always going to be.

And they're not alone in the math not working. In April, Anthropic quietly ended Claude Pro and Max subscription access for third-party agent frameworks after discovering a $200/month subscription was being used to run what amounted to thousands of dollars of compute. OpenAI launched a new $100 Pro tier the same month specifically to move heavy Codex and agent users out of Plus and into a higher bracket. As of right now, OpenAI and Anthropic share an identical premium tier structure — $100 for 5x usage, $200 for 20x — having converged on the same answer from different directions. The industry collectively discovered that agentic workloads and flat-rate subscriptions don't coexist, and everyone is adjusting on roughly the same timeline. GitHub is the last major player to make the move, not the first.

The billing change isn't happening in isolation either. Two other things land on the same timeline:

New Pro, Pro+, and Student signups are currently paused — GitHub stopped accepting them ahead of the transition. That's not a technical hiccup. That's a company bracing for volume and wanting the existing user base migrated cleanly before the wave.

And as of April 24th — same week, same company, different policy announcement — interaction data from individual Copilot users is being used to train Microsoft's AI models by default. I covered that in the previous post. The billing change and the data policy landed within the same news cycle. That's not coincidence; it's the same strategic direction expressed through two different levers.

The Sticker Shock Is the Point

Here's the failure mode that's going to generate the most noise in early June. It's specific, reproducible, and already showing up in developer forums — which means it's not an edge case. Visual Studio Magazine documented one instance bluntly: a developer midway through a complicated agentic project in VS Code unknowingly hit their monthly premium request quota, triggering an automatic fallback to a less capable model — which promptly broke what they were building. That fallback at least kept them working. After June 1st, the fallback goes away entirely. Add more credits or don't use Copilot for the rest of the month.

That's the documented version. Here's the pattern underneath it that'll catch more people off guard:

You have a complicated section of code. You ask Copilot — on a lower-tier model, because you're trying to be responsible — to work through it. The model struggles. It rabbit-holes. It goes on a tangent, comes back wrong, tries again, generates output that's almost right but not quite, and the session turns into fifteen exchanges of "no, not like that." Each exchange costs tokens. Input and output, both. The context window is growing with every failed attempt.

Eventually you escalate to the premium model. Which handles it in two exchanges. Just like the cheaper model did three weeks ago, before something quietly changed.

Under the old flat billing: annoying. Maybe three premium requests burned on an adventure that should have been one.

Under AI Credits: you paid for every failed tangent at the lower model's rate, then you paid the full premium model rate on top, and the session that "should" have cost one unit of whatever ended up costing — depending on context size, model choices, and how badly the rabbit hole went — potentially an order of magnitude more.
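With made-up rates, the rabbit-hole math sketches out like this. None of these numbers are real Copilot prices; the ratio between the two totals is the point.

```python
# Rabbit-hole math with HYPOTHETICAL rates (not real Copilot prices):
# fifteen failed exchanges on a cheap model, then a two-exchange premium
# rescue, versus going to the premium model from the start.

CHEAP, PREMIUM = 0.0005, 0.015   # hypothetical USD per 1,000 tokens

def cost(rate: float, exchanges: list[tuple[int, int]]) -> float:
    """Sum (input + output) tokens per exchange, priced at the given rate."""
    return sum((inp + out) * rate / 1000 for inp, out in exchanges)

# Context grows with every failed attempt, so input size climbs each turn.
rabbit_hole = [(3_000 + 2_000 * n, 1_000) for n in range(15)]
rescue      = [(40_000, 2_000), (42_000, 1_500)]   # premium, bloated context
direct      = [(3_000, 1_500), (5_000, 1_000)]     # premium, fresh context

blended  = cost(CHEAP, rabbit_hole) + cost(PREMIUM, rescue)
straight = cost(PREMIUM, direct)
print(f"rabbit hole + escalation: ${blended:.2f}")
print(f"premium from the start:   ${straight:.2f}")
```

Under these invented numbers the failed-tangent path comes out roughly nine times the cost of starting on the premium model — and the cheaper model's flailing, not the premium rescue, is what inflated the context that made the rescue expensive.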

The billing model change doesn't create this failure mode. It monetizes one that already exists.

On the Subject of Models Getting Dumber

This is where it gets uncomfortable to talk about, because it requires saying something that sounds conspiratorial and is nonetheless probably true.

Older models degrade. Science fiction called this rampancy and treated it as tragedy — an AI grown too complex, turning on itself, becoming unpredictable until it burned out. The 2026 version is less dramatic: a cost optimization that doesn't make the changelog. Same user experience, tidier paperwork. Not because the weights change — they don't, that's not how deployed models work — but because what you're actually calling when you ask for "gpt-4o" or a specific Copilot model is not a static artifact. It's a routing decision that terminates somewhere in a serving infrastructure that gets updated continuously. Quantization. Distillation. Safety layer adjustments. System prompt tweaks. Cheaper inference paths under load. None of this is disclosed. None of it changes the model name on the dropdown.

Research on the question tends to conclude that most "the model got dumber" complaints don't survive rigorous measurement. That's technically accurate and largely irrelevant to what working developers are actually experiencing.

Here's a more grounded version of the observation: if you run the same task — genuinely the same class of problem, fresh context window, not a polluted session — on a model at launch and then six weeks later, and the results are materially different in ways that consistently require escalation to a tier up, that's a signal. It may not be "they nerfed the model." It might be quantization for cost efficiency. It might be a safety filter that now catches patterns it didn't before. It might be inference-time optimizations that shave compute at the cost of edge-case reliability.

The mechanism doesn't particularly matter. The outcome is: tasks that a cheaper tier handled cleanly now require a more expensive tier. And starting June 1st, that escalation path has a per-token price tag.

You don't have to believe anyone is doing this deliberately to recognize that the incentive structure doesn't reward fixing it. A cheaper model that occasionally fails and causes users to escalate to a premium model is, under usage-based billing, a revenue event. Not a bug report.

The Incentive Problem Nobody Names Directly

Microsoft is already public, so the classic pre-IPO margin squeeze doesn't directly apply here — but the underlying incentive structure is the same thing, just wearing different clothes. Quarterly earnings pressure replaces roadshow pressure. Analysts want gross margin improvement, ARR growth, compute efficiency stories. The questions nobody is asking: whether the model you shipped six months ago still handles the same edge cases it did at launch, or whether the cheap tier that drove adoption is still as capable as it was when you built your workflow on it.

What does get asked: compute cost per query. Infrastructure optimization ratios. The conversion of flat subscription revenue into potentially uncapped per-token billing. That last one is what Copilot's June 1st transition is, expressed in plain terms. Heavy users are now directly accountable for their usage costs. The economics of frontier model sessions shift from GitHub's balance sheet to yours.

It's worth naming that this dynamic is most acute not at Microsoft but at the companies supplying the models behind Copilot — most of which are pre-IPO or recently listed, burning cash, and under significant pressure to demonstrate a path to margin. The model providers have every incentive to push token consumption up. The incentive to keep cheaper tiers performing at launch quality is correspondingly weak. Microsoft sits at the distribution layer; the underlying pressure runs upstream.

What makes the GitHub version particularly pointed is the data collection policy announced the same week. They're billing you more precisely while simultaneously taking more of what you produce. Both changes landed in April 2026. One is about money. One is about data. They're not coincidentally timed.

(Speculative aside, since nobody has confirmed the connection explicitly: interesting timing that OpenAI is currently seeing other people. The MS/OpenAI deal was just restructured — Microsoft's license goes non-exclusive, the revenue share arrangements get capped and simplified, and OpenAI is now free to cozy up to Oracle, AWS, whoever. For a long time Copilot defaulted to OpenAI models, presumably at rates baked into a very unusual financial arrangement involving a lot of Azure compute credits rather than actual money. As that relationship becomes more arm's-length and commercially normal, "what does it actually cost us to route to GPT-whatever" becomes a real question in a way it maybe wasn't before. Could be a contributing factor to why the unit economics suddenly needed fixing. Could be coincidence. I've written about the broader financial strangeness of the AI funding era elsewhere — but the timing is hard to completely ignore.)

The Practical Reality for Individual Developers

First: the thing most coverage is glossing over. Code completions — the bread and butter autocomplete that most people think of when they think "Copilot" — are not metered. They stay unlimited on paid plans and don't consume AI Credits at all. If that's primarily how you use Copilot, June 1st looks a lot like today.

What gets metered is everything else: chat, agent mode, code review, any session where you're actually talking to the model rather than accepting inline suggestions. That's where the usage-based math kicks in, and that's where workflows get expensive fast.

If you're in agent mode regularly — "fix this," "refactor that," "figure out why this isn't working" — the math changes significantly. Agent sessions fan out. One prompt becomes a dozen model calls with growing context. Under per-token billing, a session that felt like one request is priced like what it actually was: a lot of tokens.
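The fan-out is worth seeing in numbers. Each internal call re-sends the accumulated context, so total tokens grow roughly quadratically with the number of calls. These figures are illustrative, not measured.

```python
# Why agent sessions are expensive: every internal model call re-sends the
# accumulated context. Numbers are illustrative, not measured from Copilot.

def agent_session_tokens(calls: int, base_context: int,
                         growth_per_call: int) -> int:
    """Total input tokens across a session whose context grows each call."""
    total = 0
    context = base_context
    for _ in range(calls):
        total += context             # full context re-sent on every call
        context += growth_per_call   # tool output / diffs appended for next call
    return total

one_call = agent_session_tokens(1, 8_000, 3_000)
twelve   = agent_session_tokens(12, 8_000, 3_000)
print(one_call, twelve, twelve / one_call)
```

With these assumptions, twelve calls cost about thirty-seven times the tokens of one — which is why a session that felt like one prompt bills like a small project.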

A few things worth knowing before June 1st:

Overages require opt-in. If you exhaust your included AI Credits, you don't automatically get billed — you hit a wall. To spend beyond your allotment, you have to explicitly set an additional usage budget. Without one, you stop. That's actually better than the alternative, but it means you'll need to decide in advance whether you want overage capability or a hard ceiling. Set it intentionally rather than discovering the hard way mid-session.

Watch the in-editor warnings. VS Code and the Copilot CLI are getting usage progress tracking ahead of June 1st. Check whether it's live in your setup before the billing switch flips.

Start fresh sessions more aggressively. Context rot is real — models perform worse as conversation history grows — and under per-token billing, a degraded long session costs more in a way it didn't before. Short, focused sessions cost less and surface model quality issues faster than a twenty-turn marathon where you can't tell if the model is struggling or you're just explaining the problem badly.

Take the "right-size your model" advice with appropriate skepticism. The advice is structurally correct — cheaper models for routine tasks, premium models for hard problems. The practical challenge is that "what this model can handle" has been shifting, and the feedback loop for noticing you've over-trusted a lower tier now has a dollar amount attached. Keep your own informal benchmarks. The task you ran cleanly two weeks ago is the only honest calibration you have.
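If you want something slightly more disciplined than memory, a few lines of logging are enough. This is a sketch: the file name, model labels, and task IDs are whatever you choose, and judging pass/fail stays manual.

```python
# Minimal log for informal model benchmarks: rerun the same fixed task
# periodically, judge the result yourself, and append a line. File name,
# model labels, and task IDs here are arbitrary examples.
import json
import datetime

def log_benchmark(path: str, model: str, task_id: str,
                  passed: bool, notes: str = "") -> None:
    """Append one benchmark result as a JSON line."""
    entry = {
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,
        "task": task_id,
        "passed": passed,
        "notes": notes,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# After manually judging a run:
log_benchmark("copilot_bench.jsonl", "cheap-tier", "regex-refactor-01", True)
```

A month of entries tells you whether the cheap tier's pass rate on your tasks is drifting — which is exactly the signal that now has a price attached.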

What The Phase 3 Arc Actually Looks Like

In the previous post I laid out the timeline: 2018 acquisition with independence promises, 2025 absorption into CoreAI, 2026 data policy flip. The billing change fits into the same sequence.

Phase 1 was adoption. Make the tool essential. Free tiers, generous limits, flat pricing that removes friction. Build the muscle memory. Get developers to the point where they don't know how they shipped code before this.

Phase 2 was integration. Deepen the dependency. IDE plugins, agent mode, code review, CLI. Make the switching cost real. Not by locking you in artificially but by genuinely becoming load-bearing infrastructure in daily workflows.

Phase 3 is extraction. Not crudely — there's still genuine value being delivered — but the pricing structure that made adoption frictionless is being replaced by one that captures more of the value from the usage patterns you built during phases one and two.

The billing change is Phase 3 expressed through pricing. The data policy is Phase 3 expressed through terms of service. The absorption into CoreAI was Phase 3 expressed through org structure. They're not independent moves; they're the same move in different domains.

Nadella said judge them by their actions. Still good advice. The actions are getting easier to read.

The opt-out for the data training policy is still at GitHub Settings → Copilot → Privacy. Do it for every account. The spend controls for June 1st are in the same neighborhood — find them before the billing cycle starts.

Find me on Mastodon at @ppb1701@ppb.social

Part of the ongoing Big Tech's War on Users series.