The why for Usage Based Billing for GitHub Copilot
Last month GitHub announced Usage Based Billing for GitHub Copilot. This completely changes the way its costs are calculated. A lot has already been written about the impact and a lot is still unknown.
In this post I want to explore the reasons why this switch was inevitable and how it enables new scenarios to make Copilot perform better than it did before.
Quick recap of Copilot's current pricing model
GitHub Copilot is currently available in 5 different plans:
- GitHub Copilot Free - severely rate limited free version with no option to buy additional requests.
- GitHub Copilot Pro - Individual license with limited Premium Request Units.
- GitHub Copilot Pro+ - Individual license with more extensive Premium Request Units.
- GitHub Copilot for Business - License for organizations with centralized policies and limited Premium Request Units.
- GitHub Copilot Enterprise - License for organizations with centralized policies and more extensive Premium Request Units.
Each of these licenses has a fixed monthly price. GitHub Copilot includes unlimited suggestions and a limited set of free models.
And about a year ago, GitHub introduced the concept of Premium Request Units to pay for more powerful models; PRUs are also used to charge for specific features like the Cloud Coding Agent and Copilot Code Review.
When the Cloud Coding Agent was introduced, it could use more than 1 PRU per request. This made the cost of the agent unpredictable and GitHub soon changed the price of the Cloud Coding Agent to a single PRU per request.
When a user has exhausted their included PRU budget, additional PRUs are charged. Additional PRUs have a fixed cost.
Uses of PRUs
The primary use of PRUs is to charge the cost of requests to the users of GitHub Copilot. Each model is assigned a multiplier ranging from 0 to 50 (the highest so far) PRUs per request.
The model multiplier isn't just used to charge the cost; it's an open secret that it's also used to divert users to other models to load balance them. Most people are more hesitant to use expensive models carelessly. The introduction of the "Auto" mode rests on this principle: users get a 10% discount on PRUs if they let GitHub select their model. While most users seem to assume this routes them to the "best model for the task", it generally sends the request to the "model with the best availability".
Premium Request Units were also reduced for promotional purposes. When Claude Opus 4.7 was introduced, its multiplier was temporarily set to 3x, later increased to 15x. This promotional use isn't just to get us "hooked" on the more powerful models; it also allows GitHub to gather crucial statistics on how these new models perform and helps them iron out bugs.
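Putting the multiplier and the "Auto" discount together, the charge for a single request can be sketched as below. The model names and multiplier values here are illustrative placeholders, not GitHub's actual rate card; only the 0-to-50 range and the 10% Auto discount come from the text above.

```python
# Hypothetical sketch of how a request's PRU charge could be computed.
# Model names and multiplier values are made up; the 0..50 range and the
# 10% "Auto" discount are the only figures taken from the post.

MODEL_MULTIPLIERS = {
    "included-model": 0,    # free models cost no PRUs
    "standard-model": 1,
    "premium-model": 10,
    "frontier-model": 50,   # highest multiplier mentioned so far
}

AUTO_DISCOUNT = 0.10  # 10% PRU discount when GitHub selects the model

def pru_cost(model: str, auto_mode: bool = False) -> float:
    """PRUs charged for a single request to the given model."""
    multiplier = MODEL_MULTIPLIERS[model]
    if auto_mode:
        multiplier *= 1 - AUTO_DISCOUNT
    return multiplier

print(pru_cost("premium-model"))                   # 10
print(pru_cost("premium-model", auto_mode=True))   # 9.0
```

The discount only pays off if Auto actually routes you to a model you would have picked anyway; as noted above, it optimizes for availability, not fit.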
How GitHub pays for model usage
The model providers (Anthropic, Google, OpenAI, and X) don't charge for their models in Premium Request Units. They charge in tokens: input tokens, output tokens, and cached tokens, at the highest level.
Given the volume GitHub buys, they have probably negotiated a really good deal.
But GitHub needs to balance the cost for the Tokens it spends against what it can charge its users in Premium Request Units.
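A back-of-the-envelope version of that balancing act looks like this. Every price in the sketch is invented; real provider rates and GitHub's negotiated discounts are not public, so treat the numbers purely as an illustration of the mechanism.

```python
# Illustrative comparison of provider-side token cost versus what a PRU
# recovers. All prices are hypothetical; real rates are not public.

PRICE_PER_MILLION = {"input": 3.00, "output": 15.00, "cached": 0.30}  # USD, made up

def token_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Provider-side cost of one request, in USD."""
    return (input_tokens * PRICE_PER_MILLION["input"]
            + output_tokens * PRICE_PER_MILLION["output"]
            + cached_tokens * PRICE_PER_MILLION["cached"]) / 1_000_000

PRU_PRICE = 0.04  # hypothetical revenue per additional PRU

cost = token_cost(input_tokens=120_000, output_tokens=4_000, cached_tokens=60_000)
print(f"request cost: ${cost:.3f}, margin per 1x PRU: ${PRU_PRICE - cost:.3f}")
```

Note how a single context-heavy request can cost far more in tokens than one PRU recovers; that asymmetry is exactly why the token window limits and the forced early conclusions described below exist.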
GitHub also balances the costs between tokens and PRUs by limiting the maximum token window for models. For example, if you're using Claude Opus 4.6 through GitHub Copilot, the maximum token window is 200k tokens. If you use the same model directly through Anthropic, you get up to 1M tokens.
As Agent Mode and the Cloud Coding Agent have grown more powerful and more tools have appeared, the models use more and more tokens for every request, reducing the token window available for your own prompts and context.
Other ways GitHub controls costs
GitHub has a few other options to control the maximum number of tokens you can use for each Premium Request Unit they charge you.
Agent Mode - Agent Mode spends tokens. GitHub can control the token spend by forcing the agent to come to an early conclusion or to postpone work to a future session.
Cloud Coding Agent - The Cloud Coding Agent spends tokens and Actions minutes. GitHub can control the token spend by forcing the agent to come to an early conclusion or to postpone work to a future session. The workflow the Cloud Coding Agent runs in is limited to 2 hours.
Code Review - Similar to the Cloud Coding Agent, when using GitHub's Code Review feature, GitHub can control the token spend by limiting the total time and the total number of tokens the feature can spend.
When using these features, GitHub can also control which model is used to perform these tasks.
Some features, like tool optimization in Visual Studio Code, also limit the total number of tokens used per session by default.
Recent creative uses
New features, like subagents and fleets, have multiplied the number of tokens you can use in a single Agent request. Visual Studio Code extensions have allowed users to steer the agent without completing the Agent request.
These creative usages, which some people might call abuse, have allowed some users to consume significantly more token cost than the PRU is worth; in some cases, thousands of times more.
And instead of figuring out how to price these new usage patterns in Premium Request Units, GitHub will simply charge us for the actual tokens instead.
So what changes
On June 1st 2026 GitHub will change the way we all pay for our AI usage.
Instead of an abstract unit, the Premium Request Unit, acting as a proxy for the real cost of AI usage, we're going to be charged in tokens, the same unit in which GitHub is charged.
Unfortunately, the model providers each charge different prices for different kinds of tokens. Your usage of each model is translated from the price per token into AI Credits, which are deducted from the budget included in your account.
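That translation can be sketched as a simple per-model rate table. The credit rates and budget below are invented for illustration; GitHub publishes the actual per-model conversion on its own pricing pages.

```python
# Hypothetical sketch of translating token usage into AI Credits.
# The credit rates and budget are made up; GitHub publishes real rates.

CREDITS_PER_MILLION = {
    # model: (input, output) credits per million tokens -- illustrative
    "cheap-model": (5, 20),
    "frontier-model": (60, 300),
}

def credits_used(model: str, input_tokens: int, output_tokens: int) -> float:
    """AI Credits consumed by one model invocation."""
    cin, cout = CREDITS_PER_MILLION[model]
    return (input_tokens * cin + output_tokens * cout) / 1_000_000

budget = 1_000  # included monthly AI Credits, hypothetical
spent = credits_used("frontier-model", input_tokens=2_000_000, output_tokens=100_000)
print(f"spent {spent} credits, {budget - spent} remaining")
```

The key difference from PRUs: the same request now costs a different number of credits depending on how many tokens it actually consumed, not a flat per-request multiplier.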
GitHub explains how the new pricing model works on the GitHub Blog.

The benefits
The introduction of tokens isn't meant to simply increase the cost of AI. It's also expected to spread load more evenly across the available models: users will be more likely to look for the most cost-effective model for the task at hand, instead of using the most powerful model available to them for everything. This should result in better model availability and higher performance.
A key benefit of GitHub Copilot is that you get access to the models of Anthropic, Google and OpenAI through a single subscription. And now that we are charged for actual token usage, GitHub can start raising the token windows available for each session, making the models more comparable to subscribing directly with each model provider.
Additional changes for Business accounts
In addition to the new pricing, GitHub is introducing a number of welcome changes for business accounts (Copilot for Business and Copilot Enterprise).
Pooled usage - Under the current model, when a user has spent all of their budget, additional requests are automatically charged. Under the new model, the budget of all users is pooled together, so the unused budget from less frequent users can be consumed by more frequent users in the same organization. Only when the full pooled budget is used up will additional usage be charged.
Individual additional budgets - Under the current model, additional charges can be controlled through budgets; these are pooled together for all users in the same cost center. In the new model, it's possible to set a per-user budget as well as a universal budget for all users, in addition to the already existing budget control options.
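The pooled behaviour is easy to pin down with a small example. The numbers are illustrative; the only logic taken from the description above is that overage is charged only once the whole pool is exhausted.

```python
# Sketch of the pooled-budget behaviour: unused credits from light users
# cover heavy users, and overage is charged only once the whole pool is
# exhausted. All numbers are illustrative.

def pooled_overage(included_per_user: int, usage_by_user: dict) -> int:
    """Credits charged beyond the pooled included budget."""
    pool = included_per_user * len(usage_by_user)
    total_used = sum(usage_by_user.values())
    return max(0, total_used - pool)

usage = {"alice": 1_500, "bob": 200, "carol": 100}  # credits used this month
# alice exceeds her individual 1,000 credits, but bob's and carol's
# unused budget absorbs the difference, so nothing extra is charged:
print(pooled_overage(included_per_user=1_000, usage_by_user=usage))  # 0
```

A hypothetical per-user cap, as described above, would simply bound each value in `usage_by_user` before the pool is computed.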
In addition to these changes, existing policy options to control costs are still available:
Enterprise, Organization and Cost Center Budgets - Existing budget options to limit the total spend at the Enterprise, Organization and Cost Center can still be configured to limit the total spend on AI costs.
Limit model availability - Enterprise and Organization level policies allow admins to control the available models. This can remove less effective and more expensive model options.
Limit feature availability - Enterprise and Organization level policies allow admins to control the available features. This can remove more expensive features like Code Review and Cloud Coding Agent for specific users.
Don't hamstring your users, educate them
But, ultimately, limiting the cost isn't the path forward. Optimizing the value delivered through AI is. Use the available limits to bridge the time it takes to educate your users.