How to Calculate AI Agent Costs (Multi-Step, Tool Calls, Retries)
Agents pay for loops, not single completions. The formula: steps x calls per step x cost per call x (1 + retry rate), with worked GPT-5 examples and the multipliers public calculators miss.
To calculate an AI agent’s cost, price the loop, not the call: cost per task = steps x calls per step x cost per call x (1 + retry rate). A production agent makes 15 to 50 model calls per task, so a workload that looks like $0.015 per call is really $0.23 to $0.75 per task. Public cost calculators model a single completion, which is why agent budgets blow through their estimates by an order of magnitude.
The agent cost formula
Two lines cover it:
cost per call = (input tokens x input price + output tokens x output price) / 1,000,000
cost per task = steps x calls per step x cost per call x (1 + retry rate)
The terms, defined the way they show up in logs:
- Steps: distinct phases of the task (plan, search, read, act, verify, revise).
- Calls per step: most steps take more than one model call, because the model first requests a tool and then makes a second call to digest the tool’s result.
- Retry rate: the share of calls repeated after malformed JSON, failed validations, or tool errors. In logs we see 10 to 25% on real agents; almost nobody budgets for it.
Prices as of June 2026 (per million tokens, live numbers at openai.com/api/pricing): GPT-5.5 $5 input / $30 output, GPT-5 $1.25 / $10, GPT-5 Mini $0.25 / $2.
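As a minimal sketch, the two formula lines translate directly into Python. The function names, the `PRICES_PER_MILLION` table, and the agent shape in the usage lines (10 steps, 2 calls per step, 4,000 input / 500 output tokens, 20% retries) are illustrative assumptions rather than measurements; the prices are the June 2026 figures quoted above.

```python
# Prices in dollars per million tokens: (input, output), as quoted in the article.
PRICES_PER_MILLION = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
}


def cost_per_call(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one model call, with prices quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_task(steps, calls_per_step, call_cost, retry_rate):
    """Dollar cost of one agent task: the loop, not the single call."""
    return steps * calls_per_step * call_cost * (1 + retry_rate)


# Hypothetical agent shape, chosen only to make the arithmetic easy to follow.
input_price, output_price = PRICES_PER_MILLION["gpt-5"]
call = cost_per_call(4_000, 500, input_price, output_price)
task = cost_per_task(steps=10, calls_per_step=2, call_cost=call, retry_rate=0.20)
print(f"${call:.4f} per call, ${task:.2f} per task")
# prints: $0.0100 per call, $0.24 per task
```

Keeping the retry rate as an explicit argument makes it easy to swap in the rate you measure from logs instead of a guess.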
Why single-completion calculators get agents wrong
Three structural reasons, in rising order of sneakiness:
- The loop multiplies everything. One “task” in your product is one row in the calculator and 20 rows in your OpenAI usage log.
- Context accumulates. Call 12 carries the transcript of calls 1 through 11 as input. Average input per call ends up several times the prompt you wrote. A workable planning rule: average input is roughly the system prompt plus half the final transcript length.
- Tool results are input tokens. A web page, a SQL result, a file diff: the agent pays input rates to read everything its tools return.
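The context-accumulation point can be checked with a small simulation. This is a sketch under a simplifying assumption: every call adds the same number of tokens (model output plus tool results) to the transcript, so input grows linearly. The 1,000-token system prompt, 600 tokens added per call, and 16 calls are hypothetical values, and "final transcript" here means the history the last call carries.

```python
def input_tokens_per_call(system_prompt_tokens, tokens_added_per_call, num_calls):
    """Input size of each call when call k re-reads everything before it."""
    # Call k (0-based) carries the system prompt plus what the previous
    # k calls added: their outputs and any tool results.
    return [
        system_prompt_tokens + tokens_added_per_call * k
        for k in range(num_calls)
    ]


SYSTEM_PROMPT = 1_000  # hypothetical
inputs = input_tokens_per_call(SYSTEM_PROMPT, tokens_added_per_call=600, num_calls=16)

average_input = sum(inputs) / len(inputs)
final_transcript = inputs[-1] - SYSTEM_PROMPT  # history carried by the last call
rule_of_thumb = SYSTEM_PROMPT + final_transcript / 2

print(inputs[0], inputs[-1])         # prints: 1000 10000
print(average_input, rule_of_thumb)  # prints: 5500.0 5500.0
```

Under linear growth the planning rule matches the true average exactly; real transcripts grow unevenly, so treat it as an estimate until you have logs.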
A single agent task is not one API call; it is fifteen to fifty. That sentence is the whole correction to apply to any calculator output.
A worked example: a research-and-write agent
Reference spec, drawn from a common shape (research a topic, draft, self-review):
- 8 steps, 2 calls per step: 16 planned calls
- Average 6,000 input / 800 output tokens per call
- 15% retry rate: 16 x 1.15 = 18.4 effective calls
On GPT-5 ($1.25 / $10 per million):
cost per call = 6,000/1M x $1.25 + 800/1M x $10 = $0.0075 + $0.0080 = $0.0155
cost per task = 18.4 x $0.0155 = $0.285
1,000 tasks/month = $285
The same agent across three model choices:
| Model | Cost per call | Cost per task (18.4 calls) | 1,000 tasks/month |
|---|---|---|---|
| GPT-5 Mini | $0.0031 | $0.057 | $57 |
| GPT-5 | $0.0155 | $0.285 | $285 |
| GPT-5.5 | $0.0540 | $0.994 | $994 |
A 17x spread from model choice alone, before you touch the loop. Trimming one step out of eight saves more than most prompt-compression projects.
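The comparison table can be reproduced in a few lines, which makes it easy to rerun with your own agent shape. This is a sketch using the reference spec and the June 2026 prices from this article; the variable names and the `rows` dictionary are illustrative.

```python
# Dollars per million tokens: (input, output), as quoted in the article.
PRICES_PER_MILLION = {
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5": (1.25, 10.00),
    "gpt-5.5": (5.00, 30.00),
}

# Reference agent: 8 steps x 2 calls per step, 15% retry rate.
STEPS, CALLS_PER_STEP, RETRY_RATE = 8, 2, 0.15
INPUT_TOKENS, OUTPUT_TOKENS = 6_000, 800

effective_calls = STEPS * CALLS_PER_STEP * (1 + RETRY_RATE)  # about 18.4

rows = {}
for model, (input_price, output_price) in PRICES_PER_MILLION.items():
    call = (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000
    task = effective_calls * call
    rows[model] = (call, task)
    print(f"{model:<11} ${call:.4f}  ${task:.3f}  ${task * 1000:,.0f}")

# Ratio between the most and least expensive model for the same task.
spread = rows["gpt-5.5"][1] / rows["gpt-5-mini"][1]
print(f"spread: {spread:.1f}x")
```

The printed rows should match the table above up to rounding.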
The multipliers people forget
- Retries. Measure your real rate from logs and put it in the formula. At 25% instead of 15%, the GPT-5 task above costs $0.31, and a month of 10,000 tasks drifts $250 over plan.
- Reasoning tokens bill as output. Reasoning models think in tokens you pay for but never see. If your agent uses one for planning steps, budget output tokens well above the visible answer length. The mechanics are in reasoning tokens: the hidden multiplier.
- Caching helps the repeated prefix. GPT-5 cached input bills at $0.125 per million, a 90% discount. If 2,000 of our 6,000 input tokens are a stable system-plus-tools prefix, the task drops from $0.285 to about $0.244, roughly 15% off, for free.
- Context truncation is a cost control. Summarizing history every few steps caps the input growth curve, which matters more than any per-call optimization once tasks run long.
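The retry and caching figures in this list can be reproduced with one function. This is a sketch on the reference agent at GPT-5 rates; it assumes, as the arithmetic above does, that the cached prefix is billed at the cached rate on every call, which real cache hit rates will not always match. The function and constant names are illustrative.

```python
PLANNED_CALLS = 16          # 8 steps x 2 calls per step
INPUT_TOKENS = 6_000
OUTPUT_TOKENS = 800
INPUT_PRICE = 1.25          # GPT-5, dollars per million input tokens
CACHED_INPUT_PRICE = 0.125  # GPT-5, dollars per million cached input tokens
OUTPUT_PRICE = 10.00        # GPT-5, dollars per million output tokens


def task_cost(retry_rate, cached_tokens=0):
    """Per-task cost with part of the input billed at the cached rate."""
    fresh_tokens = INPUT_TOKENS - cached_tokens
    call_cost = (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + OUTPUT_TOKENS * OUTPUT_PRICE
    ) / 1_000_000
    return PLANNED_CALLS * (1 + retry_rate) * call_cost


baseline = task_cost(0.15)
high_retry = task_cost(0.25)
with_cache = task_cost(0.15, cached_tokens=2_000)

drift = (high_retry - baseline) * 10_000  # monthly overrun at 10,000 tasks
saving = 1 - with_cache / baseline        # fractional saving from caching

print(f"baseline ${baseline:.3f}, 25% retries ${high_retry:.2f}, drift ${drift:.0f}")
print(f"cached ${with_cache:.3f}, saving {saving:.1%}")
```

Reasoning tokens are not modeled here; if a planning step uses a reasoning model, one rough way to account for them is to raise `OUTPUT_TOKENS` to the billed output count rather than the visible answer length.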
From per-task cost to a monthly budget
Multiply per-task cost by volume, then add a 20 to 30% buffer for retry spikes and context drift beyond what your logs show today. Our reference agent at 1,000 tasks per day costs $285 a day, or $285 x 30 = $8,550 a month on GPT-5 before the buffer.
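A small helper makes the buffer explicit. This is a sketch: the function name and the 30-day month are assumptions, and the $0.285 per-task figure is the rounded GPT-5 cost of the reference agent.

```python
def monthly_budget(cost_per_task, tasks_per_day, buffer, days=30):
    """Monthly spend in dollars, padded by a fractional buffer (0.20 = 20%)."""
    return cost_per_task * tasks_per_day * days * (1 + buffer)


COST_PER_TASK = 0.285  # reference agent on GPT-5, rounded

for buffer in (0.0, 0.20, 0.30):
    budget = monthly_budget(COST_PER_TASK, tasks_per_day=1_000, buffer=buffer)
    print(f"buffer {buffer:.0%}: ${budget:,.0f} per month")
# prints:
# buffer 0%: $8,550 per month
# buffer 20%: $10,260 per month
# buffer 30%: $11,115 per month
```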
The structural problem is that every lever above optimizes a meter that still scales with agent ambition: more steps, more tools, and more checks each add cost at least linearly, and faster once the accumulated context grows with them. Usage-window pricing inverts that, because a flat plan charges nothing per step. The meter charges for every step an agent takes; a usage window charges for none of them individually. Why that flips the economics for agent workloads specifically is covered in why agent workloads flip the API-vs-subscription math, and the always-on version of this arithmetic is worked through in what a 24/7 AI agent actually costs.
For estimating token counts from words and documents before you have logs, start with the calculator walkthrough.
Run your own agent shape through the calculator: it does the steps-times-calls math and maps the result to the cheapest covering setup.
Frequently asked questions
How do you calculate the cost of an AI agent?
Cost per task = steps x calls per step x cost per call x (1 + retry rate), where cost per call = (input tokens x input price + output tokens x output price) / 1,000,000, with prices quoted per million tokens. A typical production agent makes 15 to 50 model calls per task, so per-task cost is usually 15 to 50 times what a single-completion calculator shows.
How many API calls does one agent task make?
Most production agents make 15 to 50 model calls per task once you count planning, tool-call rounds, digesting tool results, validation, and retries. A simple 8-step agent with 2 calls per step and a 15% retry rate already lands at about 18 effective calls.
Why do AI cost calculators underestimate agent costs?
Public calculators price one completion: one input, one output. Agents loop, and each later call carries the accumulated transcript of earlier calls as input, so both call count and average input size grow. Retries add another 10 to 25% on top.
How much does one agent task cost on GPT-5?
Our reference agent (8 steps, 18.4 effective calls, 6,000 input and 800 output tokens per call) costs about $0.29 per task at GPT-5's June 2026 rates of $1.25/$10 per million tokens. The same task runs about $0.06 on GPT-5 Mini and about $0.99 on GPT-5.5.