The OpenAI Batch API: When 50% Off Is Worth the Wait
The Batch API takes 50 percent off OpenAI token prices for jobs returned within 24 hours. How it works, which workloads qualify, and when a flat subscription lane beats it.
The OpenAI Batch API bills input and output tokens at 50 percent off standard rates in exchange for one concession: results arrive asynchronously, any time within a 24-hour window. Same models, same answers, half the price, as of June 2026 (openai.com/api/pricing). The discount fits work that can wait overnight: summaries, backfills, evals, content queues. It excludes anything interactive, including agent loops, where each step depends on the last. This page covers the mechanics, the arithmetic, the fit test, and how batch compares to a flat subscription lane for the same jobs.
How the Batch API works
A batch is a JSONL file: one request per line, each tagged with a custom_id you later use to match results back.
{"custom_id": "doc-0001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": [{"role": "system", "content": "Summarize in 5 bullets."}, {"role": "user", "content": "<document text>"}]}}
{"custom_id": "doc-0002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": ["..."]}}
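A minimal sketch of generating that file from a list of documents, using the request shape shown above. The function names, the `doc-0001` numbering scheme, and the default prompt are illustrative assumptions, not part of OpenAI's SDK.

```python
import json


def build_batch_lines(documents, model="gpt-5", instruction="Summarize in 5 bullets."):
    """Yield one JSON line per document (hypothetical helper)."""
    for i, text in enumerate(documents, start=1):
        request = {
            "custom_id": f"doc-{i:04d}",  # must be unique within the file
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": instruction},
                    {"role": "user", "content": text},
                ],
            },
        }
        # json.dumps escapes embedded newlines, so each request stays on one line
        yield json.dumps(request, ensure_ascii=False)


def write_batch_file(documents, path="requests.jsonl"):
    """Write the request lines to a JSONL file ready for upload."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_batch_lines(documents):
            f.write(line + "\n")
```

Keeping the `custom_id` derivable from your own document key makes the later matching step a dictionary lookup rather than a search.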
Upload the file, create the batch, poll, download:
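from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL request file
with open("requests.jsonl", "rb") as fh:
    f = client.files.create(file=fh, purpose="batch")

# Create the batch against the same endpoint the requests name
batch = client.batches.create(
    input_file_id=f.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Later: poll. Status moves validating -> in_progress -> finalizing -> completed
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":  # output_file_id is only set once results exist
    results = client.files.content(batch.output_file_id)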
Three operational facts matter before the first run. Batch jobs draw on a separate quota pool, so a large job does not starve your synchronous traffic. The 24-hour window is a ceiling, not a schedule: results often land far sooner, but nothing sooner is promised. And requests that fail or expire are not billed; you requeue them from the error file.
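Matching results back and collecting failures for a requeue can be sketched as below. This assumes each output line carries `custom_id`, `response` (with `status_code` and `body`), and `error` fields, which is how OpenAI's batch documentation describes the format; check it against a real output file before relying on it. Both function names are hypothetical.

```python
import json


def split_results(output_jsonl):
    """Return (answers, failures), both keyed by custom_id."""
    answers, failures = {}, {}
    for line in output_jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        response = row.get("response") or {}
        if row.get("error") or response.get("status_code") != 200:
            # Keep whatever diagnostic the line offers for logging
            failures[row["custom_id"]] = row.get("error") or response.get("body")
        else:
            message = response["body"]["choices"][0]["message"]
            answers[row["custom_id"]] = message["content"]
    return answers, failures


def lines_to_requeue(requests_jsonl, failed_ids):
    """Pick the original request lines whose custom_id needs another attempt."""
    failed_ids = set(failed_ids)
    return [
        line
        for line in requests_jsonl.splitlines()
        if line.strip() and json.loads(line)["custom_id"] in failed_ids
    ]
```

Run `split_results` over the text of both the output file and the error file, then feed the failed ids into `lines_to_requeue` to build the next request file. Output lines are not guaranteed to come back in input order, which is why everything is keyed by `custom_id`.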
The discount, in numbers
Batch halves both sides of the meter. June 2026 rates, live source at openai.com/api/pricing:
| Model | Input /1M | Batch input /1M | Output /1M | Batch output /1M |
|---|---|---|---|---|
| GPT-5.5 | $5.00 | $2.50 | $30.00 | $15.00 |
| GPT-5 | $1.25 | $0.625 | $10.00 | $5.00 |
| GPT-5 Mini | $0.25 | $0.125 | $2.00 | $1.00 |
| GPT-5 Nano | $0.05 | $0.025 | $0.40 | $0.20 |
The Batch API is the one OpenAI discount that asks for no engineering: no prompt restructuring, no model downgrade, just patience. Prompt caching, by contrast, discounts repeated prefixes on synchronous traffic; treat the two as separate levers for separate traffic. Caching mechanics are in prompt caching explained, and both levers sit in the full rundown in how to reduce OpenAI API costs.
A worked job: 100,000 document summaries
Specs: 100,000 documents, 3,000 input tokens and 300 output tokens each, on GPT-5.
Synchronous:
input 300M × $1.25 = $375.00
output 30M × $10.00 = $300.00
total $675.00
Batch:
input 300M × $0.625 = $187.50
output 30M × $5.00 = $150.00
total $337.50
One run saves $337.50 for identical output. Run it weekly, four runs a month, and the monthly line drops from $2,700 to $1,350.
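The same arithmetic as a small function, useful for plugging in your own volumes. The GPT-5 rates are copied from the table above; the function name and the `RATES` structure are illustrative.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), standard synchronous rates from the table
RATES = {"gpt-5": (Decimal("1.25"), Decimal("10.00"))}
BATCH_FACTOR = Decimal("0.5")  # batch bills both sides at half price
MILLION = Decimal(1_000_000)


def job_cost(n_docs, in_tokens, out_tokens, model="gpt-5", batch=False):
    """Total USD for n_docs requests with the given per-request token counts."""
    in_rate, out_rate = RATES[model]
    factor = BATCH_FACTOR if batch else Decimal(1)
    input_cost = n_docs * in_tokens / MILLION * in_rate * factor
    output_cost = n_docs * out_tokens / MILLION * out_rate * factor
    return input_cost + output_cost


sync_total = job_cost(100_000, 3_000, 300)
batch_total = job_cost(100_000, 3_000, 300, batch=True)
print(f"synchronous ${sync_total:.2f}, batch ${batch_total:.2f}")
```

For the job above this reproduces the $675.00 and $337.50 totals. `Decimal` keeps the cents exact where floats could drift on less round numbers.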
Which workloads fit
| Workload | Batch fit | Why |
|---|---|---|
| Nightly digests and reports | yes | the deadline is tomorrow morning |
| Eval suites and regression runs | yes | nobody watches a spinner |
| Classification and enrichment backfills | yes | pure parallel volume |
| Embedding and index refreshes | yes | offline by nature |
| Chatbots and copilots | no | users wait seconds, not hours |
| Agent loops | no | step N+1 needs step N’s answer |
| Anything with a same-minute SLA | no | the window is 24 hours |
The agent row deserves the emphasis: an agent cannot batch its own next step. You can batch a thousand independent tasks; you cannot batch the inside of one loop. Always-on agent economics get their own 30-day treatment in what a 24/7 AI agent actually costs.
The fine print
The pipeline is yours to build: split requests into files within OpenAI’s per-batch caps, submit, poll, parse the output file, match custom_ids, requeue failures. None of it is hard; all of it is code you now own and monitor.
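The splitting step might look like the sketch below. The default caps of 50,000 requests and 200 MB per file are placeholders to check against OpenAI's current published limits, and `split_into_batches` is a hypothetical helper.

```python
def split_into_batches(lines, max_requests=50_000, max_bytes=200 * 1024 * 1024):
    """Group JSONL request lines into chunks that respect both caps."""
    batch, size = [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        would_overflow = len(batch) >= max_requests or size + line_bytes > max_bytes
        if batch and would_overflow:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_bytes
    if batch:
        yield batch
```

Each yielded chunk becomes one uploaded file and one batch. A single line larger than `max_bytes` still lands in a chunk of its own, so oversized requests surface at upload time rather than disappearing silently.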
Plan around the full window. Test batches that return in twenty minutes train optimistic assumptions, and production will eventually use all 24 hours on the day a deadline depends on it. Downstream steps should trigger on batch completion, not on a clock.
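Triggering on completion rather than on a clock can be as simple as polling until the batch reaches a terminal status. In this sketch `client` is assumed to be an `openai.OpenAI()` instance; the function name, the polling interval, and the injectable `sleep` argument are illustrative choices.

```python
import time

# Statuses after which the batch will not change again
TERMINAL = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, poll_seconds=60, sleep=time.sleep):
    """Block until the batch reaches a terminal status, then return it."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL:
            return batch
        sleep(poll_seconds)
```

The caller should branch on the returned `batch.status`: only `completed` means the full output is ready, and an `expired` batch may have finished only part of its requests. In production this loop usually lives in a scheduler or queue worker instead of a blocking process.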
Model lifecycles still apply. Batch requests name a model, so deprecations and price changes reach batch pipelines on the same calendar as everything else.
Batch vs the subscription lane for the same jobs
Batch-shaped work, bulk and asynchronous with no need for streaming, is also exactly the shape a subscription lane serves well, since the Codex lane returns complete responses. The distinction is simple: batch halves the meter; a subscription lane replaces it.
Take the weekly summary job above at $1,350 a month after the batch discount. That volume sits inside our Pro 5x capacity estimate of roughly $3,500 of API-equivalent work a month, whether you count it at the batch price or at the undiscounted $2,700. The flat price is $229 all-in: $100 for the ChatGPT plan plus our $129 fee. Capacity figures are estimates, never guarantees, and the crossover arithmetic is worked through in OpenAI API vs ChatGPT subscription cost.
The honest split: a one-off backfill or an occasional burst belongs on the Batch API, official and self-contained. A standing monthly volume of batchable work belongs on a flat lane, where the bill stops tracking volume. The two compose: flat lane for the base load, batch on your API key for overflow above the plan windows.
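A rough comparison of the two lanes, under loud assumptions: the $229 flat price and the $3,500 capacity estimate come from the text above, capacity is assumed to be measured at synchronous API rates, and anything beyond it is assumed to overflow to batch at half price. The function name is hypothetical, and the capacity figure is an estimate, not a guarantee.

```python
from decimal import Decimal

FLAT_MONTHLY = Decimal("229")        # $100 ChatGPT plan + $129 fee
CAPACITY_ESTIMATE = Decimal("3500")  # API-equivalent USD per month (estimate)
BATCH_FACTOR = Decimal("0.5")


def compare_lanes(sync_monthly_usd):
    """Compare batch-only spend with flat lane plus batch overflow."""
    sync = Decimal(str(sync_monthly_usd))
    batch_only = sync * BATCH_FACTOR
    overflow = max(sync - CAPACITY_ESTIMATE, Decimal(0))
    flat_plus_overflow = FLAT_MONTHLY + overflow * BATCH_FACTOR
    return {
        "batch_only": batch_only,
        "flat_plus_overflow": flat_plus_overflow,
        "cheaper": "flat" if flat_plus_overflow < batch_only else "batch",
    }
```

Under these assumptions the crossover sits where half the synchronous bill equals $229, that is, at about $458 a month of synchronous-priced work. Below that, batch alone is cheaper; the weekly summary job at $2,700 is well above it.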
Run your own job through the calculator; it prices the metered bill and shows what the flat setup costs to cover it.
Frequently asked questions
How much does the OpenAI Batch API save?
50 percent off both input and output tokens versus synchronous rates, as of June 2026. GPT-5 drops from $1.25 to $0.625 per million input tokens and from $10 to $5 per million output tokens. Models and output quality are identical; the price of the discount is waiting up to 24 hours for results.
How long does a Batch API job take?
OpenAI's completion window is 24 hours. Results can arrive any time inside it, often within minutes or hours, but the window is the only promise. Requests that expire unprocessed are not billed and can be resubmitted. Treat batch as overnight infrastructure, not a slightly slower API.
Which workloads fit the Batch API?
Asynchronous, parallel work: nightly summaries, eval suites, classification and enrichment backfills, embedding refreshes, content queues, moderation passes. Interactive traffic does not fit, and neither do agent loops, because each agent step depends on the previous step's answer and cannot sit in a queue for hours.
Is the Batch API cheaper than a ChatGPT subscription lane?
They fit different shapes. Batch halves the meter and is the cleanest discount for occasional bulk jobs, with no extra vendor involved. A subscription-backed lane replaces the meter: we estimate ChatGPT Pro 5x absorbs roughly $3,500 of API-equivalent work a month for about $229 all-in through a hosted setup, an estimate rather than a guarantee. Standing monthly volume favors the flat lane; one-off bursts favor batch.