OpenAI costs · June 12, 2026

The OpenAI Batch API: When 50% Off Is Worth the Wait

The Batch API takes 50 percent off OpenAI token prices for jobs returned within 24 hours. How it works, which workloads qualify, and when a flat subscription lane beats it.

The OpenAI Batch API bills input and output tokens at 50 percent off standard rates in exchange for one concession: results arrive asynchronously, any time within a 24-hour window. Same models, same output quality, half the price, as of June 2026 (openai.com/api/pricing). The discount fits work that can wait overnight: summaries, backfills, evals, content queues. It excludes anything interactive, and it excludes agent loops, where each step depends on the last. This page covers the mechanics, the arithmetic, the fit test, and how batch compares to a flat subscription lane for the same jobs.

How the Batch API works

A batch is a JSONL file: one request per line, each tagged with a custom_id you later use to match results back.

{"custom_id": "doc-0001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": [{"role": "system", "content": "Summarize in 5 bullets."}, {"role": "user", "content": "<document text>"}]}}
{"custom_id": "doc-0002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": ["..."]}}
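
A file in this shape is usually generated rather than written by hand. The sketch below is one way to do it, assuming the documents are already in memory as strings; `build_batch_lines` and `write_batch_file` are illustrative names, not part of the OpenAI SDK, and the zero-padded `doc-NNNN` id scheme simply mirrors the sample above.

```python
import json


def build_batch_lines(docs, model="gpt-5", system_prompt="Summarize in 5 bullets."):
    """Return one JSONL request line per document (hypothetical helper)."""
    lines = []
    for i, text in enumerate(docs, start=1):
        request = {
            "custom_id": f"doc-{i:04d}",  # used later to match results back
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": text},
                ],
            },
        }
        # json.dumps escapes newlines inside the document text,
        # so every request stays on exactly one line.
        lines.append(json.dumps(request))
    return lines


def write_batch_file(docs, path="requests.jsonl"):
    """Write the request lines to disk, one per line."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(build_batch_lines(docs)) + "\n")
```

Serializing with `json.dumps` rather than string formatting is the design choice that matters here: a raw newline or quote inside a document would otherwise break the one-request-per-line contract.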

Upload the file, create the batch, poll, download:

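from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL request file; the purpose must be "batch".
with open("requests.jsonl", "rb") as fh:
    f = client.files.create(file=fh, purpose="batch")

# Create the batch against the same endpoint named in each request line.
batch = client.batches.create(
    input_file_id=f.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# later
batch = client.batches.retrieve(batch.id)  # validating -> in_progress -> completed
if batch.status == "completed":  # output_file_id is unset until then
    results = client.files.content(batch.output_file_id)  # JSONL, one result per line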

Three operational facts matter before the first run. Batch jobs draw on a separate quota pool, so a large job does not starve your synchronous traffic. The 24-hour window is a ceiling, not a schedule: results often land far sooner, but nothing sooner is promised. And requests that fail or expire are not billed; you requeue them from the error file.
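
Matching results back and collecting what needs requeueing can be sketched as below. This assumes each output or error line is a JSON object carrying `custom_id`, a `response` object with `status_code` and `body`, and an `error` field, which is the record shape as we understand it from OpenAI's Batch documentation; `split_results` is a hypothetical helper, and output lines should not be assumed to arrive in input order.

```python
import json


def split_results(output_jsonl):
    """Split batch result lines into successes and custom_ids to requeue.

    Assumed record shape (verify against OpenAI's docs):
    {"custom_id": ..., "response": {"status_code": ..., "body": ...}, "error": ...}
    """
    succeeded, retry_ids = {}, []
    for line in output_jsonl.split("\n"):
        if not line.strip():
            continue  # skip blank lines, including the trailing one
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") is None and response.get("status_code") == 200:
            succeeded[record["custom_id"]] = response["body"]
        else:
            retry_ids.append(record["custom_id"])
    return succeeded, retry_ids
```

Feed it the text of the output file and of the error file in turn; the `retry_ids` list is what goes into the next batch.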

The discount, in numbers

Batch halves both sides of the meter. June 2026 rates, live source at openai.com/api/pricing:

Model	Input /1M	Batch input /1M	Output /1M	Batch output /1M
GPT-5.5	$5.00	$2.50	$30.00	$15.00
GPT-5	$1.25	$0.625	$10.00	$5.00
GPT-5 Mini	$0.25	$0.125	$2.00	$1.00
GPT-5 Nano	$0.05	$0.025	$0.40	$0.20

The Batch API is the one OpenAI discount that asks for no engineering: no prompt restructuring, no model downgrade, just patience. Prompt caching, by contrast, discounts repeated prefixes on synchronous traffic; treat the two as separate levers for separate traffic. Caching mechanics are in "prompt caching explained", and both levers sit in the full rundown in "how to reduce OpenAI API costs".

A worked job: 100,000 document summaries

Specs: 100,000 documents, 3,000 input tokens and 300 output tokens each, on GPT-5.

Synchronous:
  input   300M × $1.25  = $375.00
  output   30M × $10.00 = $300.00
  total                   $675.00

Batch:
  input   300M × $0.625 = $187.50
  output   30M × $5.00  = $150.00
  total                   $337.50

One run saves $337.50 for identical output. Run it weekly, counting four runs a month, and the monthly line drops from $2,700 to $1,350.
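
The same arithmetic as a small function, for plugging in your own volumes. It is a sketch that hard-codes the GPT-5 rates from the table above and assumes a flat 50 percent batch discount; `job_cost` is an illustrative name, and `Decimal` is used only to keep the cents exact.

```python
from decimal import Decimal

# USD per 1M tokens for GPT-5, copied from the pricing table (June 2026).
GPT5_INPUT = Decimal("1.25")
GPT5_OUTPUT = Decimal("10.00")
BATCH_DISCOUNT = Decimal("0.5")  # batch bills at half the standard rate


def job_cost(docs, in_tokens, out_tokens, batch=False):
    """Cost in USD of one run: docs requests, tokens given per request."""
    million = Decimal(1_000_000)
    cost = (
        (docs * in_tokens / million) * GPT5_INPUT
        + (docs * out_tokens / million) * GPT5_OUTPUT
    )
    return cost * BATCH_DISCOUNT if batch else cost


sync_run = job_cost(100_000, 3_000, 300)               # 675 per run
batch_run = job_cost(100_000, 3_000, 300, batch=True)  # 337.5 per run
```

Multiplying either figure by four gives the monthly lines quoted above.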

Which workloads fit

Workload	Batch fit	Why
Nightly digests and reports	yes	the deadline is tomorrow morning
Eval suites and regression runs	yes	nobody watches a spinner
Classification and enrichment backfills	yes	pure parallel volume
Embedding and index refreshes	yes	offline by nature
Chatbots and copilots	no	users wait seconds, not hours
Agent loops	no	step N+1 needs step N’s answer
Anything with a same-minute SLA	no	the window is 24 hours

The agent row deserves the emphasis: an agent cannot batch its own next step. You can batch a thousand independent tasks; you cannot batch the inside of one loop. Always-on agent economics get their own 30-day treatment in "what a 24/7 AI agent actually costs".

The fine print

The pipeline is yours to build: split requests into files within OpenAI’s per-batch caps, submit, poll, parse the output file, match custom_ids, requeue failures. None of it is hard; all of it is code you now own and monitor.
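
The splitting step might look like the sketch below. The default caps (50,000 requests and 200 MB per input file) are assumptions based on our reading of OpenAI's limits at the time of writing, so check the current documentation before relying on them; `chunk_requests` is a hypothetical helper.

```python
def chunk_requests(lines, max_requests=50_000, max_bytes=200 * 1024 * 1024):
    """Yield lists of JSONL lines, each list within both caps.

    The default caps are assumptions; pass your own if OpenAI's limits differ.
    """
    chunk, size = [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        if line_bytes > max_bytes:
            raise ValueError("a single request exceeds the file size cap")
        # Close the current chunk if adding this line would break either cap.
        if chunk and (len(chunk) >= max_requests or size + line_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk
```

Each yielded chunk becomes one uploaded file and one batch, so downstream code tracks a list of batch ids rather than a single one.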

Plan around the full window. Test batches that return in twenty minutes train optimistic assumptions, and production will eventually use all 24 hours on the day a deadline depends on it. Downstream steps should trigger on batch completion, not on a clock.
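
Triggering on completion rather than on a clock can be as simple as the polling loop sketched here. The terminal status names are our understanding of the batch lifecycle and should be checked against OpenAI's docs; `wait_for_batch` is an illustrative helper that takes the retrieve function as an argument, so it works with `client.batches.retrieve` or with a stub in tests.

```python
import time

# Statuses after which a batch will not change again (assumed set).
TERMINAL = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve, batch_id, poll_seconds=60, sleep=time.sleep):
    """Poll until the batch reaches a terminal status, then return it."""
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL:
            return batch
        sleep(poll_seconds)
```

Usage would be along the lines of `batch = wait_for_batch(client.batches.retrieve, batch.id)`, followed by a check that `batch.status == "completed"` before any downstream step runs. A scheduler or queue worker is a sturdier home for this loop than a long-lived script, since the wait can run the full 24 hours.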

Model lifecycles still apply. Batch requests name a model, so deprecations and price changes reach batch pipelines on the same calendar as everything else.

Batch vs the subscription lane for the same jobs

Batch-shaped work, bulk and asynchronous with no need for streaming, is also exactly the shape a subscription lane serves well, since the Codex lane returns complete responses. The distinction is simple: batch halves the meter; a subscription lane replaces it.

Take the weekly summary job above at $1,350 a month after the batch discount. That volume sits inside our Pro 5x capacity estimate of roughly $3,500 of API-equivalent work a month, at $229 all-in: $100 for the ChatGPT plan plus our $129 fee. The undiscounted $2,700 version fits inside the same estimate. Capacity figures are estimates, never guarantees, and the crossover arithmetic is worked through in "OpenAI API vs ChatGPT subscription cost".

The honest split: a one-off backfill or an occasional burst belongs on the Batch API, official and self-contained. A standing monthly volume of batchable work belongs on a flat lane, where the bill stops tracking volume. The two compose: flat lane for the base load, batch on your API key for overflow above the plan windows.
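
A rough way to compare the two for a given month, using only the figures quoted above. The sketch assumes the $3,500 capacity estimate is measured at standard (undiscounted) API rates and that any overflow runs through the Batch API at half price; both are simplifications, the capacity figure is an estimate rather than a guarantee, and `monthly_plan` is an illustrative name.

```python
FLAT_MONTHLY = 229.0        # $100 ChatGPT plan + $129 fee, from the text
FLAT_CAPACITY_EST = 3500.0  # estimated API-equivalent work per month (not guaranteed)
BATCH_RATE = 0.5            # batch bills at half the standard rate


def monthly_plan(sync_equivalent_usd):
    """Compare batch-only against flat lane plus batch overflow.

    sync_equivalent_usd: the month's batchable work priced at standard rates.
    """
    overflow = max(0.0, sync_equivalent_usd - FLAT_CAPACITY_EST)
    return {
        "batch_only": sync_equivalent_usd * BATCH_RATE,
        "flat_plus_overflow": FLAT_MONTHLY + overflow * BATCH_RATE,
    }
```

Under these assumptions the weekly summary job ($2,700 at standard rates) comes to $1,350 on batch alone versus $229 on the flat lane, while a month with only $400 of standard-rate work is cheaper on batch.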

Run your own job through the calculator; it prices the metered bill and shows what the flat setup costs to cover it.

Frequently asked questions

How much does the OpenAI Batch API save?

50 percent off both input and output tokens versus synchronous rates, as of June 2026. GPT-5 drops from $1.25 to $0.625 per million input tokens and from $10 to $5 per million output. Models and output quality are identical; the price of the discount is waiting up to 24 hours for results.

How long does a Batch API job take?

OpenAI's completion window is 24 hours. Results can arrive any time inside it, often within minutes or hours, but the window is the only promise. Requests that expire unprocessed are not billed and can be resubmitted. Treat batch as overnight infrastructure, not a slightly slower API.

Which workloads fit the Batch API?

Asynchronous, parallel work: nightly summaries, eval suites, classification and enrichment backfills, embedding refreshes, content queues, moderation passes. Interactive traffic does not fit, and neither do agent loops, because each agent step depends on the previous step's answer and cannot sit in a queue for hours.

Is the Batch API cheaper than a ChatGPT subscription lane?

They fit different shapes. Batch halves the meter and is the cleanest discount for occasional bulk jobs, with no extra vendor involved. A subscription-backed lane replaces the meter: we estimate ChatGPT Pro 5x absorbs roughly $3,500 of API-equivalent work a month for about $229 all-in through a hosted setup, an estimate rather than a guarantee. Standing monthly volume favors the flat lane; one-off bursts favor batch.