Python OpenAI SDK: One base_url Change for Flat-Rate Calls
How to point the OpenAI Python SDK at ProxyLLM with base_url. Batch scripts and notebooks bill to a flat ChatGPT subscription instead of the per-token meter.
The official OpenAI Python SDK takes a base_url argument, and that argument is the entire integration. Point it at https://api.proxyllm.ai/v1 with a ProxyLLM key, and your script’s OpenAI calls run through Codex Hosted on your own ChatGPT subscription: same request shape, same response objects, same exceptions. What changes is the bill. Flat plan capacity replaces the per-token meter.
The one-line change
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key=os.environ["PROXYLLM_API_KEY"],
)
result = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "Extract the action items from this thread."}],
)
print(result.choices[0].message.content)
The key comes from the ProxyLLM dashboard. Create one per script or service, and give unattended callers a budget cap so a bug in a loop has a ceiling. Connecting your ChatGPT account by device code, starting the container, and sending a first request is covered in the setup guide.
If you prefer configuration over code, the SDK reads two environment variables, which means an existing codebase can switch endpoints without an edit:
export OPENAI_BASE_URL="https://api.proxyllm.ai/v1"
export OPENAI_API_KEY="pk_live_your_key"
OpenAI() with no arguments picks both up. Rollback is equally boring: unset OPENAI_BASE_URL, put your OpenAI key back in OPENAI_API_KEY, and you are back on api.openai.com.
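Because the switch happens in the environment rather than in code, it helps to log which endpoint a run is about to hit. The sketch below is a standalone preflight check, not SDK code: describe_endpoint is a hypothetical helper, and it assumes the SDK falls back to https://api.openai.com/v1 when OPENAI_BASE_URL is unset.

```python
import os

# Assumption: the SDK's stock endpoint when OPENAI_BASE_URL is not set.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def describe_endpoint(env=None):
    """Report which host a no-argument OpenAI() client would likely target.

    Hypothetical helper: it only inspects environment variables and never
    contacts the network or imports the SDK.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
        "has_key": bool(env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    print(describe_endpoint())
```

Calling it at the top of a batch job leaves a line in the job log showing whether the run went through the proxy or straight to OpenAI.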
A batch script built for the flat lane
Batch jobs are where the meter hurts most: the same prompt against thousands of inputs, every night, forever. Here is a nightly summarizer that survives crashes and reruns cleanly.
import os
import pathlib
from openai import OpenAI
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key=os.environ["PROXYLLM_API_KEY"],
    max_retries=3,  # SDK-native exponential backoff
)
def summarize(text: str) -> str:
    """Return a five-bullet summary of one document."""
    result = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "Summarize in five bullet points."},
            {"role": "user", "content": text},
        ],
    )
    return result.choices[0].message.content
out_dir = pathlib.Path("summaries")
out_dir.mkdir(exist_ok=True)  # write_text fails if the directory is missing
for path in sorted(pathlib.Path("inbox").glob("*.txt")):
    out = out_dir / path.name
    if out.exists():
        continue  # reruns skip finished work
    out.write_text(summarize(path.read_text()))
    print(f"done: {path.name}")
Three details earn their place. max_retries=3 lets the SDK retry transient failures with its own exponential backoff, so your code needs no retry loop. The out.exists() check makes reruns idempotent, so a crash at document 1,400 resumes at document 1,400 instead of document 1. And there is no streaming logic to write, because the Codex lane returns complete responses, which is the natural shape for a batch job anyway.
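One edge case the out.exists() check does not cover on its own: a crash in the middle of a write can leave a partial file that the next run treats as finished. Below is a minimal sketch of one way to close that gap, assuming the summaries directory sits on a single filesystem. write_atomically is a hypothetical helper, not part of the original script or the SDK.

```python
import os
import pathlib
import tempfile


def write_atomically(out: pathlib.Path, text: str) -> None:
    """Write text to a temp file beside `out`, then rename it into place.

    Hypothetical helper: the final path only appears once the full text
    is on disk, so an exists() check never sees a half-written summary.
    """
    out.parent.mkdir(parents=True, exist_ok=True)
    # The .tmp suffix keeps in-progress files out of any "*.txt" glob.
    fd, tmp_name = tempfile.mkstemp(dir=out.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_name, out)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the temp file if the write or rename did not complete.
        pathlib.Path(tmp_name).unlink(missing_ok=True)
        raise
```

In the batch loop this would stand in for the direct write, for example write_atomically(out, summarize(path.read_text())).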
What the nightly run costs, both ways
Take 2,000 documents a night at roughly 3,000 input and 300 output tokens each, on gpt-5. OpenAI’s June 2026 list prices put gpt-5 at $1.25 per million input tokens and $10 per million output tokens.
| Line | Arithmetic | Monthly |
|---|---|---|
| Input | 2,000 × 3,000 × 30 = 180M tokens × $1.25 per million | $225 |
| Output | 2,000 × 300 × 30 = 18M tokens × $10 per million | $180 |
| API total | $225 + $180 | $405 |
| Flat setup | ChatGPT Plus $20 + ProxyLLM $129 | $149 |
A nightly 2,000-document batch costs about $405 a month at GPT-5 list prices and about $149 against a Plus plan. The Plus window absorbs an estimated $700 of API-equivalent work per month (an estimate from observed usage windows, never a guarantee), so this job fits with room to spare. Heavier jobs step up to Pro tiers; the full crossover math is in the API vs subscription comparison.
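The table's arithmetic can be reproduced in a few lines, which makes it easy to substitute your own volumes. This is only a sketch: the function name and parameters are illustrative, and the prices and plan fees are the figures quoted above, hard-coded as assumptions.

```python
def monthly_api_cost(docs_per_night, input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m, nights=30):
    """Per-token cost of a nightly batch over one month, in dollars."""
    input_millions = docs_per_night * input_tokens * nights / 1_000_000
    output_millions = docs_per_night * output_tokens * nights / 1_000_000
    return input_millions * input_price_per_m + output_millions * output_price_per_m


# Figures from the worked example: 2,000 docs, 3,000 in / 300 out tokens each.
api = monthly_api_cost(2_000, 3_000, 300, input_price_per_m=1.25, output_price_per_m=10.00)
flat = 20 + 129  # ChatGPT Plus + ProxyLLM, as quoted in the table

print(f"API: ${api:.0f}  flat: ${flat}  difference: ${api - flat:.0f}")
# prints: API: $405  flat: $149  difference: $256
```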
When a plan window does run out mid-batch, requests fall back to a second connected account if you have one, then to your own OpenAI API key, and the request log records which lane served each call. The script never notices. It keeps iterating.
Notebooks and unattended workers
Notebook work burns tokens in a specific way: you re-run the same cell twenty times while tuning a prompt. On the meter, iteration has a price per keystroke. On a flat lane it consumes window capacity that resets on schedule whether you used it or not.
Give notebooks their own key so experiments never share a budget with production scripts. Workers (Celery, RQ, cron) get the same treatment: one key each, a cap sized to the job, per-key request logs. The dashboard shows the API-equivalent value each caller consumed, which is how you notice the cron job that quietly doubled.
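One way to keep callers from silently sharing a key is to resolve it by caller name and fail loudly when it is missing. The sketch below assumes a naming convention of PROXYLLM_API_KEY_<CALLER> for the environment variables. Both that convention and the key_for helper are hypothetical, not something ProxyLLM or the SDK prescribes.

```python
import os


def key_for(caller: str, env=None) -> str:
    """Look up the key dedicated to one caller, e.g. "notebook" or "cron".

    Hypothetical convention: each caller reads PROXYLLM_API_KEY_<CALLER>.
    There is deliberately no fallback to a shared key, so budgets and
    request logs stay separate per caller.
    """
    env = os.environ if env is None else env
    name = f"PROXYLLM_API_KEY_{caller.upper()}"
    try:
        return env[name]
    except KeyError:
        raise RuntimeError(f"no key configured for {caller!r}; set {name}") from None
```

A notebook would then build its client with OpenAI(base_url="https://api.proxyllm.ai/v1", api_key=key_for("notebook")), and a worker that was never issued a key stops at startup instead of borrowing another caller's budget.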
Honest caveats
- Complete responses on the Codex lane. No token-by-token stream. If you ship a terminal chat UI with a typing effect, keep that caller on the API-key lane, which streams normally.
- Model surface. You get the models Codex serves. Embeddings, fine-tunes, and exotic parameters stay on your own API key.
- Policy posture. Programmatic Codex use is documented OpenAI functionality, your account is never shared or pooled, and OpenAI keeps the final call over its own services.
The same one-argument pattern exists in JavaScript; the Node.js guide covers the env-var and serverless angles. The condensed Python steps live on the Python integration page.
If a Python job is a visible line on your OpenAI invoice, the arithmetic takes thirty seconds: put the monthly number into the calculator and read what the same work costs flat.
Frequently asked questions
How do I set a custom base URL in the OpenAI Python SDK?
Pass base_url when constructing the client, for example OpenAI(base_url='https://api.proxyllm.ai/v1', api_key=...). Every request then goes to that host instead of api.openai.com. The SDK also reads the OPENAI_BASE_URL environment variable, so you can switch endpoints without touching code.
Can a Python script use a ChatGPT subscription instead of API billing?
Not directly, because ChatGPT plans ship no API key. The bridge is Codex, which is included in ChatGPT plans and runs programmatically. ProxyLLM's Codex Hosted wraps that in an OpenAI-compatible endpoint, so a Python script pointed at it bills against the flat plan instead of per-token rates.
Does streaming work when base_url points at ProxyLLM?
On the Codex Hosted lane, responses arrive complete rather than as a token stream. Requests served by API-key lanes stream normally. Batch scripts, notebooks, and workers rarely need streaming, which is why they are the natural fit for the flat lane.
Do retries and error handling change with a custom base URL?
No. The official SDK keeps its built-in retry and backoff behavior, and errors keep the OpenAI shape, so existing try/except blocks work unchanged. ProxyLLM adds a request log showing every call, the lane that served it, and its API-equivalent value.