LangChain on a Flat-Rate OpenAI Lane
ChatOpenAI accepts a base_url, so LangChain chains and agents can bill to a flat ChatGPT plan through Codex Hosted. Python and JS setup, plus worked agent-loop math.
ChatOpenAI takes a base_url argument, and that argument is the entire LangChain integration. Point it at https://api.proxyllm.ai/v1 with a ProxyLLM key and every chain, agent, and LangGraph graph built on that client bills to your own flat ChatGPT subscription through Codex Hosted instead of the per-token meter. That matters for LangChain specifically because the framework’s whole design multiplies model calls, and the meter bills each one.
The setup in Python
```python
import os

from langchain_openai import ChatOpenAI

# Same client as usual; only base_url and the key differ.
llm = ChatOpenAI(
    model="gpt-5",
    base_url="https://api.proxyllm.ai/v1",
    api_key=os.environ["PROXYLLM_API_KEY"],  # ProxyLLM key, read from the environment
)

print(llm.invoke("Summarize this diff for a changelog.").content)
```
Nothing else changes. Prompts, output parsers, tool bindings, with_structured_output, and LangGraph nodes all operate on the same client. Connecting your ChatGPT account uses OpenAI’s device-code flow, takes about five minutes, and is walked through in the setup guide.
The setup in JavaScript
The JS client wraps the OpenAI Node SDK, so the base URL rides in configuration:
```javascript
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-5",
  apiKey: process.env.PROXYLLM_API_KEY,
  // Passed through to the underlying OpenAI Node SDK client.
  configuration: { baseURL: "https://api.proxyllm.ai/v1" },
});

const res = await llm.invoke("Summarize this diff for a changelog.");
console.log(res.content);
```
Both clients keep the OpenAI response shape, so token-usage callbacks and tracing keep reporting numbers; the difference is what those numbers cost you.
Why LangChain bills run hot
LangChain’s abstractions are call multipliers, by design rather than by accident:
- Chains. A map-reduce summarization makes one call per chunk plus combine passes; a refine chain calls once per document section. Two hundred chunks is at least two hundred calls before anyone reads the output.
- Agents. A tool-using agent loops plan, act, observe; 10 to 20 calls per task is normal, and each later call re-sends the accumulated history as fresh input tokens.
- Retries and guards. Output-parser retries, structured-output corrections, and guardrail passes are full-price calls that never appear in your mental model of “one request.”
On a meter, every one of those calls is a line item. On a subscription window, the task consumes capacity and the step count stops being a billing event. The general version of that argument is in why agent workloads flip the math.
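The re-sent history described in the agent bullet is where much of the input volume comes from. A minimal sketch of that effect, assuming a deliberately simplified model in which every call sends a fixed base prompt plus a constant number of new tokens for each earlier step. The function names and all numbers here are hypothetical illustrations, not measurements:

```python
def billed_input_tokens(base_prompt: int, growth_per_step: int, calls: int) -> int:
    """Total input tokens billed when every call re-sends the full history."""
    # Call number `step` (0-based) carries the base prompt plus everything
    # that accumulated during the `step` calls before it.
    return sum(base_prompt + growth_per_step * step for step in range(calls))


def unique_input_tokens(base_prompt: int, growth_per_step: int, calls: int) -> int:
    """Tokens that would be sent if no history were ever re-sent (calls >= 1)."""
    return base_prompt + growth_per_step * (calls - 1)


# Hypothetical loop: 1,000-token base prompt, 500 new tokens per step, 10 calls.
billed = billed_input_tokens(base_prompt=1_000, growth_per_step=500, calls=10)
unique = unique_input_tokens(base_prompt=1_000, growth_per_step=500, calls=10)
print(billed, unique)  # 32500 5500
```

Under these assumed numbers the meter sees roughly six times the tokens the task actually produced, and the gap widens as the loop gets longer.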
A worked agent loop
Take a LangGraph research agent averaging 14 model calls per task: planning, four tool rounds with results re-entering context, a synthesis pass, and a structured-output retry. Context accumulates to roughly 95,000 input tokens and 4,000 output tokens per task. At GPT-5 list prices (June 2026: $1.25 per million input, $10 per million output, live at openai.com/api/pricing):
Input: 95,000 × $1.25/M = $0.119
Output: 4,000 × $10/M = $0.040
Cost per task ≈ $0.16
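The same arithmetic as a small Python helper, so other token counts or prices can be substituted. The function name is illustrative, and the prices are the list prices quoted above; check them against the live pricing page before relying on the result:

```python
def cost_per_task(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Metered cost in dollars for one task, given per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Token counts and prices from the worked example.
cost = cost_per_task(95_000, 4_000, 1.25, 10.00)
print(f"${cost:.2f}")  # $0.16
```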
Sixteen cents per task, multiplied by an automation schedule:
| Tasks/day | Calls/day | API cost/mo | Flat setup/mo (estimates) |
|---|---|---|---|
| 100 | 1,400 | ~$480 | Plus $20 + $129 = $149 (window ≈ $700) |
| 500 | 7,000 | ~$2,380 | Pro 5x $100 + $129 = $229 (window ≈ $3,500) |
| 1,500 | 21,000 | ~$7,140 | Pro 20x $200 + $129 = $329 (window ≈ $14,000) |
A 500-task-a-day LangGraph agent costs about $2,380 a month on the meter and an estimated $229 on a Pro 5x subscription-backed setup. That figure is an estimate, not a guarantee: window capacities are planning estimates, and OpenAI sets the actual limits. To model your own loop shape (steps × calls × tokens × retry rate), use the agent cost formula.
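The table's metered column can be reproduced with a few lines of Python. This is a sketch under two assumptions: a 30-day month, which is what the table's figures imply, and the unrounded per-task cost of $0.15875 from the arithmetic above. The function names are illustrative, and the break-even helper ignores window capacity, which OpenAI controls:

```python
COST_PER_TASK = 0.15875  # unrounded per-task cost from the worked example
DAYS_PER_MONTH = 30      # assumption: the table's figures imply a 30-day month


def monthly_api_cost(tasks_per_day: int, cost_per_task: float = COST_PER_TASK) -> float:
    """Metered monthly cost in dollars for a steady daily task count."""
    return tasks_per_day * cost_per_task * DAYS_PER_MONTH


def break_even_tasks_per_day(flat_monthly: float, cost_per_task: float = COST_PER_TASK) -> float:
    """Daily task count at which the meter matches a flat monthly total."""
    return flat_monthly / (cost_per_task * DAYS_PER_MONTH)


for tasks in (100, 500, 1_500):
    # Rounded to the nearest $10, as in the table.
    print(tasks, round(monthly_api_cost(tasks), -1))

# Flat totals are the plan price plus the $129 line item from the table.
print(break_even_tasks_per_day(100 + 129))
```

With these inputs the Pro 5x setup breaks even at a little under 50 tasks a day; below that volume the meter is the cheaper lane.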
What changes and what does not
- Streaming. The Codex lane returns complete responses. Agents and chains barely notice, since they need full responses before the next step anyway. User-facing streaming UIs should stay on an API-key lane, which streams normally.
- Embeddings. OpenAIEmbeddings stays on your own OpenAI key at the default base URL; embeddings are not part of what the Codex lane serves, and at their prices they are not the problem you are solving.
- Fallback. When a plan window exhausts, requests fall back to a second connected account, then your own API key, and the request log shows which lane served each call. Your AgentExecutor never knows.
- Posture. Programmatic Codex use is documented OpenAI functionality, your account is yours alone in an isolated container, and OpenAI has the final call over its services.
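The chat and embeddings split can be kept explicit by building each client's constructor arguments in one place. A minimal sketch, assuming your own OpenAI key lives in an environment variable named OPENAI_API_KEY (an assumption, not something this page specifies); the helper names are hypothetical:

```python
import os

PROXYLLM_BASE_URL = "https://api.proxyllm.ai/v1"


def chat_client_kwargs(env) -> dict:
    """Constructor arguments for ChatOpenAI on the flat-rate lane."""
    return {
        "model": "gpt-5",
        "base_url": PROXYLLM_BASE_URL,
        "api_key": env["PROXYLLM_API_KEY"],
    }


def embeddings_client_kwargs(env) -> dict:
    """Constructor arguments for OpenAIEmbeddings on your own OpenAI key."""
    # No base_url here, so the client uses its default endpoint.
    # OPENAI_API_KEY is an assumed variable name.
    return {"api_key": env["OPENAI_API_KEY"]}


# Usage, with langchain-openai installed:
# from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# llm = ChatOpenAI(**chat_client_kwargs(os.environ))
# embeddings = OpenAIEmbeddings(**embeddings_client_kwargs(os.environ))
```

Keeping the two argument sets separate makes it harder to point the embeddings client at the proxy URL by accident.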
The condensed version of this page lives at the LangChain integration. If your LangChain app already has a visible OpenAI invoice, the calculator maps it to a plan tier in about thirty seconds.
Frequently asked questions
How do I point LangChain's ChatOpenAI at a custom endpoint?
In Python, pass base_url to the constructor: ChatOpenAI(model='gpt-5', base_url='https://api.proxyllm.ai/v1', api_key=...). In JavaScript, pass configuration: { baseURL: '...' } to new ChatOpenAI(). Everything downstream (chains, agents, and LangGraph graphs) uses that client unchanged.
Why is my LangChain app so expensive on the OpenAI API?
Because LangChain abstractions multiply calls. A map-reduce chain makes one call per chunk plus a combine pass, agents loop through plan-act-observe cycles of 10 to 20 calls per task, and each later call re-sends the accumulated context as fresh input tokens. The meter bills every step, so framework convenience compounds into token volume.
Does LangChain streaming work through ProxyLLM?
On the Codex Hosted lane, no: responses arrive complete rather than token by token, which suits agents and chains that need full responses before acting anyway. Requests served by API-key lanes stream normally, so keep user-facing streaming surfaces on a key lane.
Do embeddings and vector stores work with the flat lane?
Vector stores are unaffected, and embeddings keep working but should stay on your own OpenAI API key at the default base URL. Embedding models are not part of what the Codex lane serves, and at pennies per million tokens they are rarely the part of a LangChain bill worth moving.