June 12, 2026

Node.js OpenAI SDK Behind a Flat-Rate Endpoint

Point the OpenAI Node.js SDK at ProxyLLM with baseURL or env vars, so that services and serverless functions run their OpenAI calls on a flat ChatGPT subscription.

The OpenAI Node.js SDK accepts a baseURL option, and it reads OPENAI_BASE_URL from the environment when you leave it out. Either path points your service at https://api.proxyllm.ai/v1, where OpenAI-model calls run through Codex Hosted on your own ChatGPT subscription instead of metered API billing. For a service already using the official SDK, switching billing models is a config change shipped with a deploy.

Two ways to configure the client

Explicit, in code:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.proxyllm.ai/v1",
  apiKey: process.env.PROXYLLM_API_KEY,
});

Or implicit, through the environment the SDK already checks:

# .env or your platform's secrets manager
OPENAI_BASE_URL=https://api.proxyllm.ai/v1
OPENAI_API_KEY=pk_live_your_key

import OpenAI from "openai";

const client = new OpenAI(); // picks up both variables

The second form is the one we recommend for teams. The endpoint becomes deployment configuration, staging and production can run different lanes, and nothing about billing lives in the repository. Generating the key and connecting your ChatGPT account by device code takes a few minutes; the setup guide walks through it.
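When the endpoint lives entirely in the environment, a missing variable fails quietly: without OPENAI_BASE_URL the SDK falls back to its default endpoint. A minimal startup check might look like the sketch below. The function name resolveOpenAIEnv and the https-only rule are assumptions for illustration, not part of the SDK or of ProxyLLM.

```typescript
// Hypothetical fail-fast check for the two variables the SDK reads.
// It takes the environment as a plain object so it is easy to test.
type Env = Record<string, string | undefined>;

interface ResolvedOpenAIEnv {
  baseURL: string;
  apiKey: string;
}

function resolveOpenAIEnv(env: Env): ResolvedOpenAIEnv {
  const baseURL = env["OPENAI_BASE_URL"];
  const apiKey = env["OPENAI_API_KEY"];

  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  if (!baseURL) {
    // Without this variable the SDK would use its default endpoint.
    throw new Error("OPENAI_BASE_URL is not set");
  }

  // new URL() throws a TypeError on malformed input.
  const parsed = new URL(baseURL);
  if (parsed.protocol !== "https:") {
    // Assumption: this deployment only allows TLS endpoints.
    throw new Error(`OPENAI_BASE_URL must use https, got ${parsed.protocol}`);
  }

  return { baseURL, apiKey };
}
```

Called once at boot with process.env, this surfaces a misconfigured deploy before the first request rather than on the first invoice.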

A worked service example

A webhook handler that summarizes inbound support threads, the kind of endpoint that runs thousands of times a day without anyone watching it:

import OpenAI from "openai";

const client = new OpenAI(); // OPENAI_BASE_URL + OPENAI_API_KEY from env

export async function summarizeThread(thread: string): Promise<string> {
  const result = await client.chat.completions.create({
    model: "gpt-5",
    messages: [
      {
        role: "system",
        content: "Summarize this support thread in three sentences.",
      },
      { role: "user", content: thread },
    ],
  });
  return result.choices[0]?.message?.content ?? "";
}

Give this service its own ProxyLLM key with a budget cap. Per-service keys are the difference between “our AI spend went up” and “the webhook worker consumed 3x its usual volume on Tuesday”: the request log breaks out every call by key, with the lane that served it and its API-equivalent value.

What a production feature costs, metered vs flat

Say the feature serves 2,000 daily active users, each triggering an average of 6 model calls a day at 1,500 input and 200 output tokens per call, over a 30-day month, on gpt-5 ($1.25 per million input tokens, $10 per million output tokens, OpenAI list prices as of June 2026).

Line	Arithmetic	Monthly cost
Calls	2,000 users × 6 calls = 12,000 calls/day
Input	12,000 × 1,500 × 30 = 540M tokens, at $1.25 per million	$675
Output	12,000 × 200 × 30 = 72M tokens, at $10 per million	$720
API total	$675 + $720	$1,395
Flat setup	ChatGPT Pro 5x $100 + ProxyLLM $129	$229

An AI feature serving 2,000 daily users runs about $1,395 a month at GPT-5 list prices and about $229 flat. The Pro 5x window absorbs an estimated $3,500 of API-equivalent work per month, so this workload fits with headroom; treat that figure as a planning estimate, since OpenAI tunes plan limits over time. Growth past the window costs a $100 plan step, with your own API key catching overflow in the meantime.
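To rerun the comparison with your own traffic, the table's arithmetic can be sketched as a small function. The type and function names here are illustrative; the numbers passed in are the ones from the example above.

```typescript
// Metered monthly cost, following the same steps as the table.
interface Workload {
  dailyUsers: number;
  callsPerUserPerDay: number;
  inputTokensPerCall: number;
  outputTokensPerCall: number;
  daysPerMonth: number;
}

interface Pricing {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

function meteredMonthlyCost(w: Workload, p: Pricing) {
  const callsPerDay = w.dailyUsers * w.callsPerUserPerDay;
  const inputTokens = callsPerDay * w.inputTokensPerCall * w.daysPerMonth;
  const outputTokens = callsPerDay * w.outputTokensPerCall * w.daysPerMonth;
  const inputCost = (inputTokens / 1_000_000) * p.inputPerMillion;
  const outputCost = (outputTokens / 1_000_000) * p.outputPerMillion;
  return { callsPerDay, inputCost, outputCost, total: inputCost + outputCost };
}

const example = meteredMonthlyCost(
  {
    dailyUsers: 2000,
    callsPerUserPerDay: 6,
    inputTokensPerCall: 1500,
    outputTokensPerCall: 200,
    daysPerMonth: 30,
  },
  { inputPerMillion: 1.25, outputPerMillion: 10 },
);

// Flat setup from the table: ChatGPT Pro 5x plus ProxyLLM.
const flatMonthly = 100 + 129;

console.log(example, flatMonthly);
```

Output tokens are a small share of volume but more than half of the metered bill here, so a prompt change that lengthens responses moves the API total faster than one that lengthens inputs.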

Serverless and worker notes

The Codex Hosted lane returns complete responses rather than token streams, and serverless handlers are better for it. There is no server-sent-events connection to hold open, no partial flush, and no edge-runtime streaming shim. Await the call, return the JSON.

Three practical points:

Timeouts. Size the function timeout for the slowest generation you expect, with margin. Anything routinely long-running belongs on a queue worker, where retries and backoff are first-class.
Retries. The SDK’s maxRetries option still applies, and errors keep the OpenAI shape, so existing error handling carries over unchanged.
Key hygiene. One key per deployable unit: the API route, the queue worker, the cron job. Caps on the unattended ones.
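The three points above can be kept in one place as a per-unit settings table. This is a sketch under assumptions: the unit names, environment variable names, and the specific retry and timeout numbers are hypothetical. The maxRetries and timeout fields (timeout in milliseconds) match the constructor options of the OpenAI Node.js SDK; budget caps are set on the key itself, not in code.

```typescript
// Hypothetical per-unit settings: one key, one retry policy, one timeout each.
type Unit = "api-route" | "queue-worker" | "cron-job";

interface UnitSettings {
  keyEnvVar: string; // env var holding this unit's own key (illustrative names)
  maxRetries: number; // forwarded to the SDK's maxRetries option
  timeoutMs: number; // forwarded to the SDK's timeout option
}

const SETTINGS: Record<Unit, UnitSettings> = {
  // HTTP-triggered: keep the total wait inside the function timeout.
  "api-route": { keyEnvVar: "PROXYLLM_KEY_API_ROUTE", maxRetries: 1, timeoutMs: 20_000 },
  // Queue worker: long generations and more retries are acceptable here.
  "queue-worker": { keyEnvVar: "PROXYLLM_KEY_QUEUE_WORKER", maxRetries: 4, timeoutMs: 120_000 },
  "cron-job": { keyEnvVar: "PROXYLLM_KEY_CRON_JOB", maxRetries: 2, timeoutMs: 60_000 },
};

function clientOptionsFor(unit: Unit, env: Record<string, string | undefined>) {
  const settings = SETTINGS[unit];
  const apiKey = env[settings.keyEnvVar];
  if (!apiKey) {
    throw new Error(`${settings.keyEnvVar} is not set for ${unit}`);
  }
  return {
    baseURL: "https://api.proxyllm.ai/v1",
    apiKey,
    maxRetries: settings.maxRetries,
    timeout: settings.timeoutMs,
  };
}

// Usage, with the openai package installed:
//   const client = new OpenAI(clientOptionsFor("queue-worker", process.env));
```

Because each retry can wait up to the full timeout, the worst-case duration of a call is roughly timeout × (maxRetries + 1) plus backoff, which is the figure to compare against the platform's function limit.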

Streaming, and when to keep the key lane

If your product renders tokens as they generate (a chat UI with a typing effect), that path needs streaming, and streaming is what the API-key lanes do. The honest split: user-facing chat stays on a streaming lane, while background work (summaries, enrichment, queue jobs, webhooks) moves to the flat lane where the volume is. If you are on Vercel’s stack, the Vercel AI SDK guide covers how that split looks in practice, and there is a Python version of this article as well.

The condensed steps live on the Node.js integration page. If your service’s OpenAI line item has crossed a few hundred dollars a month, the calculator maps it to a plan tier in thirty seconds.

Frequently asked questions

How do I set a custom baseURL in the OpenAI Node.js SDK?

Pass baseURL to the constructor, for example new OpenAI({ baseURL: 'https://api.proxyllm.ai/v1', apiKey: process.env.PROXYLLM_API_KEY }). The SDK also reads the OPENAI_BASE_URL and OPENAI_API_KEY environment variables, so new OpenAI() with no arguments works once those are set.

Can I switch a Node service to a flat-rate endpoint without code changes?

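Yes, if the service uses the official SDK with default configuration, meaning it calls new OpenAI() without an explicit baseURL or apiKey. Set OPENAI_BASE_URL and OPENAI_API_KEY in the deployment environment and the next deploy routes every OpenAI call through the new endpoint. To route calls back to api.openai.com, unset OPENAI_BASE_URL and put an OpenAI API key back in OPENAI_API_KEY; the SDK refuses to construct a client with no key at all.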

Does this work in serverless functions?

Yes. The Codex Hosted lane returns complete responses, which suits function-style handlers: await the call, return the body. Size the function timeout for the slowest expected generation, and move long jobs to a queue worker instead of an HTTP-triggered function.

What happens when the ChatGPT plan hits a usage limit?

Requests fall back to a second connected ChatGPT account if one exists, then to your own OpenAI API key, until the window resets. Your Node code sees a normal response either way, and the request log shows which lane served each call.