ProxyLLM vs LiteLLM: Hosted Flat-Rate vs Self-Hosted Router

LiteLLM is a free self-hosted router across many providers. ProxyLLM is a hosted flat-rate lane for OpenAI volume. Different layers, and they compose well.

LiteLLM and ProxyLLM sit at different layers, so the honest comparison starts there. LiteLLM is software: a free, open-source proxy you host that routes one OpenAI-compatible API across 100+ providers with virtual keys and budgets. ProxyLLM is a service: we host a lane that changes what your OpenAI traffic costs, by running Codex against your own flat ChatGPT subscription. A router organizes your spend; it cannot change the price of a token. We change the price of the OpenAI share of it, and the two compose cleanly.

What LiteLLM does well

LiteLLM is the standard answer for multi-provider plumbing, and it earned that. One config file maps model names to providers; the proxy speaks OpenAI’s API shape to your apps regardless of what sits behind it; virtual keys carry per-team budgets and rate limits; routing rules handle fallbacks and load balancing. It is free to run, audited by thousands of deployments, and as private as your own infrastructure, because it is your own infrastructure.

The costs are the self-hosting costs. You operate the gateway, secure the key store, monitor the process, and apply the updates, and at scale the proxy becomes one more production service with your name on the pager. How that burden compares across the DIY landscape is mapped in self-hosting a Codex proxy vs ProxyLLM, and LiteLLM against the big hosted marketplace in LiteLLM vs OpenRouter.

What we do instead

We do not try to out-route LiteLLM. Codex Hosted does one structural thing: it runs OpenAI’s official, unmodified Codex CLI on managed servers, signed into your own ChatGPT account through OpenAI’s device-code flow, and exposes it as an OpenAI-compatible endpoint. OpenAI-bound work bills to the flat subscription instead of the meter. As planning estimates, Plus absorbs roughly $700 of API-equivalent work a month, Pro 5x roughly $3,500, Pro 20x roughly $14,000; estimates, never guarantees. The fee is $129 a month, no inference markup, with your own keys (OpenAI, OpenRouter) passing through at provider rates and per-request logs naming the lane that served each call.

Side by side

AxisLiteLLMProxyLLM
What it isOSS proxy/router you hostHosted service
PriceFree core, your infra and hours$129/mo flat; $0 Starter tier (BYO keys, logs)
Provider surface100+ providers, one APIOpenAI flat lane; your own keys as passthrough
Changes per-token price?No; budgets and routing around the meterYes, for OpenAI traffic: plan windows replace the meter
Keys and budgetsVirtual keys, self-operatedScoped sub-keys with budgets, hosted
StreamingStreams whatever the provider streamsKey lanes stream; the Codex lane returns complete responses
OpsYoursOurs

The rows that decide it: if you need many providers behind one self-operated gateway, LiteLLM. If your problem is the size of the OpenAI bill itself, a router will not fix that and we will.

Composing them: LiteLLM upstream of the flat lane

Because our endpoint is OpenAI-compatible, it is just another route in a LiteLLM config. Keep LiteLLM as your front door and point the OpenAI bulk route at the flat lane:

model_list:
  - model_name: gpt-5-flat
    litellm_params:
      model: openai/gpt-5
      api_base: https://api.proxyllm.ai/v1
      api_key: os.environ/PROXYLLM_API_KEY

  - model_name: gpt-5-direct
    litellm_params:
      model: openai/gpt-5
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks:
    - gpt-5-flat: ["gpt-5-direct"]

Apps call gpt-5-flat; bulk work bills to the subscription; if the flat route ever errors, LiteLLM retries direct. Two notes on that config. The flat lane already carries its own internal fallback (a second connected account, then your API key), so the LiteLLM-level fallback is a belt on top of suspenders. And the Codex lane returns complete responses, so route token-streaming surfaces to gpt-5-direct or a key lane instead.

Which to choose

Choose LiteLLM when the problem is organizational: many providers, many teams, budgets and routing logic you want under your own control, with engineers who are happy operating it. Choose ProxyLLM when the problem is economic: an OpenAI-shaped bill above roughly $150 a month that scales with every call. Choose both when both are true, which describes most platform teams we talk to; the broader field of options is ranked honestly in ProxyLLM alternatives.

If the economic problem is yours, the calculator prices your current OpenAI spend against the flat lane in about thirty seconds, and the LiteLLM config above is all the integration there is.

Frequently asked questions

What is the difference between LiteLLM and ProxyLLM?

LiteLLM is open-source software you host: a proxy that routes one OpenAI-compatible API across 100+ providers, with virtual keys and budgets you operate yourself. ProxyLLM is a hosted service that changes what OpenAI traffic costs, by running Codex on your own flat ChatGPT subscription. One is a router; the other is a billing lane.

Does LiteLLM reduce OpenAI costs?

Indirectly at best. LiteLLM can enforce budgets, cache responses, and route to cheaper models, but every OpenAI token it forwards still bills at OpenAI's per-token rates. A router organizes the meter; it cannot change the price of a token. Changing the price requires a different billing lane, such as subscription-backed capacity.

Can LiteLLM route to ProxyLLM?

Yes. ProxyLLM exposes a standard OpenAI-compatible endpoint, so a LiteLLM model entry with api_base set to https://api.proxyllm.ai/v1 routes traffic through the flat lane. Teams keep LiteLLM as the front door for all providers and point only OpenAI-bound bulk work at the flat route.

Is LiteLLM free?

The core proxy is free, open-source software; LiteLLM also sells an enterprise tier. The real cost of the free tier is operational: you run the gateway, store the keys, watch the uptime, and apply the updates. For many platform teams that trade is exactly right.

More on Comparisons
Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.