Dify Apps Without Metered OpenAI Billing

Add ProxyLLM as an OpenAI-compatible provider in Dify and app traffic bills to a flat ChatGPT plan. Per-app sub-keys with hard caps make each app's spend visible.

Dify takes a custom OpenAI-compatible model provider, and that one config screen changes where your app spend lands. Set the provider’s API base to https://api.proxyllm.ai/v1 with a ProxyLLM key and OpenAI-model calls from every app using that provider run through Codex Hosted on your flat ChatGPT subscription instead of a metered API key. Add one scoped sub-key per app and each app gets its own hard budget and its own line in the request log.

Here is the provider config, the per-app key pattern, and what a production Dify app costs both ways.

Add the provider

In Dify, open settings and find the model provider section, then add an OpenAI-API-compatible provider (exact field labels shift slightly between Dify versions):

Provider:  OpenAI-API-compatible
API Base:  https://api.proxyllm.ai/v1
API Key:   pk_live_your_proxyllm_key
Model:     gpt-5

Register each model name you plan to use. Plain OpenAI names like gpt-5 and gpt-5-mini run through Codex Hosted on your ChatGPT plan. Prefixed names like anthropic/ or google/ route through your own OpenRouter key as a passthrough lane with no markup, if you have one connected. Your apps, workflows, and agents keep their canvas; only the billing path underneath changes.

Connecting the ChatGPT account behind the endpoint is a one-time device-code sign-in with OpenAI, the same flow described in the n8n guide for its platform.

One app, one key, one cap

Dify makes it easy to ship many apps, which is exactly how OpenAI bills get hard to read. The fix is structural: generate one ProxyLLM sub-key per app and cap it.

  • Caps stop runaways. An agent app that starts looping hits its own budget and stops, instead of consuming whatever the whole workspace had left.
  • Logs attribute spend. The request log breaks out every call by key, with the lane that served it and its API-equivalent value. “Which app got expensive this week” becomes a filter.
  • Caps are governance, not contracts. Raising an app’s budget is a settings change, made after you have seen its real consumption.

Per-app sub-keys turn “our Dify bill went up” into “the onboarding assistant tripled on Thursday.” That sentence is the whole cost-control story. The broader pattern of capping before the bill caps you is in how to cap OpenAI API spending.

What an app costs, metered vs flat

A representative chatflow: a docs assistant that classifies the question with GPT-5 mini, then answers with GPT-5 over retrieved context. Per user message, at OpenAI’s June 2026 list prices (GPT-5 mini $0.25 input / $2 output per million tokens, GPT-5 $1.25 / $10):

StepModelInput/callOutput/callCost/message
ClassifyGPT-5 mini1,20050$0.0004
AnswerGPT-53,500400$0.0084
Per message~$0.0088

At 800 messages a day, that is 24,000 messages and roughly $211 a month for one app. Five apps in that range approach $1,055 a month, climbing with every workflow your team ships. The flat path: ChatGPT Pro 5x at $100 plus ProxyLLM at $129 is $229 a month, with the Pro window absorbing an estimated $3,500 of API-equivalent work. Capacity figures are planning estimates, never guarantees; the request log shows your real numbers, and overflow past a window falls back to a second account or your own API key.

A single light app sits below the crossover and belongs on the meter. A portfolio of Dify apps almost never does.

The two honest caveats

Responses arrive complete, not streamed. The Codex lane returns the full answer in one response. Workflow apps, agents, and batch nodes do not care: each step needs the complete output before the next runs. A user-facing chat app that depends on the typing effect should keep an API-key lane for that path, where streaming works as usual.

Embeddings stay on a real API key. The flat lane’s model surface is what Codex serves, which means chat models. Configure Dify’s embedding models for knowledge bases against your own OpenAI key; we pass BYO keys through with no markup, so both lanes live behind one provider setup.

The same swap applies to Flowise’s canvas with its own mechanics, covered in the Flowise guide, and the condensed Dify steps live on the integration page.

If your Dify workspace already has a real OpenAI bill, put it in the calculator and read what the same traffic costs against a plan.

Frequently asked questions

How do I set a custom OpenAI base URL in Dify?

In Dify's settings, open the model provider section and add an OpenAI-API-compatible provider. Set the API base to https://api.proxyllm.ai/v1, paste a ProxyLLM key as the API key, and register the model names you want, for example gpt-5. Apps then pick those models like any other.

Can I cap OpenAI spending per Dify app?

Yes, by giving each app its own scoped sub-key with a budget cap. A runaway workflow hits its own cap instead of draining the budget every app shares, and the request log shows spend per key, so per-app cost reporting is a filter rather than a spreadsheet.

Do Dify knowledge bases work over a subscription-backed endpoint?

Retrieval-augmented apps work, but embedding models stay on a real OpenAI API key, because the subscription lane serves what Codex serves: chat models. Point Dify's embedding provider at your own key and the chat traffic at the flat lane.

What does a Dify app cost per month on the OpenAI API?

A two-step chatflow (classifier plus answer) at 800 messages a day runs about $211 a month at June 2026 GPT-5 list prices. Five apps like it approach $1,055 a month metered, versus about $229 flat on a Pro 5x plan plus the gateway fee.

More on Integrations
Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.