June 12, 2026

The OpenAI-Compatible API, by Hand: curl Examples

Raw curl requests against an OpenAI-compatible endpoint: headers, request body, response shape, and error envelope, with flat-rate billing behind the URL.

An OpenAI-compatible API accepts OpenAI’s request and response shapes at a different hostname. That is the entire contract, and curl is the cleanest way to see it: one POST to /v1/chat/completions with a bearer token and a JSON body. Against https://api.proxyllm.ai/v1, the same request bills to a flat ChatGPT subscription through Codex Hosted instead of per-token API pricing. If a tool can send an HTTPS POST, it can run on flat-rate OpenAI capacity.

The minimal request

curl https://api.proxyllm.ai/v1/chat/completions \
  -H "Authorization: Bearer pk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-mini",
    "messages": [
      { "role": "user", "content": "Extract the invoice total from: Total due EUR 1,240.50" }
    ]
  }'

Two headers, one body. Authorization carries a ProxyLLM key (created in the dashboard, ideally scoped to the script or system making the call). Content-Type must be application/json. The body needs model and messages; everything else is optional.

Apart from the hostname and the key, this is byte-for-byte the request you would send to api.openai.com. Compatibility means the only things that change are the host, the key, and the bill.

The response, and how to read it

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1781222400,
  "model": "gpt-5-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The invoice total is EUR 1,240.50."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 31,
    "completion_tokens": 12,
    "total_tokens": 43
  }
}

The content lives at choices[0].message.content, so the shell version of every integration is:

curl -s https://api.proxyllm.ai/v1/chat/completions \
  -H "Authorization: Bearer $PROXYLLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-mini", "messages": [{"role": "user", "content": "Say ok."}]}' \
  | jq -r '.choices[0].message.content'

That one-liner is a health check, a cron building block, and a debugging tool in equal measure.
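For cron or monitoring use, the one-liner above is easier to trust when it exits non-zero on failure. The following is a minimal sketch, not a prescribed pattern: the function names extract_content and health_check are hypothetical, and the 30-second timeout is an assumption to tune.

```shell
# Hypothetical helper: read a chat completion body on stdin and print the
# assistant text. Returns 1 when the body is not valid JSON or when
# choices[0].message.content is missing or empty (e.g. an error envelope).
extract_content() {
  local text
  text=$(jq -r '.choices[0].message.content // empty') || return 1
  [ -n "$text" ] || return 1
  printf '%s\n' "$text"
}

# Hypothetical health check: one tiny request, non-zero exit on any failure.
# --fail makes curl exit non-zero on HTTP 4xx/5xx; --max-time bounds the wait
# (30 seconds is an assumed value, not a documented limit).
health_check() {
  curl -s --fail --max-time 30 https://api.proxyllm.ai/v1/chat/completions \
    -H "Authorization: Bearer $PROXYLLM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-5-mini", "messages": [{"role": "user", "content": "Say ok."}]}' \
    | extract_content
}

# Usage, e.g. from cron:
#   health_check >/dev/null || echo "endpoint check failed" >&2
```

If curl fails, the pipe hands extract_content an empty body, so the function's exit status still reflects the failure.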

A fuller request: system prompt and parameters

Production calls usually pin behavior with a system message and a temperature:

curl -s https://api.proxyllm.ai/v1/chat/completions \
  -H "Authorization: Bearer $PROXYLLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "temperature": 0.2,
    "messages": [
      { "role": "system", "content": "You are a release-notes writer. Be terse." },
      { "role": "user", "content": "Summarize: fixed login redirect, added CSV export, bumped Node to 22." }
    ]
  }'

Standard OpenAI parameters ride along unchanged. The practical boundary is the model surface: you get the models Codex serves on the flat lane, while embeddings, fine-tunes, and unusual parameters belong on your own API key. The full capability map is in what works with Codex Hosted.
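Hand-written JSON inside single quotes breaks as soon as the prompt contains a quote or a newline. One way around that, sketched here under the assumption that jq is installed, is to let jq assemble the body from shell variables. The function name build_body and its argument order are hypothetical.

```shell
# Hypothetical helper: build a chat completions body from shell arguments.
# jq does the JSON escaping, so quotes and newlines in the prompts are safe.
#   $1 = model, $2 = system prompt, $3 = user prompt, $4 = temperature
build_body() {
  jq -n \
    --arg model "$1" \
    --arg system "$2" \
    --arg user "$3" \
    --argjson temperature "$4" \
    '{
      model: $model,
      temperature: $temperature,
      messages: [
        { role: "system", content: $system },
        { role: "user", content: $user }
      ]
    }'
}

# Usage: pipe the body into curl, which reads it from stdin via @-.
#   build_body "gpt-5" "You are a release-notes writer. Be terse." "$notes" 0.2 \
#     | curl -s https://api.proxyllm.ai/v1/chat/completions \
#         -H "Authorization: Bearer $PROXYLLM_API_KEY" \
#         -H "Content-Type: application/json" \
#         --data-binary @-
```

Passing the temperature with --argjson keeps it a JSON number rather than a string; $notes stands in for whatever text the script collects.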

Errors keep the OpenAI envelope

Failures come back as the familiar error object, so anything that already parses OpenAI errors needs no new code. A bad key, for example, returns HTTP 401 with a body in this shape:

{
  "error": {
    "message": "Invalid API key provided.",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

The cases worth handling in scripts: 401 (wrong or revoked key), other 4xx validation errors (malformed body, unknown model), and 429-class responses when a key’s budget cap or rate limit is hit. Caps are a feature, not an accident: a scoped key with a monthly budget turns a misbehaving script into a bounded incident instead of an open-ended invoice.
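In a script, the status codes listed above can be separated from the body with curl's -o and -w options. This is a sketch under assumptions: the function names, the action labels, and the choice to treat 5xx and connection failures as retryable are illustrative, not ProxyLLM guidance.

```shell
# Hypothetical helper: map an HTTP status code to a script-level action.
# 401 and 429 are matched before the generic 4?? pattern on purpose.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    401) echo "bad_key" ;;      # wrong or revoked key: do not retry
    429) echo "limited" ;;      # budget cap or rate limit hit
    4??) echo "bad_request" ;;  # malformed body, unknown model
    *)   echo "retry" ;;        # assumption: 5xx or 000 (no connection)
  esac
}

# Hypothetical wrapper: $1 = JSON body, $2 = file to receive the response.
# Prints only the HTTP status code on stdout.
call_api() {
  curl -s -o "$2" -w '%{http_code}' https://api.proxyllm.ai/v1/chat/completions \
    -H "Authorization: Bearer $PROXYLLM_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Usage:
#   status=$(call_api "$body" response.json)
#   case "$(classify_status "$status")" in
#     ok)                  jq -r '.choices[0].message.content' response.json ;;
#     bad_key|bad_request) jq -r '.error.message' response.json >&2; exit 1 ;;
#     limited|retry)       echo "try again later (HTTP $status)" >&2; exit 2 ;;
#   esac
```

Because the error envelope keeps the OpenAI shape, .error.message is the only field the failure branch needs to read.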

What sits behind the URL

The endpoint looks like one API; behind it sit lanes. OpenAI-model requests run through Codex Hosted on your connected ChatGPT account, which is where the flat economics come from. When a plan window fills, requests fall back to a second connected account, then to your own OpenAI API key, until the window resets. Responses keep the same shape throughout; the request log shows which lane served each call and what it would have cost at API rates.

One behavior to know before wiring up a UI: the Codex lane returns complete responses, with no server-sent-events stream. curl users rarely care: curl buffers its output unless you pass --no-buffer (-N), and a script that pipes the response into jq needs the full body before it can parse anything. If you need token-by-token streaming, that is the API-key lane’s job.

Once the curl request works, the SDK versions are the same request with nicer ergonomics: the OpenAI Python client takes a base_url argument, so pointing it at https://api.proxyllm.ai/v1 is a one-line change. The condensed HTTP reference lives on the REST API integration page.

A working curl command is the whole proof of concept. If the volume behind it is real, the front page explains what the flat lane costs and what it absorbs.

Frequently asked questions

How do I call an OpenAI-compatible API with curl?

Send a POST to the host's /v1/chat/completions route with an Authorization: Bearer header and a JSON body containing model and messages. The request is identical to one against api.openai.com; only the hostname and the key change.

What headers does a chat completions request need?

Two: Authorization: Bearer with your API key, and Content-Type: application/json. Everything else (model, messages, temperature, max output size) travels in the JSON body.

What does the chat completions response look like?

A JSON object with an id, the model, a choices array, and a usage block with token counts. The text you want is at choices[0].message.content, which is why the jq one-liner for it shows up in every shell script.

How can I tell which lane served my request?

The response body keeps the standard OpenAI shape regardless of lane. The ProxyLLM request log records each call with the lane that served it (Codex Hosted, second account, or your own API key) and its API-equivalent value, so billing questions get answered in the dashboard rather than by inspecting responses.