Model integration · Cohere

Cohere for RAG, behind one gateway.

Run command-r-plus and the rest of the Command family for retrieval-heavy work without provider-specific glue code. Your provider key passes through, and ProxyLLM wraps the traffic with sub-keys, budget caps, and request logs.


$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

Use your OpenRouter key

Cohere's Command family is available through OpenRouter today. Add your own OpenRouter key once; if native Cohere key storage arrives later, your client code stays the same.

Keep one gateway

Send requests to https://api.proxyllm.ai/v1 with your ProxyLLM key. Your app never learns Cohere's native API shape.

Watch retrieval spend

RAG summarization burns tokens on context. Track volume per sub-key and set budget caps before a retrieval pipeline surprises you.

Cohere as a passthrough model.

Use cohere/ model names where your configured provider exposes them.

client.py

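from openai import OpenAI

# Point the standard OpenAI SDK at the ProxyLLM gateway.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",  # your ProxyLLM key, not a Cohere or OpenRouter key
)

# Cohere models are addressed with the cohere/ prefix.
r = client.chat.completions.create(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Answer from these retrieved passages."}],
)

print(r.choices[0].message.content)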
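The placeholder prompt above leaves open how retrieved passages reach the model. One way to do it is sketched here, assuming your own retriever returns passages as plain strings in ranked order. The helper names (`estimate_tokens`, `build_rag_messages`) and the rough four-characters-per-token estimate are illustrative assumptions, not part of ProxyLLM's or Cohere's API.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


def build_rag_messages(
    question: str,
    passages: list[str],
    max_context_tokens: int = 2000,  # hypothetical default; tune per pipeline
) -> list[dict]:
    """Pack ranked passages into one user message until the context budget is spent."""
    kept = []
    used = 0
    for passage in passages:
        cost = estimate_tokens(passage)
        if used + cost > max_context_tokens:
            break  # stop at the first passage that would exceed the budget
        kept.append(passage)
        used += cost

    # Number the passages so the model can refer to them.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(kept, start=1))
    return [
        {"role": "system", "content": "Answer only from the numbered passages."},
        {"role": "user", "content": f"Passages:\n{context}\n\nQuestion: {question}"},
    ]
```

The returned list can be passed as `messages=` to `client.chat.completions.create`. Capping context before the request is sent keeps the prompt size predictable, which is where most RAG token spend tends to accumulate.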

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.


$129/month · normal SaaS pricing

Control retrieval spend.

RAG gets expensive quickly. ProxyLLM gives each pipeline a scoped sub-key, a budget cap, and a request log, with no markup on inference.
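Caps and logs live on the ProxyLLM side, but a pipeline can also keep its own running tally for alerts or dashboards. The sketch below is a client-side mirror under stated assumptions: `SpendLedger`, the pipeline name, and the cap value are hypothetical, and the token counts are assumed to come from the `usage` field the OpenAI SDK returns on a chat completion.

```python
from collections import defaultdict


class SpendLedger:
    """Local tally of tokens per pipeline, compared against a soft cap."""

    def __init__(self, caps: dict[str, int]):
        self.caps = caps              # pipeline name -> token cap (hypothetical values)
        self.used = defaultdict(int)  # pipeline name -> tokens recorded so far

    def record(self, pipeline: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one request's token usage to the pipeline's running total."""
        self.used[pipeline] += prompt_tokens + completion_tokens

    def remaining(self, pipeline: str) -> int:
        """Tokens left before the cap; negative once the cap is exceeded."""
        return self.caps[pipeline] - self.used[pipeline]

    def over_cap(self, pipeline: str) -> bool:
        return self.remaining(pipeline) < 0


ledger = SpendLedger({"support-rag": 1000})
ledger.record("support-rag", prompt_tokens=700, completion_tokens=200)
```

After a real request, and when the response includes usage data, the same call would read `ledger.record("support-rag", r.usage.prompt_tokens, r.usage.completion_tokens)`.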
