Cohere for RAG, behind one gateway.
Run command-r-plus and the rest of the Command family for retrieval-heavy work without provider-specific glue code. Your key passes through; sub-keys, caps, and logs wrap the traffic.
$129/month SaaS. Bring your own model keys. No inference markup.
Three steps to connect.
Use your OpenRouter key
Cohere's Command family ships through OpenRouter today. Add your own key once; native Cohere key storage can come later without touching client code.
Keep one gateway
Send requests to https://api.proxyllm.ai/v1 with your ProxyLLM key. Your app never learns Cohere's native API shape.
Watch retrieval spend
RAG summarization burns tokens on context. Track volume per sub-key and set budget caps before a retrieval pipeline surprises you.
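The caps themselves live at the gateway, but a client-side tally can act as an early warning. Below is a minimal sketch, assuming each response carries the standard OpenAI-style usage counts (prompt and completion tokens). TokenLedger, the sub-key label, and the cap value are hypothetical names for illustration, not part of ProxyLLM.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenLedger:
    """Local tally of tokens per sub-key label (illustrative; not a ProxyLLM API)."""

    cap_tokens: int  # hypothetical per-label token budget
    used: Dict[str, int] = field(default_factory=dict)

    def record(self, label: str, prompt_tokens: int, completion_tokens: int) -> int:
        """Add one response's usage to the label's total and return the new total."""
        total = self.used.get(label, 0) + prompt_tokens + completion_tokens
        self.used[label] = total
        return total

    def remaining(self, label: str) -> int:
        """Tokens left before the local cap, floored at zero."""
        return max(self.cap_tokens - self.used.get(label, 0), 0)

    def over_cap(self, label: str) -> bool:
        """True once the label has used its whole local budget."""
        return self.used.get(label, 0) >= self.cap_tokens


# Example: two RAG calls with large retrieved contexts under one label.
ledger = TokenLedger(cap_tokens=10_000)
ledger.record("rag-summarizer", prompt_tokens=4_200, completion_tokens=300)
ledger.record("rag-summarizer", prompt_tokens=3_100, completion_tokens=400)
```

In a real pipeline the two integers would likely come from r.usage.prompt_tokens and r.usage.completion_tokens on each response, when the gateway populates them.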
Cohere as a passthrough model.
Use cohere/ model names where your configured provider exposes them.
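Which cohere/ names are available depends on the configured provider, so it can help to check before hard-coding one. Below is a minimal sketch, assuming the gateway serves the OpenAI-style model listing endpoint (not confirmed here). cohere_models and list_cohere_models are hypothetical helper names, and the sample IDs are placeholders.

```python
from typing import Iterable, List


def cohere_models(model_ids: Iterable[str]) -> List[str]:
    """Return the IDs that use the cohere/ prefix, sorted for stable output."""
    return sorted(m for m in model_ids if m.startswith("cohere/"))


def list_cohere_models(client) -> List[str]:
    """Query the gateway for models; assumes it implements GET /v1/models."""
    return cohere_models(model.id for model in client.models.list())


# Offline example with sample IDs standing in for a real listing.
sample_ids = ["openai/gpt-4o", "cohere/command-r-plus", "cohere/command-r"]
available = cohere_models(sample_ids)
```

With a name confirmed, the request itself looks like this: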
from openai import OpenAI

client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",
)
r = client.chat.completions.create(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Answer from these retrieved passages."}],
)

Run your AI workloads on your ChatGPT subscription.
ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.
Control retrieval spend.
RAG gets expensive quickly. ProxyLLM gives each pipeline a scoped sub-key, a budget cap, and a request log, with no markup on inference.
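One way to keep each pipeline on its own sub-key is to resolve the key from configuration when the client is built. Below is a minimal sketch, assuming each pipeline's sub-key is stored in an environment variable. The PROXYLLM_KEY_ naming scheme, the gateway_config helper, and the placeholder key are hypothetical; only the base URL comes from the page.

```python
from typing import Dict, Mapping

GATEWAY_URL = "https://api.proxyllm.ai/v1"


def gateway_config(pipeline: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Build OpenAI-client keyword arguments for one pipeline's sub-key."""
    # Hypothetical convention: "doc-search" -> PROXYLLM_KEY_DOC_SEARCH
    var = "PROXYLLM_KEY_" + pipeline.upper().replace("-", "_")
    if var not in env:
        raise KeyError(f"no sub-key configured for pipeline {pipeline!r} ({var})")
    return {"base_url": GATEWAY_URL, "api_key": env[var]}


# Example with a fake environment; real code would pass os.environ.
fake_env = {"PROXYLLM_KEY_DOC_SEARCH": "pk_test_placeholder"}
config = gateway_config("doc-search", fake_env)
```

The result can be passed straight to the SDK, for example OpenAI(**gateway_config("doc-search", os.environ)), so a missing sub-key fails at startup instead of on the first request.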