Model integration · Replicate

Replicate for model breadth.

Route chat-compatible Replicate text models through one endpoint on your own key. Specialized predictions, such as image and video, stay on Replicate's native API until direct adapter support lands.

$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

01

Use Replicate for model breadth

Replicate hosts community models and custom deployments. ProxyLLM covers the chat-compatible text side today; native Replicate API support is future work.

02

Unify text inference

Send chat-completion-compatible requests through https://api.proxyllm.ai/v1 on your own key, so usage tracking and budget caps live in ProxyLLM.

03

Keep media on the native API

Image, video, and other non-chat Replicate predictions stay on Replicate's own API until ProxyLLM adds a direct adapter.
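In practice, this split means your application decides per request where to send work. A minimal sketch, assuming a hypothetical `route()` helper and hypothetical task labels in your own code (neither is part of ProxyLLM or Replicate): chat-style text goes to the ProxyLLM gateway, everything else goes to Replicate's own HTTP API at https://api.replicate.com/v1.

```python
from dataclasses import dataclass

# Gateway URL from this page; Replicate's public HTTP API base URL.
PROXYLLM_BASE_URL = "https://api.proxyllm.ai/v1"
REPLICATE_BASE_URL = "https://api.replicate.com/v1"

# Hypothetical task labels, used only for this sketch.
CHAT_TASKS = {"chat", "text"}


@dataclass(frozen=True)
class Route:
    base_url: str
    via_gateway: bool


def route(task: str) -> Route:
    """Pick an endpoint: chat-style text through ProxyLLM, everything else native."""
    if task in CHAT_TASKS:
        return Route(PROXYLLM_BASE_URL, via_gateway=True)
    # Image, video, and other non-chat predictions stay on Replicate's own API.
    return Route(REPLICATE_BASE_URL, via_gateway=False)


if __name__ == "__main__":
    for task in ("chat", "image", "video"):
        r = route(task)
        print(f"{task}: {r.base_url} (gateway={r.via_gateway})")
```

Keeping the decision in one place means that, if ProxyLLM later adds a direct adapter, only the routing rule would need to change.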

Unify the text side first.

Chat-compatible Replicate models sit behind the same OpenAI-compatible gateway on your key.

client.py
from openai import OpenAI

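# Point the standard OpenAI SDK at the ProxyLLM gateway, using your ProxyLLM key.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",
)

# In this example, Replicate models are addressed as "replicate/<owner>/<model>".
r = client.chat.completions.create(
    model="replicate/meta/meta-llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Create a compact product FAQ."}],
)
print(r.choices[0].message.content)
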
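Because the gateway is OpenAI-compatible, the SDK call above should correspond to a plain HTTP request: a POST to `/chat/completions` under the base URL, with a bearer token and a JSON body. A minimal standard-library sketch, assuming ProxyLLM accepts the standard OpenAI Chat Completions request shape; `build_chat_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.proxyllm.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST.

    Assumes the gateway follows the standard OpenAI request shape:
    POST {base_url}/chat/completions with a Bearer token and a JSON body.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    api_key="pk_live_...",  # your ProxyLLM key (elided here)
    model="replicate/meta/meta-llama-3-70b-instruct",
    prompt="Create a compact product FAQ.",
)
# To send it: pass req to urllib.request.urlopen() and json.load() the response.
print(req.get_method(), req.full_url)
```

This form can help when debugging from environments without the OpenAI SDK, since the same base URL and key apply.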
Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.

$129/month · normal SaaS pricing

Track the text side.

Request logs, budget caps, and scoped sub-keys for Replicate text workloads, without treating every Replicate API surface as interchangeable.