Model integration · Fireworks AI

Fireworks for hosted open models.

Run fireworks/llama-v3p1-70b-instruct and other Fireworks-hosted models through one endpoint on your own key, with per-app sub-keys, budget caps, and request logs around the traffic.

$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

01

Use Fireworks for hosted open models

Fireworks serves open models and fine-tuned deployments at production speed. ProxyLLM passes that traffic through on your own key; native Fireworks key storage can follow later.

02

Keep one API shape

Use https://api.proxyllm.ai/v1 from your OpenAI-compatible SDK so application code never cares which host serves the model.

03

Give each service a sub-key

Scope sub-keys per app with their own budget caps, then read the request logs to see which workload is eating the Llama budget.
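One way to keep sub-keys separate per service is to read each one from its own environment variable. The sketch below is illustrative only: the PROXYLLM_KEY_<SERVICE> naming scheme and the client_settings helper are assumptions, not part of ProxyLLM.

```python
import os
from typing import Mapping

# Gateway URL from this page; everything else here is a hypothetical convention.
PROXY_BASE_URL = "https://api.proxyllm.ai/v1"


def client_settings(service: str, env: Mapping[str, str] = os.environ) -> dict:
    """Return OpenAI-client keyword arguments for one service's sub-key.

    Assumes each service stores its sub-key in PROXYLLM_KEY_<SERVICE>,
    e.g. PROXYLLM_KEY_SUPPORT_BOT for the "support-bot" service.
    """
    var = "PROXYLLM_KEY_" + service.upper().replace("-", "_")
    key = env.get(var)
    if not key:
        raise KeyError(f"missing sub-key: set {var}")
    return {"base_url": PROXY_BASE_URL, "api_key": key}
```

With the OpenAI SDK installed, a service would then build its client as OpenAI(**client_settings("support-bot")), so no two services share a key by accident.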

Hosted open models, one client.

Call a Fireworks-backed model through the OpenAI-compatible gateway on your own key.

client.py
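from openai import OpenAI

# Point the standard OpenAI SDK at the ProxyLLM gateway.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",  # placeholder key
)

# The model name selects the Fireworks-hosted model.
r = client.chat.completions.create(
    model="fireworks/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "Generate three support replies."}],
)

# Print the text of the first completion.
print(r.choices[0].message.content)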
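For readers not using the SDK, the same call can be sketched at the HTTP level. This assumes the gateway follows the standard OpenAI-compatible route (/chat/completions under the base URL) with a Bearer token, which is what the SDK sends; build_chat_request is a hypothetical helper, and it only constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.proxyllm.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. Switching models means changing only the model string, which is the point of keeping one API shape.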
Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.

$129/month · normal SaaS pricing

See what Fireworks costs you.

Per-request logs show latency, tokens, and spend for every Fireworks call. $129/month flat, no markup on inference.
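As a rough illustration of what per-request logs make possible, the sketch below totals spend per sub-key from exported log records. The record fields (sub_key, tokens, spend_usd) are hypothetical; ProxyLLM's actual log schema is not described on this page.

```python
from collections import defaultdict


def spend_by_sub_key(records):
    """Total requests, tokens, and spend per sub-key, highest spend first.

    Each record is assumed to be a dict with the hypothetical fields
    "sub_key", "tokens", and "spend_usd".
    """
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "spend_usd": 0.0})
    for rec in records:
        entry = totals[rec["sub_key"]]
        entry["requests"] += 1
        entry["tokens"] += rec["tokens"]
        entry["spend_usd"] += rec["spend_usd"]
    # Sort so the most expensive workload comes first.
    return sorted(totals.items(), key=lambda kv: kv[1]["spend_usd"], reverse=True)
```

Sorting by spend puts the workload that is using most of the budget at the top of the list.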