Fireworks for hosted open models.
Run fireworks/llama-v3p1-70b-instruct and other Fireworks-hosted models through one endpoint on your own key, with sub-keys, caps, and logs around the traffic.
$129/month SaaS. Bring your own model keys. No inference markup.
Three steps to connect.
Use Fireworks for hosted open models
Fireworks serves open models and fine-tuned deployments at production speed. ProxyLLM passes that traffic through on your own key; native Fireworks key storage can follow later.
Keep one API shape
Point your OpenAI-compatible SDK at https://api.proxyllm.ai/v1 so application code never has to care which host serves the model.
Give each service a sub-key
Scope sub-keys per app with their own budget caps, then read the request logs to see which workload is eating the Llama budget.
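A minimal sketch of giving each service its own sub-key. It assumes each key lives in an environment variable named `PROXYLLM_KEY_<SERVICE>`, which is a hypothetical convention chosen for illustration and not something ProxyLLM defines. Every service uses the same base URL and only the key differs, so each budget cap and log entry stays tied to one workload.

```python
import os

# Same gateway for every service; only the sub-key changes.
BASE_URL = "https://api.proxyllm.ai/v1"


def client_kwargs(service: str) -> dict:
    """Return OpenAI(...) constructor arguments for one service.

    Assumes a hypothetical env var per service, e.g. PROXYLLM_KEY_SUPPORT_BOT
    for the service named "support-bot".
    """
    var = "PROXYLLM_KEY_" + service.upper().replace("-", "_")
    key = os.environ.get(var)
    if not key:
        # Fail loudly rather than fall back to a shared key, which would
        # blur per-service caps and make the logs harder to attribute.
        raise KeyError(f"no sub-key configured in {var}")
    return {"base_url": BASE_URL, "api_key": key}
```

Usage would look like `client = OpenAI(**client_kwargs("support-bot"))` after `from openai import OpenAI`. Refusing to fall back to a shared key is a deliberate choice: a missing sub-key should surface at startup, not as unattributed spend.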
Hosted open models, one client.
Call a Fireworks-backed model through the OpenAI-compatible gateway on your own key.
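from openai import OpenAI

# Point the standard OpenAI SDK at the ProxyLLM gateway.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",  # your ProxyLLM sub-key
)

# A Fireworks-hosted model, addressed by its gateway model ID.
r = client.chat.completions.create(
    model="fireworks/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "Generate three support replies."}],
)
print(r.choices[0].message.content)
Run your AI workloads on your ChatGPT subscription.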
ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.
See what Fireworks costs you.
Per-request logs show latency, tokens, and spend for every Fireworks call. $129/month flat, no markup on inference.
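Those logs also answer the sub-key question: which workload is spending the most. Below is a minimal sketch of that roll-up, assuming you have already loaded log rows into Python objects. The `LogRecord` fields are hypothetical and are not ProxyLLM's documented export format.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical shape for one request-log row; real export fields may differ.
    sub_key: str
    model: str
    latency_ms: float
    tokens: int
    spend_usd: float


def summarize(records):
    """Per-sub-key requests, tokens, spend, and mean latency, biggest spender first."""
    totals = {}
    for rec in records:
        t = totals.setdefault(
            rec.sub_key,
            {"requests": 0, "tokens": 0, "spend_usd": 0.0, "latency_sum_ms": 0.0},
        )
        t["requests"] += 1
        t["tokens"] += rec.tokens
        t["spend_usd"] += rec.spend_usd
        t["latency_sum_ms"] += rec.latency_ms

    summary = [
        (key, {
            "requests": t["requests"],
            "tokens": t["tokens"],
            "spend_usd": t["spend_usd"],
            "mean_latency_ms": t["latency_sum_ms"] / t["requests"],
        })
        for key, t in totals.items()
    ]
    # Sort so the workload eating the most budget comes first.
    summary.sort(key=lambda item: item[1]["spend_usd"], reverse=True)
    return summary
```

Sorting by spend rather than token count matters if different sub-keys call models with different per-token prices.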