Model integration · Groq

Groq for high-throughput jobs.

Put groq/llama-3.1-70b-versatile behind the same ProxyLLM endpoint as your premium models. It is a good fit for lots of small, latency-sensitive requests, and every request gets its own log line.

$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

01

Pass Groq-hosted models through

Groq serves Llama-family models at very high throughput. Today, Groq-hosted models are reached through OpenRouter-backed access with your own key; storing a native Groq key for direct-provider access is a future option.

02

Use the same client

Set your OpenAI-compatible base URL to https://api.proxyllm.ai/v1 and keep chat completions code unchanged.

03

Cap the high-volume jobs

Classification, scoring, and extraction pile up requests fast. Give those services scoped sub-keys with budget caps so volume never outruns the bill you expected.
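
One way to keep each service on its own sub-key is to resolve the key per service at startup. The sketch below assumes the sub-keys are already created in ProxyLLM and exposed to each service as environment variables; the variable names and the clientOptionsFor helper are hypothetical, and only the base URL comes from this page.

```typescript
// Services that generate high request volume (taken from the step above).
type Service = "classification" | "scoring" | "extraction";

// Hypothetical environment variable names; use whatever your deployment defines.
const KEY_ENV_VAR: Record<Service, string> = {
  classification: "PROXYLLM_KEY_CLASSIFICATION",
  scoring: "PROXYLLM_KEY_SCORING",
  extraction: "PROXYLLM_KEY_EXTRACTION",
};

interface ClientOptions {
  baseURL: string;
  apiKey: string;
}

// Build OpenAI-client options for one service. Fails loudly when the key is
// missing, so a service never silently falls back to a shared key.
function clientOptionsFor(
  service: Service,
  env: Record<string, string | undefined>,
): ClientOptions {
  const name = KEY_ENV_VAR[service];
  const apiKey = env[name];
  if (!apiKey) {
    throw new Error(`Missing sub-key for ${service}: set ${name}`);
  }
  return { baseURL: "https://api.proxyllm.ai/v1", apiKey };
}
```

A classification worker could then construct its client with new OpenAI(clientOptionsFor("classification", process.env)), so its spend counts against that sub-key's cap only.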

Fast lane, same gateway.

Call a Groq-backed model and keep the OpenAI chat completions shape.

client.ts
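import OpenAI from "openai";

// Point the standard OpenAI client at the ProxyLLM gateway.
const client = new OpenAI({
  baseURL: "https://api.proxyllm.ai/v1",
  apiKey: "pk_live_...", // your ProxyLLM key
});

// Same chat completions call; only the model name selects Groq.
const r = await client.chat.completions.create({
  model: "groq/llama-3.1-70b-versatile",
  messages: [{ role: "user", content: "Classify this support ticket." }],
});

console.log(r.choices[0]?.message.content);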
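
For batches of many small requests, a bounded number of calls in flight is usually easier on rate limits than firing everything at once. The following is a minimal sketch of that pattern, not part of ProxyLLM or the OpenAI SDK; mapWithConcurrency and its parameters are hypothetical names.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls
// in flight, returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane claims the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const laneCount = Math.min(limit, items.length);
  const lanes: Promise<void>[] = [];
  for (let k = 0; k < laneCount; k++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

With a client configured for https://api.proxyllm.ai/v1, the worker would wrap client.chat.completions.create for one ticket, and a limit of 8 or 16 is a reasonable starting point to tune from. If any worker call rejects, the whole batch rejects; add per-item error handling inside the worker if partial results matter.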

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.

$129/month · normal SaaS pricing

High volume, visible cost.

Measure Groq throughput, spend, and failures beside your other providers in one request log, with no markup on inference.
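
If you also want a quick client-side tally to compare against the request log, something like the sketch below could work. It is an illustration under assumptions: CallRecord and summarizeByModel are hypothetical names, and the token count is assumed to come from the usage.total_tokens field of an OpenAI-shaped chat completions response. It does not read or reproduce ProxyLLM's own log.

```typescript
// One record per request, captured by your own calling code.
interface CallRecord {
  model: string;
  latencyMs: number;
  ok: boolean;
  totalTokens: number; // usage.total_tokens from the response; 0 when the call failed
}

interface ModelSummary {
  requests: number;
  failures: number;
  totalTokens: number;
  meanLatencyMs: number;
}

// Group records by model and compute request count, failures, tokens,
// and mean latency for each.
function summarizeByModel(records: readonly CallRecord[]): Map<string, ModelSummary> {
  const sums = new Map<
    string,
    { requests: number; failures: number; totalTokens: number; latencySum: number }
  >();
  for (const r of records) {
    const s = sums.get(r.model) ?? { requests: 0, failures: 0, totalTokens: 0, latencySum: 0 };
    s.requests += 1;
    if (!r.ok) s.failures += 1;
    s.totalTokens += r.totalTokens;
    s.latencySum += r.latencyMs;
    sums.set(r.model, s);
  }
  const out = new Map<string, ModelSummary>();
  sums.forEach((s, model) => {
    out.set(model, {
      requests: s.requests,
      failures: s.failures,
      totalTokens: s.totalTokens,
      meanLatencyMs: s.latencySum / s.requests,
    });
  });
  return out;
}
```

Feeding it records for groq/llama-3.1-70b-versatile alongside records for your premium models gives a per-model view of volume, failures, and latency that you can compare with what the gateway reports.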