Groq for high-throughput jobs.
Put groq/llama-3.1-70b-versatile behind the same ProxyLLM endpoint as your premium models. It is a good fit for many small, latency-sensitive requests, and every request gets its own log line.
$129/month SaaS. Bring your own model keys. No inference markup.
Three steps to connect.
Pass Groq-hosted models through
Groq serves Llama-family models at very high throughput. Today, route to them through OpenRouter with your own key; storing a native Groq key for direct-provider access is a future option.
Use the same client
Set your OpenAI-compatible base URL to https://api.proxyllm.ai/v1 and keep your chat completions code unchanged.
Cap the high-volume jobs
Classification, scoring, and extraction pile up requests fast. Give those services scoped sub-keys with budget caps so volume never outruns the bill you expected.
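To make the budget cap idea concrete, here is a minimal local sketch of how a capped sub-key behaves. It is an illustration only: ProxyLLM enforces caps on its side, and the `BudgetCap` class, its methods, and the cents-based accounting are hypothetical names chosen for this example, not part of any ProxyLLM API.

```typescript
// Hypothetical model of a per-key budget cap (not ProxyLLM's API).
// Amounts are integer cents to avoid floating-point drift.
class BudgetCap {
  readonly keyLabel: string;
  readonly capCents: number;
  private spentCents = 0;

  constructor(keyLabel: string, capCents: number) {
    this.keyLabel = keyLabel;
    this.capCents = capCents;
  }

  // Records the cost and returns true if it fits under the cap;
  // returns false and records nothing if it would exceed the cap.
  tryCharge(costCents: number): boolean {
    if (costCents < 0) {
      throw new RangeError("cost must be non-negative");
    }
    if (this.spentCents + costCents > this.capCents) {
      return false;
    }
    this.spentCents += costCents;
    return true;
  }

  // Budget still available under the cap, in cents.
  get remainingCents(): number {
    return this.capCents - this.spentCents;
  }
}
```

A classification service with a 500-cent cap would see `tryCharge` start returning false once the next request no longer fits, which is the point at which a capped sub-key stops the job instead of letting spend grow.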
Fast lane, same gateway.
Call a Groq-backed model and keep the OpenAI chat completions shape.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.proxyllm.ai/v1",
  apiKey: "pk_live_...",
});

const r = await client.chat.completions.create({
  model: "groq/llama-3.1-70b-versatile",
  messages: [{ role: "user", content: "Classify this support ticket." }],
});
Run your AI workloads on your ChatGPT subscription.
ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.
High volume, visible cost.
Measure Groq throughput, spend, and failures beside your other providers in one request log, with no markup on inference.
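As a rough sketch of what comparing providers in one log can look like, the function below groups request records by provider and totals requests, failures, and spend. The record shape (`RequestLogEntry`) and its field names are assumptions made for this example, not ProxyLLM's log format; the provider is taken from the prefix before the slash in the model name, as in groq/llama-3.1-70b-versatile.

```typescript
// Hypothetical log record; field names are assumptions for illustration.
interface RequestLogEntry {
  model: string; // e.g. "groq/llama-3.1-70b-versatile"
  costCents: number;
  ok: boolean;
  latencyMs: number;
}

interface ProviderSummary {
  requests: number;
  failures: number;
  costCents: number;
  meanLatencyMs: number;
}

// Groups entries by the provider prefix of the model name and
// accumulates counts, spend, and a running mean of latency.
function summarizeByProvider(
  entries: RequestLogEntry[],
): Map<string, ProviderSummary> {
  const summaries = new Map<string, ProviderSummary>();
  for (const entry of entries) {
    const slash = entry.model.indexOf("/");
    const provider = slash === -1 ? entry.model : entry.model.slice(0, slash);
    const summary: ProviderSummary = summaries.get(provider) ?? {
      requests: 0,
      failures: 0,
      costCents: 0,
      meanLatencyMs: 0,
    };
    summary.requests += 1;
    if (!entry.ok) {
      summary.failures += 1;
    }
    summary.costCents += entry.costCents;
    // Incremental mean: shift the old mean toward the new sample.
    summary.meanLatencyMs +=
      (entry.latencyMs - summary.meanLatencyMs) / summary.requests;
    summaries.set(provider, summary);
  }
  return summaries;
}
```

Calling `summarizeByProvider(entries).get("groq")` would then give the Groq totals to set beside any other provider in the same log.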