LlamaIndex Platform integration · LlamaIndex

Index ten thousand chunks. Pay one flat subscription.

LlamaIndex calls a model for chunk summaries, extraction, and answer synthesis. Point its OpenAI client at ProxyLLM and those calls run through Codex Hosted on your ChatGPT subscription, with your API key as fallback.

$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

01

Create a RAG key

Generate a scoped ProxyLLM key for each indexer, retriever service, or app. Give batch ingestion its own cap.

02

Configure OpenAI in LlamaIndex

Set the api_base to https://api.proxyllm.ai/v1 and use your ProxyLLM key. The OpenAI-compatible client path stays intact.

03

Ingest on the subscription

Chunk summaries and extraction are high-volume and OpenAI-bound, exactly what Codex Hosted is for. At a plan limit, requests fall back to your API key until it resets.

Configure the OpenAI LLM.

Point LlamaIndex's OpenAI client at ProxyLLM and keep the rest of the pipeline unchanged.

rag.py
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_key="pk_live_...",
    api_base="https://api.proxyllm.ai/v1",
)
Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.

$129/month · normal SaaS pricing

Separate ingestion spend from answer spend.

Give background indexing and user queries their own keys and caps. The request log shows what each side actually costs.