Codex in CI/CD: Pipelines, Gates, and Non-Interactive Runs

How to run OpenAI's Codex CLI in CI/CD: codex exec for pre-merge checks, sandbox and approval presets, output piping, and the right auth for shared vs personal runners.

codex exec makes Codex a normal CI citizen: one shell command in, a completed result on stdout, an exit code out. OpenAI documents this non-interactive mode for scripts and CI jobs (developers.openai.com/codex/noninteractive), so the question is not whether Codex runs in a pipeline but how to wire it well. Two decisions carry most of the weight: auth, where shared runners get an API key and only your own machines get your ChatGPT plan, and presets, where sandbox and approval behavior are set explicitly because nothing in CI can answer a permission prompt.

The three job shapes

Almost every Codex pipeline job is one of these:

  • Pre-merge checks. Review the diff, hunt for bugs and missing tests, fail the build on a bad verdict.
  • Scheduled jobs. Nightly digests, dependency audits, stale-doc sweeps, failure triage.
  • Post-merge chores. Changelog updates, release notes drafts, doc regeneration after an API change.

All three reduce to codex exec "prompt" plus ordinary shell glue. If exec itself is new to you, the codex exec guide covers the command surface; this page is about the pipeline around it.

A pre-merge review gate

A complete GitHub Actions workflow that reviews every pull request and fails when Codex returns a failing verdict:

name: codex-review
on:
  pull_request:

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - run: npm i -g @openai/codex

      - name: Authenticate with the repo's API key
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: printf '%s' "$OPENAI_API_KEY" | codex login --with-api-key

      - name: Review the diff
        run: |
          codex exec --sandbox read-only \
            "Review the diff between origin/main and HEAD for bugs, missing tests, and risky changes. Cite file and line. End with exactly one line: VERDICT: PASS or VERDICT: FAIL." \
            | tee review.md

      - name: Gate on the verdict
        run: grep -qx "VERDICT: PASS" review.md

      - name: Publish the review
        if: always()
        run: cat review.md >> "$GITHUB_STEP_SUMMARY"

The load-bearing piece is the verdict contract. Codex exits zero whenever it completes, so the gate is your grep, not the agent’s mood. fetch-depth: 0 exists so origin/main is present to diff against, and tee keeps the full review for the step summary even when the gate fails.
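A stricter variant of the gate checks only the last non-blank line, so a review that quotes the marker somewhere in its prose cannot pass by accident. The following is a minimal sketch, assuming the review was saved to a file as in the workflow above; verdict_gate is a hypothetical helper name, not part of Codex.

```shell
#!/usr/bin/env bash
# verdict_gate FILE: succeed only if the last non-blank line of FILE is
# exactly "VERDICT: PASS". A missing file, a missing verdict, or any other
# final line fails closed.
# (verdict_gate is a hypothetical helper for this sketch, not a Codex command.)
verdict_gate() {
  local file="$1" last
  # Strip trailing whitespace (including carriage returns), drop blank
  # lines, and keep only the final remaining line.
  last="$(sed -e 's/[[:space:]]*$//' -e '/^$/d' "$file" | tail -n 1)"
  [ "$last" = "VERDICT: PASS" ]
}

# Usage in the gate step, after the review has been written to review.md:
#   verdict_gate review.md
```

Failing closed is the point of the design: if the model forgets the verdict line or formats it differently, the build fails and a human looks at the review.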

The same four steps (install, login, exec, grep) port to GitLab CI or Jenkins; only the pipeline syntax around them changes. If you would rather not maintain the install-and-auth boilerplate on GitHub, OpenAI ships an official action that wraps it; the Codex GitHub Action guide covers that route, including posting reviews as PR comments.

Approval and sandbox presets

Interactive Codex pauses to ask before risky commands. CI has nobody to ask, so state the policy up front:

# checks and reports: the repo stays read-only
codex exec --sandbox read-only "audit src/auth for unsafe redirect handling"

# jobs allowed to edit the workspace: lint fixes, doc regeneration
codex exec --full-auto "fix all eslint errors, then rerun npm run lint to confirm"

--sandbox read-only fits most CI work, because most CI work is judgment on a diff. --full-auto permits edits within the workspace and suits fix-up jobs whose output you review in a PR anyway. The third level, --sandbox danger-full-access, belongs only inside a disposable container where Docker is the isolation boundary; the Docker guide shows that layout.

For a runner image many jobs share, bake the defaults into config instead of repeating flags:

# ~/.codex/config.toml on the runner image
approval_policy = "never"
sandbox_mode = "read-only"

A useful rule: in CI, the sandbox should match the job’s write needs exactly, never one level looser.
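One way to keep that rule enforceable is to make each job declare its write needs and derive the flags from the declaration. This is only a sketch: the function name and the need labels (none, workspace, container) are hypothetical, while the flags are the three already described in this section.

```shell
#!/usr/bin/env bash
# codex_flags_for NEED: print the Codex flags that match a job's write needs.
# NEED labels are hypothetical names invented for this sketch:
#   none      -> the job only reads the repo
#   workspace -> the job may edit files in the workspace
#   container -> the job runs inside a disposable container
codex_flags_for() {
  case "$1" in
    none)      echo "--sandbox read-only" ;;
    workspace) echo "--full-auto" ;;
    container) echo "--sandbox danger-full-access" ;;
    *)         echo "unknown write need: $1" >&2; return 1 ;;
  esac
}

# Usage (the substitution is left unquoted on purpose so the flags split
# into separate words):
#   codex exec $(codex_flags_for none) "audit src/auth for unsafe redirect handling"
```

An unknown label is an error rather than a fallback, so a typo in a job definition cannot silently loosen the sandbox.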

Piping output where it needs to go

Everything lands on stdout, so the rest of your toolchain already knows what to do with it:

codex exec --sandbox read-only \
  "List exported functions in src/ with no test coverage. Output only a JSON array of {file, symbol}." \
  | jq -r '.[] | "- \(.file): \(.symbol)"' >> "$GITHUB_STEP_SUMMARY"

The phrase “output only a JSON array” is doing real work. If the model wraps the JSON in prose, jq fails loudly, and in CI a loud failure is the correct behavior. Treat output format as part of the prompt contract and validate it downstream.
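Downstream validation can go one step beyond "jq parses it" and check the shape as well. The sketch below assumes jq is installed on the runner and that the prompt asked for an array of objects with file and symbol keys, as above; render_untested is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# render_untested: read Codex output on stdin, verify that it is a JSON
# array of objects carrying "file" and "symbol" keys, then print one
# Markdown bullet per entry. Anything else returns non-zero.
# (render_untested is a hypothetical helper for this sketch.)
render_untested() {
  local json
  json="$(cat)"
  # jq -e exits non-zero when the filter yields false or null; jq also
  # fails on input that is not valid JSON, such as JSON wrapped in prose.
  printf '%s' "$json" \
    | jq -e 'type == "array" and all(.[]; type == "object" and has("file") and has("symbol"))' \
      >/dev/null || {
    echo "Codex output broke the JSON contract" >&2
    return 1
  }
  printf '%s' "$json" | jq -r '.[] | "- \(.file): \(.symbol)"'
}

# Usage:
#   codex exec --sandbox read-only "..." | render_untested >> "$GITHUB_STEP_SUMMARY"
```

Buffering the output before rendering means a malformed response writes nothing to the summary instead of a partial list.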

Scheduled jobs in CI

CI schedulers run Codex jobs with zero extra infrastructure:

on:
  schedule:
    - cron: "15 6 * * 1-5" # weekday mornings, UTC
  workflow_dispatch:

Good fits: a morning digest of yesterday’s commits, a weekly dependency audit, a docs freshness check. Know that GitHub’s cron is best effort: runs can start minutes late or be dropped under load. For schedules that should hold to the minute, with real failure alerting, run them on a machine you control; the guide to running Codex on a schedule covers cron, systemd timers, and the reliability trade-offs.

Auth: shared runners vs your own

This choice determines both your bill and your account hygiene.

Shared runners (GitHub-hosted, anything teammates can trigger) get an API key as a masked secret, exactly as in the workflow above. A key is revocable in one click, auditable per request, and tied to a project rather than a person. Billing is metered per token.

Runners only you use (your workstation, a personal VPS acting as a self-hosted runner) can hold your ChatGPT plan session via codex login --device-auth, which prices your bulk automation into the flat subscription you already pay for. The session file, ~/.codex/auth.json, is password-grade: it never belongs in shared secrets, images, or repos. OpenAI’s terms tie each account to one user, so the dividing line is simple: if other people can trigger the job, it is not your personal workload anymore.
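Because the session file is password-grade, it can be worth checking a directory tree for it before that tree becomes an image, an artifact, or a commit. A minimal sketch, assuming the file keeps its default name and sits under a .codex directory; find_session_files is a hypothetical helper, not a Codex command.

```shell
#!/usr/bin/env bash
# find_session_files DIR: fail if any .codex/auth.json exists under DIR
# (default: the current directory). Intended as a pre-build or pre-commit
# check on trees that other people or machines will receive.
# (find_session_files is a hypothetical helper for this sketch.)
find_session_files() {
  local dir="${1:-.}" hits
  hits="$(find "$dir" -type f -path '*/.codex/auth.json' 2>/dev/null)"
  if [ -n "$hits" ]; then
    echo "Codex session file found; remove it before sharing this tree:" >&2
    echo "$hits" >&2
    return 1
  fi
}

# Usage, for example before building a runner image from ./image:
#   find_session_files ./image
```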

Shared pipelines get a key; personal pipelines get your plan. The full decision, billing and limits included, is in Codex auth: API key vs ChatGPT sign-in.

What CI cannot fix

A pipeline can retry a flaky step. It cannot refill a usage window, queue calls across jobs that fire at once, or tell you afterward what each run would have cost. Plan-backed exec in CI inherits all three problems the moment your volume gets serious.

That operational layer is the actual product behind Codex Hosted: the same official CLI, signed in with your own account in an isolated container, exposed as an OpenAI-compatible endpoint with queueing, per-request logs, and automatic fallback when a window exhausts. The honest accounting of when DIY stops being worth it is in Codex Hosted vs running Codex yourself.

Frequently asked questions

Can you run Codex CLI in a CI pipeline?

Yes. codex exec is the CLI's documented non-interactive mode: it runs one prompt to completion, prints the result to stdout, and exits with a status code. That makes it usable in any CI system that can run a shell command, including GitHub Actions, GitLab CI, and Jenkins.

How should Codex authenticate on shared CI runners?

With an OpenAI API key stored as a masked secret. A key is revocable, auditable, and not tied to a person. ChatGPT plan sign-in belongs on machines only you use, because OpenAI's terms tie each account to one user.

How do I stop Codex from asking for approval in CI?

Set the behavior explicitly instead of relying on defaults. Pass --sandbox read-only for jobs that only inspect the repo, or --full-auto for jobs allowed to edit files in the workspace. With the policy preset, codex exec never blocks waiting for a human.

Can a Codex review fail my build?

Yes. End the prompt with a fixed marker such as VERDICT: PASS or VERDICT: FAIL, then grep the output in the next step. If the verdict is missing or failing, grep exits non-zero and the pipeline fails like any other check.
