Real Codex E2E harness

Run the deterministic end-to-end harness when you need to verify the full Maestro-to-Codex handoff with the real CLI.

What the harness does

The real Codex harness exercises the full Maestro loop:

  1. build maestro
  2. create a temporary repo root and SQLite database
  3. write a dedicated WORKFLOW.md
  4. create two simple issues and move them to ready
  5. start maestro run
  6. wait for Codex to complete both issues
  7. verify the expected output artifacts

Entry points

make e2e-real-codex
make e2e-real-codex-phases

Those targets run, respectively:

  • scripts/e2e_real_codex.sh for the basic single-pass artifact flow
  • scripts/e2e_real_codex_phases.sh for the implementation/review/done phase flow

What it verifies

The generated workflow asks Codex to:

  • read the issue description
  • create the requested artifact in a shared output directory
  • confirm the file contents from the shell
  • mark the issue done

The test issues are intentionally deterministic:

  • artifact-one.txt must contain maestro e2e ok 1
  • artifact-two.txt must contain maestro e2e ok 2
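
The artifact check can be pictured as a small shell helper. This is a minimal sketch, not the code in scripts/e2e_real_codex.sh: the function name check_artifact and the idea of passing the path and expected text as arguments are assumptions made for illustration.

```shell
# Hypothetical helper (not taken from the harness scripts).
# Succeeds only if the file exists and contains the expected text.
check_artifact() {
  local path="$1" expected="$2"
  if [ ! -f "$path" ]; then
    echo "missing artifact: $path" >&2
    return 1
  fi
  # -F treats the expected text as a fixed string, not a regex.
  if ! grep -Fq -- "$expected" "$path"; then
    echo "unexpected contents in $path" >&2
    return 1
  fi
}
```

A caller would run it once per issue, for example check_artifact "$out_dir/artifact-one.txt" "maestro e2e ok 1", where out_dir stands in for wherever the shared output directory lives.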

The phase harness verifies additional deterministic paths:

  • one issue must go implementation -> review -> done -> complete
  • one issue must go implementation -> done -> complete without review
  • each phase writes a dedicated artifact and appends to a phase log in the expected order
  • restarting maestro run cleans up completed workspaces on startup
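
One way to check the phase ordering is to compare the phase log against the expected sequence. The sketch below assumes the log holds one phase name per line; that format, and the name check_phase_order, are illustrative assumptions rather than details of scripts/e2e_real_codex_phases.sh.

```shell
# Hypothetical helper: assumes the phase log has one phase name per line.
# Usage: check_phase_order <log-file> <phase>...
# Succeeds only if the log lists exactly those phases in that order.
check_phase_order() {
  local log="$1"
  shift
  local expected
  expected="$(printf '%s\n' "$@")"
  [ "$(cat "$log")" = "$expected" ]
}
```

Under that assumption, the reviewed issue would pass check_phase_order "$log" implementation review done, and the unreviewed one would pass check_phase_order "$log" implementation done.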

Why it uses codex exec

The harness uses the real Codex CLI in stdio mode via codex exec so the flow stays end-to-end while remaining easy to launch from a shell script:

  • Maestro still creates issues, manages workspaces, and drives scheduling
  • Codex still performs the file and shell actions
  • the verification stays deterministic and local

Requirements

  • go
  • codex
  • an active Codex login or session
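
A quick preflight for the two command-line requirements might look like the sketch below. The helper name missing_commands is hypothetical, and the sketch does not attempt to check the Codex login, since this page does not describe how to query session state.

```shell
# Hypothetical preflight helper (not part of the harness).
# Prints each command that is not on PATH; fails if any are missing.
missing_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

Running missing_commands go codex before the make targets prints nothing and succeeds when both tools are installed.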

Environment overrides

  • E2E_TIMEOUT_SEC: maximum wait time per issue, in seconds, default 600
  • E2E_POLL_SEC: poll interval while waiting, in seconds, default 2
  • E2E_KEEP_HARNESS: keep the temporary harness directory after success, default 1
  • E2E_ROOT: reuse a specific harness directory instead of creating a new temp directory
  • E2E_CODEX_COMMAND: override the Codex command, mainly for local harness validation
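
The timeout and poll settings typically combine into a wait loop like the one sketched here. The defaults match the values listed above; the wait_until helper is a hypothetical illustration, not the loop the harness scripts actually use.

```shell
# Documented defaults; a value already set by the caller takes precedence.
: "${E2E_TIMEOUT_SEC:=600}"
: "${E2E_POLL_SEC:=2}"
: "${E2E_KEEP_HARNESS:=1}"

# Hypothetical helper: re-run a check command every E2E_POLL_SEC seconds
# until it succeeds or E2E_TIMEOUT_SEC seconds have elapsed.
wait_until() {
  local deadline=$(( $(date +%s) + E2E_TIMEOUT_SEC ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${E2E_TIMEOUT_SEC}s: $*" >&2
      return 1
    fi
    sleep "$E2E_POLL_SEC"
  done
}
```

Overrides are normally supplied on the command line, for example E2E_TIMEOUT_SEC=900 E2E_POLL_SEC=5 make e2e-real-codex, assuming the Makefile passes the environment through to the scripts unchanged.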