# Agent Brief: harb-evaluator

The evaluator runtime executes formula-defined pipelines. Scripts in this directory handle stack lifecycle, scenario execution, evidence collection, and the adversarial agent harness.

## Directory Layout

| File | Purpose |
|------|---------|
| `evaluate.sh` | Holdout gate: worktree checkout → Docker stack → Playwright scenarios → teardown |
| `red-team.sh` | Adversarial agent runner: Anvil bootstrap → attack suite → Claude agent → evidence |
| `run-protocol.sh` | On-chain health snapshot (TVL, fees, positions, rebalances) via cast/forge |
| `run-resources.sh` | Infrastructure snapshot (disk, RAM, API budget, CI queue) via shell commands |
| `bootstrap-light.sh` | Lightweight Anvil bootstrap with contract deployment (used by `red-team.sh`) |
| `promote-attacks.sh` | Deduplicate and PR novel attack vectors discovered by the red-team agent |
| `export-attacks.py` | Extract `cast send` commands from the agent stream log into `.jsonl` attack files |
| `red-team-program.md` | System prompt for the adversarial Claude agent |
| `holdout.config.ts` | Playwright config for holdout scenario execution |
| `helpers/` | TypeScript helpers: RPC, assertions, swap, stake, floor, market, reporting, wallet |
| `scenarios/` | Holdout scenario scripts and the passive-confidence suite |

## Exit Code Convention

All evaluator scripts follow the same three-code contract:

| Code | Meaning |
|------|---------|
| `0` | Success / gate passed |
| `1` | Gate failed (scenario or attack found a problem) |
| `2` | Infrastructure error (stack down, missing dependency, RPC unreachable) |

Formulas and the orchestrator rely on these codes for routing. Do not introduce additional exit codes without updating the formula TOML.

## Stack Lifecycle

**Heavy formulas** (`run-holdout`, `run-red-team`, `run-evolution`) need a running Anvil or full Docker stack. Port 8545 is shared, so these formulas are mutually exclusive and must not run concurrently.
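
One way a heavy script could fail fast instead of colliding on the shared port is a lock that every heavy script claims first. The sketch below assumes a hypothetical lock directory, `ANVIL_PORT_LOCK`; it is not how `evaluate.sh` or `red-team.sh` are documented to coordinate. Because `mkdir` is atomic, only one caller can create the directory, and a held lock is reported with exit code `2` (infrastructure error).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lock path; not part of the documented evaluator scripts.
ANVIL_PORT_LOCK="${ANVIL_PORT_LOCK:-/tmp/harb-anvil-8545.lock}"

# Claim the shared Anvil port. mkdir fails if the directory already exists,
# so at most one caller holds the lock at a time.
acquire_port_lock() {
  if ! mkdir "$ANVIL_PORT_LOCK" 2>/dev/null; then
    echo "port 8545 is held by another heavy formula ($ANVIL_PORT_LOCK exists)" >&2
    return 2  # infrastructure error per the exit code convention
  fi
}

# Release the claim; safe to call more than once.
release_port_lock() {
  rmdir "$ANVIL_PORT_LOCK" 2>/dev/null || true
}
```

A caller would run `acquire_port_lock || exit $?` and only then `trap release_port_lock EXIT`, so a script that loses the race never removes another script's lock. The guard only coordinates scripts that share the same lock path: it cannot see an Anvil started some other way, and a `SIGKILL` that skips the trap leaves a stale directory to remove by hand.
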

- `evaluate.sh` manages Docker Compose (project `harb-eval-{pr}`) with full teardown via a shell trap.
- `red-team.sh` uses `bootstrap-light.sh` for a lightweight, Anvil-only stack (no Docker). Its cleanup is also trap-registered.
- `run-protocol.sh` and `run-resources.sh` are lightweight: no Anvil, no Docker.

## Evidence Output

Every script writes its evidence file to `evidence/{category}/{date}.json`, conforming to the schema in `evidence/README.md`. The `deliver` step in each formula handles committing the evidence and posting an issue comment.

## Wallet Connection Helper

`connectWallet(page)` in `helpers/wallet.ts` handles the Playwright wallet connection flow. Key behaviours:

- Detects auto-reconnect: if wagmi has already reconnected from storage (`.connect-button--connected` visible within 1 s), it returns immediately.
- Opens the connectors panel via `.connect-button--disconnected` with a 10 s timeout, because wagmi needs time to settle into the disconnected state after page load.
- Falls back to the mobile hamburger menu if the desktop button is not found.

## Adding a New Evaluator Script

1. Place the script in this directory. Use `#!/usr/bin/env bash` and `set -euo pipefail`.
2. Follow the exit code convention (0 / 1 / 2).
3. Accept configuration via environment variables, not positional args (except `evaluate.sh`, which takes a PR number).
4. Write evidence to `evidence/{category}/{date}.json`.
5. Wire it into a formula TOML in `formulas/`; see [formulas/AGENTS.md](../../formulas/AGENTS.md) for the full walkthrough.
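
Steps 1 through 4 can be combined into a minimal skeleton. This is a sketch under stated assumptions, not a template taken from the existing scripts: the variable names `EVIDENCE_CATEGORY` and `EVIDENCE_DIR`, the helper functions, the `run_check` stub, the UTC `YYYY-MM-DD` date format, and the two-field JSON payload are all hypothetical, and a real script must emit the schema defined in `evidence/README.md`.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton for a new evaluator script; all names are illustrative.
set -euo pipefail

# Step 3: configuration comes from environment variables (hypothetical names).
EVIDENCE_CATEGORY="${EVIDENCE_CATEGORY:-example}"
EVIDENCE_DIR="${EVIDENCE_DIR:-evidence}"

# Step 2: the shared three-code contract.
readonly EXIT_PASS=0 EXIT_GATE_FAILED=1 EXIT_INFRA_ERROR=2

# Report an infrastructure problem and stop with exit code 2.
infra_error() {
  echo "infra error: $*" >&2
  exit "$EXIT_INFRA_ERROR"
}

# A missing dependency is an infrastructure error, not a gate failure.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || infra_error "missing dependency: $1"
}

# Step 4: evidence/{category}/{date}.json (date format assumed: UTC YYYY-MM-DD).
evidence_file() {
  printf '%s/%s/%s.json\n' "$EVIDENCE_DIR" "$EVIDENCE_CATEGORY" "$(date -u +%Y-%m-%d)"
}

# Placeholder gate check: replace with real cast/forge or Playwright logic.
run_check() {
  return 0
}

main() {
  require_cmd date  # a real script would also check tools such as cast or forge
  local out status
  out="$(evidence_file)"
  mkdir -p "$(dirname "$out")" || infra_error "cannot create $(dirname "$out")"
  if run_check; then status="pass"; else status="fail"; fi
  # Placeholder payload; the real schema is defined in evidence/README.md.
  printf '{"category":"%s","status":"%s"}\n' "$EVIDENCE_CATEGORY" "$status" >"$out"
  if [[ "$status" == "pass" ]]; then
    return "$EXIT_PASS"
  fi
  return "$EXIT_GATE_FAILED"
}

# The script's exit status is main's return code (or 2 via infra_error).
main "$@"
```

Run as, for example, `EVIDENCE_CATEGORY=example ./run-example.sh` (a hypothetical file name), the skeleton exits 0 when the check passes, 1 when it fails, and 2 for infrastructure problems, which keeps it within the three-code contract the formulas route on. Keeping the helpers separate from `main` also lets them be exercised in isolation.
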