harb/scripts/harb-evaluator/AGENTS.md

<!-- last-reviewed: 358f719e2143ed0f99c738c61f1af9b544b03422 -->
# Agent Brief: harb-evaluator

The evaluator runtime executes formula-defined pipelines. Scripts in this
directory handle stack lifecycle, scenario execution, evidence collection,
and the adversarial agent harness.

## Directory Layout

| File | Purpose |
|------|---------|
| `evaluate.sh` | Holdout gate: worktree checkout → Docker stack → Playwright scenarios → teardown |
| `red-team.sh` | Adversarial agent runner: Anvil bootstrap → attack suite → Claude agent → evidence |
| `run-protocol.sh` | On-chain health snapshot (TVL, fees, positions, rebalances) via cast/forge |
| `run-resources.sh` | Infrastructure snapshot (disk, RAM, API budget, CI queue) via shell commands |
| `bootstrap-light.sh` | Lightweight Anvil bootstrap with contract deployment (used by red-team.sh) |
| `promote-attacks.sh` | Deduplicate and PR novel attack vectors discovered by the red-team agent |
| `export-attacks.py` | Extract `cast send` commands from the agent stream log into `.jsonl` attack files |
| `red-team-program.md` | System prompt for the adversarial Claude agent |
| `holdout.config.ts` | Playwright config for holdout scenario execution |
| `helpers/` | TypeScript helpers: RPC, assertions, swap, stake, floor, market, reporting, wallet |
| `scenarios/` | Holdout scenario scripts and the passive-confidence suite |

## Exit Code Convention

All evaluator scripts follow the same three-code contract:

| Code | Meaning |
|------|---------|
| `0` | Success / gate passed |
| `1` | Gate failed (scenario or attack found a problem) |
| `2` | Infrastructure error (stack down, missing dependency, RPC unreachable) |

Formulas and the orchestrator rely on these codes for routing, so do not
introduce additional exit codes without updating the formula TOML.
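
As an illustration of how a caller might route on these codes, the sketch below maps an exit status to a label. The function name `route_exit_code` and the labels are hypothetical and do not come from the repository. Flagging codes outside the contract as `unknown` is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an evaluator exit status to a routing label.
route_exit_code() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "gate-failed" ;;
    2) echo "infra-error" ;;
    *) echo "unknown" ;;  # assumption: codes outside the contract are flagged, not guessed at
  esac
}

# Usage: capture the status without aborting, then route on it.
# `bash -c 'exit 1'` stands in for a real evaluator script.
status=0
bash -c 'exit 1' || status=$?
route_exit_code "$status"   # prints "gate-failed"
```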

## Stack Lifecycle

**Heavy formulas** (`run-holdout`, `run-red-team`, `run-evolution`) need a running
Anvil or full Docker stack. They all share port 8545, so they are mutually
exclusive and must not run concurrently.

- `evaluate.sh` manages Docker compose (`harb-eval-{pr}` project) with full
  teardown via shell trap.
- `red-team.sh` uses `bootstrap-light.sh` for a lightweight Anvil-only stack
  (no Docker). Cleanup is also trap-registered.
- `run-protocol.sh` and `run-resources.sh` are lightweight: they use neither Anvil nor Docker.
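
The trap-registered teardown mentioned above can be sketched as follows. This is a minimal illustration, not the code in `evaluate.sh` or `red-team.sh`: `start_stack` and `stop_stack` are hypothetical stand-ins for the real Docker compose or Anvil commands.

```shell
#!/usr/bin/env bash
# Sketch of trap-registered teardown. start_stack and stop_stack are
# hypothetical placeholders for the real stack commands.
run_with_teardown() (
  # The function body is a subshell, so the EXIT trap fires when the
  # subshell ends and does not leak into the calling shell.
  start_stack() { echo "stack up"; }
  stop_stack() { echo "teardown"; }

  trap stop_stack EXIT   # registered before startup so a partial start is cleaned up too
  start_stack

  rc=0
  "$@" || rc=$?          # run the workload and remember its exit status
  exit "$rc"             # leaving the subshell triggers the trap
)

# Usage: teardown runs even though the workload fails.
status=0
run_with_teardown false || status=$?
echo "exit status: $status"   # prints "exit status: 1" after "stack up" and "teardown"
```

Registering the trap before starting the stack means a failure halfway through startup still reaches the cleanup path.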

## Evidence Output

Every script writes its evidence file to `evidence/{category}/{date}.json`.
The file must conform to the schema in `evidence/README.md`. The `deliver`
step in each formula commits the file and posts an issue comment.
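
A minimal sketch of building that path and writing a file follows. It rests on several assumptions: `{date}` is taken to be a UTC ISO date (`YYYY-MM-DD`), the function names and the `protocol` category are hypothetical, and the JSON fields are placeholders only. The authoritative schema is the one in `evidence/README.md`.

```shell
#!/usr/bin/env bash
# Sketch: build evidence/{category}/{date}.json and write a file there.
# Assumption: {date} is a UTC ISO date; the payload fields are placeholders,
# not the real schema from evidence/README.md.
evidence_path() {
  local root="$1" category="$2" day="$3"
  printf '%s/%s/%s.json\n' "$root" "$category" "$day"
}

write_evidence() {
  local root="$1" category="$2" verdict="$3"
  local day path
  day="$(date -u +%F)"
  path="$(evidence_path "$root" "$category" "$day")"
  mkdir -p "$(dirname "$path")"
  # Placeholder payload only.
  printf '{"category": "%s", "date": "%s", "verdict": "%s"}\n' \
    "$category" "$day" "$verdict" > "$path"
  echo "$path"
}

# Usage: write into a scratch directory rather than the real evidence tree.
scratch="$(mktemp -d)"
write_evidence "$scratch/evidence" "protocol" "pass"
```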

## Wallet Connection Helper

`helpers/wallet.ts` — `connectWallet(page)` handles the Playwright wallet
connection flow. Key behaviours:
- Detects auto-reconnect: if wagmi already reconnected from storage
  (`.connect-button--connected` visible within 1 s), returns immediately.
- Opens the connectors panel via `.connect-button--disconnected` (10 s
  timeout, because wagmi needs time to settle into the disconnected state after page load).
- Falls back to the mobile hamburger menu if the desktop button is not found.

## Adding a New Evaluator Script

1. Place the script in this directory. Use `#!/usr/bin/env bash` and `set -euo pipefail`.
2. Follow the exit code convention (0 / 1 / 2).
3. Accept configuration via environment variables, not positional args (except `evaluate.sh`, which takes a PR number).
4. Write evidence to `evidence/{category}/{date}.json`.
5. Wire it into a formula TOML in `formulas/` — see [formulas/AGENTS.md](../../formulas/AGENTS.md) for the full walkthrough.
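
The first three steps can be sketched as a skeleton like the one below. It is an illustration under stated assumptions rather than an existing script: `RPC_URL` and `EVIDENCE_DIR` are hypothetical variable names, and the check itself is left as a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical skeleton for a new evaluator script. RPC_URL and EVIDENCE_DIR
# are illustrative names, not variables the repository is known to use.
main() {
  # Configuration comes from the environment, with a default where that is safe.
  local evidence_dir="${EVIDENCE_DIR:-evidence}"

  # A missing required setting counts as an infrastructure error (code 2).
  if [ -z "${RPC_URL:-}" ]; then
    echo "RPC_URL is not set" >&2
    return 2
  fi

  echo "checking $RPC_URL, evidence under $evidence_dir"
  # ... run the actual check here and return 1 if the gate fails ...
  return 0
}

# A real script would end with: main "$@"
# Usage for this sketch:
RPC_URL=http://localhost:8545 main
```

Keeping the logic in `main` and returning the code from there makes the 0 / 1 / 2 contract visible in one place.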