<!-- last-reviewed: 358f719e2143ed0f99c738c61f1af9b544b03422 -->
# Agent Brief: harb-evaluator

The evaluator runtime executes formula-defined pipelines. Scripts in this
directory handle stack lifecycle, scenario execution, evidence collection,
and the adversarial agent harness.

## Directory Layout

| File | Purpose |
|------|---------|
| `evaluate.sh` | Holdout gate: worktree checkout → Docker stack → Playwright scenarios → teardown |
| `red-team.sh` | Adversarial agent runner: Anvil bootstrap → attack suite → Claude agent → evidence |
| `run-protocol.sh` | On-chain health snapshot (TVL, fees, positions, rebalances) via cast/forge |
| `run-resources.sh` | Infrastructure snapshot (disk, RAM, API budget, CI queue) via shell commands |
| `bootstrap-light.sh` | Lightweight Anvil bootstrap with contract deployment (used by `red-team.sh`) |
| `promote-attacks.sh` | Deduplicate novel attack vectors discovered by the red-team agent and open a PR for them |
| `export-attacks.py` | Extract `cast send` commands from the agent stream log into `.jsonl` attack files |
| `red-team-program.md` | System prompt for the adversarial Claude agent |
| `holdout.config.ts` | Playwright config for holdout scenario execution |
| `helpers/` | TypeScript helpers: RPC, assertions, swap, stake, floor, market, reporting, wallet |
| `scenarios/` | Holdout scenario scripts and the passive-confidence suite |

## Exit Code Convention

All evaluator scripts follow the same three-code contract:

| Code | Meaning |
|------|---------|
| `0` | Success / gate passed |
| `1` | Gate failed (scenario or attack found a problem) |
| `2` | Infrastructure error (stack down, missing dependency, RPC unreachable) |

Formulas and the orchestrator rely on these codes for routing — do not
introduce additional exit codes without updating the formula TOML.
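The sketch below makes the routing contract concrete by showing how a caller might classify an evaluator's exit status. `EvalOutcome` and `routeExitCode` are hypothetical names. The sketch also treats any unexpected code as an infrastructure error, including a `null` status from a signal-killed process. That behaviour is an assumption; the formulas are not documented to do it.

```typescript
// Hypothetical orchestrator-side view of the 0 / 1 / 2 exit code contract.
type EvalOutcome = "passed" | "gate-failed" | "infra-error";

function routeExitCode(code: number | null): EvalOutcome {
  switch (code) {
    case 0:
      return "passed"; // success / gate passed
    case 1:
      return "gate-failed"; // a scenario or attack found a problem
    case 2:
      return "infra-error"; // stack down, missing dependency, RPC unreachable
    default:
      // Sketch assumption: codes outside the contract, or null when the
      // process was killed by a signal, are treated as infrastructure errors.
      return "infra-error";
  }
}
```

With Node's `child_process.spawnSync`, you can pass the result's `status` field straight to `routeExitCode`. That field is a number, or `null` if a signal terminated the process.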

## Stack Lifecycle

**Heavy formulas** (`run-holdout`, `run-red-team`, `run-evolution`) need a running
Anvil node or the full Docker stack. They all share port 8545, so they are mutually
exclusive and must not run concurrently.

- `evaluate.sh` manages a Docker Compose project (`harb-eval-{pr}`) and tears it
  down fully via a shell trap.
- `red-team.sh` uses `bootstrap-light.sh` for a lightweight Anvil-only stack
  (no Docker). Cleanup is also trap-registered.
- `run-protocol.sh` and `run-resources.sh` are lightweight — no Anvil, no Docker.
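This brief does not say how the mutual exclusion is enforced. The sketch below shows one possible guard, assuming a hypothetical lock file in the OS temp directory. Opening the file with the `wx` flag fails if it already exists, so only one heavy formula can hold the lock at a time. The `finally` block releases the lock, much as the shell traps tear down the stack. `withHeavyFormulaLock` and the lock path are illustrative names only.

```typescript
import { closeSync, openSync, unlinkSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical lock file guarding the shared port 8545.
const LOCK_PATH = join(tmpdir(), "harb-eval-port-8545.lock");

// Runs `body` only if no other heavy formula holds the lock.
// Returns false without running `body` when the lock is already taken.
function withHeavyFormulaLock(
  formula: string,
  body: () => void,
  lockPath: string = LOCK_PATH,
): boolean {
  let fd: number;
  try {
    // "wx" fails with EEXIST if the file exists: an atomic test-and-set.
    fd = openSync(lockPath, "wx");
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err;
  }
  try {
    try {
      writeSync(fd, `${formula} pid=${process.pid}\n`); // record the holder
    } finally {
      closeSync(fd);
    }
    body();
  } finally {
    // Release the lock even if body throws (like a trap on EXIT).
    unlinkSync(lockPath);
  }
  return true;
}
```

If a process is killed with `SIGKILL`, its lock is left behind and needs manual removal. A production guard would likely also check whether the recorded PID is still alive.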

## Evidence Output

Every script writes its evidence file to `evidence/{category}/{date}.json`
conforming to the schema in `evidence/README.md`. The `deliver` step in each
formula handles committing and posting an issue comment.
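The sketch below builds and writes the evidence path. It assumes the `{date}` segment is a UTC `YYYY-MM-DD` date, since the brief does not specify the format. It also treats the payload as opaque JSON; `evidence/README.md` remains the authority on the schema. `evidencePath`, `writeEvidence`, and the category names used with them are hypothetical.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Builds <root>/evidence/{category}/{date}.json.
// Assumption: {date} is the UTC calendar date in YYYY-MM-DD form.
function evidencePath(root: string, category: string, when: Date = new Date()): string {
  const date = when.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return join(root, "evidence", category, `${date}.json`);
}

// Writes the payload as pretty-printed JSON, creating directories as needed,
// and returns the path so a later deliver step could pick it up.
function writeEvidence(root: string, category: string, payload: unknown, when?: Date): string {
  const path = evidencePath(root, category, when);
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, JSON.stringify(payload, null, 2) + "\n");
  return path;
}
```

Committing the file and posting the issue comment are deliberately left out. That matches the split described above, where the formula's `deliver` step handles them.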

## Wallet Connection Helper

`helpers/wallet.ts` — `connectWallet(page)` handles the Playwright wallet
connection flow. Key behaviours:
- Detects auto-reconnect: if wagmi already reconnected from storage
  (`.connect-button--connected` visible within 1 s), returns immediately.
- Opens the connectors panel via `.connect-button--disconnected` (10 s
  timeout — wagmi needs time to settle into disconnected state after page load).
- Falls back to the mobile hamburger menu if the desktop button is not found.
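The control flow described above can be sketched in TypeScript against a hypothetical `PageLike` interface, so it runs without a browser. In Playwright, `waitVisible` would correspond to `page.locator(selector).waitFor({ state: "visible", timeout })` wrapped in a try/catch. The `MOBILE_MENU` selector is a placeholder. The sketch stops once the connectors panel is opened, and it is not the actual `helpers/wallet.ts` code.

```typescript
// Hypothetical page abstraction standing in for a Playwright Page.
interface PageLike {
  waitVisible(selector: string, timeoutMs: number): Promise<boolean>;
  click(selector: string): Promise<void>;
}

type ConnectPath = "auto-reconnected" | "desktop" | "mobile";

// Placeholder: the real mobile menu selector is not documented in this brief.
const MOBILE_MENU = ".mobile-menu-toggle";

async function connectWalletSketch(page: PageLike): Promise<ConnectPath> {
  // 1. wagmi may already have reconnected from storage: short 1 s probe.
  if (await page.waitVisible(".connect-button--connected", 1_000)) {
    return "auto-reconnected";
  }
  // 2. Allow up to 10 s for wagmi to settle into the disconnected state,
  //    then open the connectors panel from the desktop button.
  if (await page.waitVisible(".connect-button--disconnected", 10_000)) {
    await page.click(".connect-button--disconnected");
    return "desktop";
  }
  // 3. Desktop button never appeared: fall back to the mobile hamburger menu.
  await page.click(MOBILE_MENU);
  return "mobile";
}
```

The two timeouts trade off differently. The short reconnect probe lets an already-connected session return almost immediately. The longer disconnected wait still gives a freshly loaded page enough time for wagmi to settle.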

## Adding a New Evaluator Script

1. Place the script in this directory. Use `#!/usr/bin/env bash` and `set -euo pipefail`.
2. Follow the exit code convention (0 / 1 / 2).
3. Accept configuration via environment variables, not positional args. The one exception is `evaluate.sh`, which takes a PR number.
4. Write evidence to `evidence/{category}/{date}.json`.
5. Wire it into a formula TOML in `formulas/` — see [formulas/AGENTS.md](../../formulas/AGENTS.md) for the full walkthrough.
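Evaluator scripts are bash (step 1), but the contract in steps 2 and 3 does not depend on the language. The sketch below expresses it in TypeScript using hypothetical variable names (`EVAL_RPC_URL`, `EVAL_CATEGORY`). Configuration comes from the environment, a missing required setting raises an infrastructure error, and the outcome maps onto exit codes 0 / 1 / 2. Defaulting the RPC URL to Anvil's standard `127.0.0.1:8545` endpoint is an assumption.

```typescript
// Illustrative only: real evaluator scripts are bash. Names are hypothetical.
class InfraError extends Error {}

interface ScriptConfig {
  rpcUrl: string;
  evidenceCategory: string;
}

// Reads configuration from environment variables instead of positional args.
function readConfig(env: Record<string, string | undefined>): ScriptConfig {
  const rpcUrl = env.EVAL_RPC_URL ?? "http://127.0.0.1:8545"; // assumed default
  const evidenceCategory = env.EVAL_CATEGORY;
  if (!evidenceCategory) {
    // A missing required setting is an infrastructure problem, not a gate failure.
    throw new InfraError("EVAL_CATEGORY is not set");
  }
  return { rpcUrl, evidenceCategory };
}

// Maps a script body's outcome onto the 0 / 1 / 2 convention.
function exitCodeFor(run: () => boolean): 0 | 1 | 2 {
  try {
    return run() ? 0 : 1; // true = gate passed, false = gate failed
  } catch {
    return 2; // sketch assumption: any thrown error counts as infrastructure
  }
}
```

In a Node entry point, this would typically end with `process.exit(exitCodeFor(() => { const cfg = readConfig(process.env); /* run checks */ return true; }))`.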