harb/scripts/harb-evaluator/AGENTS.md
johba fce4b8b068 chore: gardener housekeeping 2026-03-25
AGENTS.md watermarks refreshed to HEAD (358f719). Content updates:
- scripts/harb-evaluator/AGENTS.md: documented wallet.ts auto-reconnect
  fix (wagmi EIP-6963 auto-connect handling, 10s disconnect timeout)
- All other AGENTS.md files: watermark bump only

Pending actions (3): promote #1155 pitch-deck to backlog.

Escalate: #1158 (Phase 1 completion accuracy) — needs planner/human decision.
2026-03-25 18:05:32 +00:00

<!-- last-reviewed: 358f719e2143ed0f99c738c61f1af9b544b03422 -->
# Agent Brief: harb-evaluator
The evaluator runtime executes formula-defined pipelines. Scripts in this
directory handle stack lifecycle, scenario execution, evidence collection,
and the adversarial agent harness.
## Directory Layout
| File | Purpose |
|------|---------|
| `evaluate.sh` | Holdout gate: worktree checkout → Docker stack → Playwright scenarios → teardown |
| `red-team.sh` | Adversarial agent runner: Anvil bootstrap → attack suite → Claude agent → evidence |
| `run-protocol.sh` | On-chain health snapshot (TVL, fees, positions, rebalances) via cast/forge |
| `run-resources.sh` | Infrastructure snapshot (disk, RAM, API budget, CI queue) via shell commands |
| `bootstrap-light.sh` | Lightweight Anvil bootstrap with contract deployment (used by red-team.sh) |
| `promote-attacks.sh` | Deduplicate and PR novel attack vectors discovered by the red-team agent |
| `export-attacks.py` | Extract cast send commands from agent stream log into `.jsonl` attack files |
| `red-team-program.md` | System prompt for the adversarial Claude agent |
| `holdout.config.ts` | Playwright config for holdout scenario execution |
| `helpers/` | TypeScript helpers: RPC, assertions, swap, stake, floor, market, reporting, wallet |
| `scenarios/` | Holdout scenario scripts and the passive-confidence suite |
## Exit Code Convention
All evaluator scripts follow the same three-code contract:
| Code | Meaning |
|------|---------|
| `0` | Success / gate passed |
| `1` | Gate failed (scenario or attack found a problem) |
| `2` | Infrastructure error (stack down, missing dependency, RPC unreachable) |

Formulas and the orchestrator rely on these codes for routing: do not
introduce additional exit codes without updating the formula TOML.
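
The routing contract can be sketched in TypeScript, the language of `helpers/`. This is a minimal illustration only. `Outcome` and `routeExitCode` are hypothetical names and are not part of the orchestrator or the formula TOML format.

```typescript
// Hypothetical sketch: routing on the evaluator exit-code contract.
// `Outcome` and `routeExitCode` are illustrative names, not repo APIs.
type Outcome = "passed" | "gate-failed" | "infra-error";

const EXIT_CODES = new Map<number, Outcome>([
  [0, "passed"], // success / gate passed
  [1, "gate-failed"], // a scenario or attack found a problem
  [2, "infra-error"], // stack down, missing dependency, RPC unreachable
]);

function routeExitCode(code: number): Outcome {
  const outcome = EXIT_CODES.get(code);
  if (outcome === undefined) {
    // Anything outside 0/1/2 breaks the contract, so fail loudly instead of guessing.
    throw new Error(`unexpected evaluator exit code ${code}`);
  }
  return outcome;
}
```

In this sketch, `routeExitCode(2)` returns `"infra-error"` and `routeExitCode(3)` throws. That matches the rule that a new code requires a formula TOML update before any caller can route on it.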
## Stack Lifecycle
**Heavy formulas** (`run-holdout`, `run-red-team`, `run-evolution`) need a running
Anvil or full Docker stack. Port 8545 is shared — these formulas are mutually
exclusive and must not run concurrently.
- `evaluate.sh` manages Docker compose (`harb-eval-{pr}` project) with full
teardown via shell trap.
- `red-team.sh` uses `bootstrap-light.sh` for a lightweight Anvil-only stack
(no Docker). Cleanup is also trap-registered.
- `run-protocol.sh` and `run-resources.sh` are lightweight — no Anvil, no Docker.
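
The exclusivity rule for heavy formulas can be written as a pre-flight check. This is a minimal sketch that assumes the caller knows which formulas are currently running. `canStart` is a hypothetical helper, not an existing orchestrator API, and the real scheduler may enforce the rule differently.

```typescript
// Hypothetical pre-flight check for the "heavy formulas are mutually exclusive" rule.
// HEAVY_FORMULAS mirrors the list in this brief; canStart() is illustrative only.
const HEAVY_FORMULAS = new Set(["run-holdout", "run-red-team", "run-evolution"]);

/** Heavy formulas share port 8545, so at most one of them may run at a time. */
function canStart(requested: string, running: readonly string[]): boolean {
  if (!HEAVY_FORMULAS.has(requested)) {
    // Lightweight work (e.g. formulas driving run-protocol.sh / run-resources.sh)
    // needs neither Anvil nor Docker, so it never conflicts on the port.
    return true;
  }
  // A heavy formula may start only if no other heavy formula is running.
  return !running.some((name) => HEAVY_FORMULAS.has(name));
}
```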
## Evidence Output
Every script writes its evidence file to `evidence/{category}/{date}.json`
conforming to the schema in `evidence/README.md`. The `deliver` step in each
formula handles committing and posting an issue comment.
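
The following is a minimal sketch of the path convention. It assumes `{date}` is a UTC `YYYY-MM-DD` string, which this brief does not state. The payload is an opaque placeholder because the real schema lives in `evidence/README.md`. `evidencePath` and `writeEvidence` are hypothetical names, and the category string used in the tests is illustrative.

```typescript
// Minimal sketch of the evidence/{category}/{date}.json convention.
// ASSUMPTION: {date} is the UTC calendar day (YYYY-MM-DD).
// The payload shape must follow evidence/README.md; it is not modelled here.
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

function evidencePath(root: string, category: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD" from the ISO timestamp
  return join(root, "evidence", category, `${day}.json`);
}

function writeEvidence(root: string, category: string, payload: unknown, date: Date = new Date()): string {
  const file = evidencePath(root, category, date);
  mkdirSync(join(root, "evidence", category), { recursive: true }); // create the category dir if missing
  writeFileSync(file, JSON.stringify(payload, null, 2) + "\n");
  return file; // the formula's deliver step would then commit this file
}
```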
## Wallet Connection Helper
`connectWallet(page)` in `helpers/wallet.ts` handles the Playwright wallet
connection flow. Key behaviours:
- Detects auto-reconnect: if wagmi already reconnected from storage
(`.connect-button--connected` visible within 1 s), returns immediately.
- Opens the connectors panel via `.connect-button--disconnected` (10 s
timeout — wagmi needs time to settle into disconnected state after page load).
- Falls back to the mobile hamburger menu if the desktop button is not found.
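
The sequence above can be sketched as follows. This is not the real `helpers/wallet.ts`. `PageLike`/`LocatorLike` are structural stand-ins for Playwright's `Page`/`Locator`, and Playwright's own objects should satisfy them. The hamburger selector is hypothetical, and so is the assumption that the mobile menu exposes the same connect button. The sketch stops once the connectors panel is opened.

```typescript
// Sketch of the connectWallet(page) flow; selectors marked hypothetical are assumptions.
interface LocatorLike {
  waitFor(options: { state: "visible"; timeout: number }): Promise<void>;
  click(options: { timeout: number }): Promise<void>;
}
interface PageLike {
  locator(selector: string): LocatorLike;
}

const CONNECTED = ".connect-button--connected";
const DISCONNECTED = ".connect-button--disconnected";
const HAMBURGER = ".mobile-menu-toggle"; // hypothetical selector

async function isVisibleWithin(page: PageLike, selector: string, timeout: number): Promise<boolean> {
  try {
    await page.locator(selector).waitFor({ state: "visible", timeout });
    return true;
  } catch {
    return false; // Playwright rejects with a TimeoutError when the wait expires
  }
}

type ConnectPath = "auto-reconnected" | "desktop" | "mobile";

async function connectWallet(page: PageLike): Promise<ConnectPath> {
  // 1. wagmi may already have reconnected from storage: allow 1 s.
  if (await isVisibleWithin(page, CONNECTED, 1_000)) return "auto-reconnected";

  // 2. Desktop: allow up to 10 s for wagmi to settle into the disconnected state.
  if (await isVisibleWithin(page, DISCONNECTED, 10_000)) {
    await page.locator(DISCONNECTED).click({ timeout: 10_000 });
    return "desktop";
  }

  // 3. Fallback: open the mobile menu, then (assumed) the same connect button inside it.
  await page.locator(HAMBURGER).click({ timeout: 5_000 });
  await page.locator(DISCONNECTED).click({ timeout: 5_000 });
  return "mobile";
}
```

Passing timeouts explicitly keeps each wait bounded, so a slow wagmi settle surfaces as a timeout rather than a hang.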
## Adding a New Evaluator Script
1. Place the script in this directory. Use `#!/usr/bin/env bash` and `set -euo pipefail`.
2. Follow the exit code convention (0 / 1 / 2).
3. Accept configuration via environment variables, not positional args (except `evaluate.sh`, which takes a PR number).
4. Write evidence to `evidence/{category}/{date}.json`.
5. Wire it into a formula TOML in `formulas/` — see [formulas/AGENTS.md](../../formulas/AGENTS.md) for the full walkthrough.
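
Steps 1 and 2 of the checklist lend themselves to a mechanical check. The following is a hypothetical lint sketch, and no such linter is known to exist in the repo. It flags a missing `#!/usr/bin/env bash` shebang, a missing `set -euo pipefail`, and literal `exit N` statements outside the 0/1/2 contract. It does not catch computed exits such as `exit "$code"`.

```typescript
// Hypothetical lint for new evaluator scripts; checkEvaluatorScript is an illustrative name.
const ALLOWED_EXIT_CODES = new Set([0, 1, 2]);

function checkEvaluatorScript(source: string): string[] {
  const problems: string[] = [];
  const lines = source.split("\n");

  // Step 1: portable bash shebang on the first line.
  if ((lines[0] ?? "").trim() !== "#!/usr/bin/env bash") {
    problems.push("first line must be #!/usr/bin/env bash");
  }
  // Step 1: strict mode enabled as a standalone command.
  if (!lines.some((line) => line.trim() === "set -euo pipefail")) {
    problems.push("missing `set -euo pipefail`");
  }
  // Step 2: literal exit codes must stay within the 0/1/2 contract.
  lines.forEach((line, i) => {
    const match = /^\s*exit\s+(\d+)\b/.exec(line);
    if (match && !ALLOWED_EXIT_CODES.has(Number(match[1]))) {
      problems.push(`line ${i + 1}: exit ${match[1]} is outside the 0/1/2 contract`);
    }
  });
  return problems;
}
```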