<!-- last-reviewed: 358f719e2143ed0f99c738c61f1af9b544b03422 -->
# Agent Brief: harb-evaluator

The evaluator runtime executes formula-defined pipelines. Scripts in this
directory handle stack lifecycle, scenario execution, evidence collection,
and the adversarial agent harness.
## Directory Layout

| File | Purpose |
|------|---------|
| `evaluate.sh` | Holdout gate: worktree checkout → Docker stack → Playwright scenarios → teardown |
| `red-team.sh` | Adversarial agent runner: Anvil bootstrap → attack suite → Claude agent → evidence |
| `run-protocol.sh` | On-chain health snapshot (TVL, fees, positions, rebalances) via cast/forge |
| `run-resources.sh` | Infrastructure snapshot (disk, RAM, API budget, CI queue) via shell commands |
| `bootstrap-light.sh` | Lightweight Anvil bootstrap with contract deployment (used by red-team.sh) |
| `promote-attacks.sh` | Deduplicate and PR novel attack vectors discovered by the red-team agent |
| `export-attacks.py` | Extract cast send commands from agent stream log into `.jsonl` attack files |
| `red-team-program.md` | System prompt for the adversarial Claude agent |
| `holdout.config.ts` | Playwright config for holdout scenario execution |
| `helpers/` | TypeScript helpers: RPC, assertions, swap, stake, floor, market, reporting, wallet |
| `scenarios/` | Holdout scenario scripts and the passive-confidence suite |
## Exit Code Convention

All evaluator scripts follow the same three-code contract:

| Code | Meaning |
|------|---------|
| `0` | Success / gate passed |
| `1` | Gate failed (scenario or attack found a problem) |
| `2` | Infrastructure error (stack down, missing dependency, RPC unreachable) |

Formulas and the orchestrator rely on these codes for routing, so do not
introduce additional exit codes without updating the formula TOML.
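As an illustration of how a caller might route on the three-code contract, here is a minimal sketch. The function names `describe_exit_code` and `run_and_classify`, and the label strings, are hypothetical and not part of the evaluator; the real routing lives in the formula TOML and the orchestrator.

```bash
# Hypothetical helper: translate an evaluator exit code into a routing label.
describe_exit_code() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "gate-failed" ;;
    2) echo "infra-error" ;;
    *) echo "unexpected" ;;  # anything outside the three-code contract
  esac
}

# Hypothetical wrapper: run a command and print only the label for its
# exit code. The command's own stdout is redirected to stderr so that the
# label is the only thing written to stdout.
run_and_classify() {
  rc=0
  "$@" >&2 || rc=$?
  describe_exit_code "$rc"
}
```

A caller could then write `label=$(run_and_classify ./evaluate.sh "$PR")`, where `PR` is a placeholder for the PR number, and branch on `$label`. Treating every unknown code as `unexpected` keeps a script that violates the contract from being mistaken for a pass.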
## Stack Lifecycle

**Heavy formulas** (`run-holdout`, `run-red-team`, `run-evolution`) need a running
Anvil or full Docker stack. Port 8545 is shared, so these formulas are mutually
exclusive and must not run concurrently.

- `evaluate.sh` manages Docker Compose (`harb-eval-{pr}` project) with full
  teardown via shell trap.
- `red-team.sh` uses `bootstrap-light.sh` for a lightweight Anvil-only stack
  (no Docker). Cleanup is also trap-registered.
- `run-protocol.sh` and `run-resources.sh` are lightweight: they need neither
  Anvil nor Docker.
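The trap-registered teardown described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `evaluate.sh` or `red-team.sh`: the helper name `run_with_teardown` is hypothetical, and the real teardown commands (stopping the Compose project or the Anvil process) are not shown in this brief.

```bash
# Hypothetical helper: run a command inside a subshell with a teardown hook
# that fires on every exit path of that subshell, whether the command
# succeeds or fails.
# The function body uses ( ... ) so the EXIT trap is scoped to the subshell
# and does not replace any trap set by the calling script.
run_with_teardown() (
  teardown_cmd="$1"
  shift
  trap "$teardown_cmd" EXIT
  rc=0
  "$@" || rc=$?
  # Exit with the command's own status; the trap runs after this line.
  exit "$rc"
)
```

For example, `run_with_teardown 'echo "stack torn down"' false` prints the teardown message and still returns the failing status of `false`, so the exit code contract survives cleanup.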
## Evidence Output

Every script writes its evidence file to `evidence/{category}/{date}.json`
conforming to the schema in `evidence/README.md`. The `deliver` step in each
formula handles committing and posting an issue comment.
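A small sketch of how a script might build that path is shown below. The helper names `evidence_path` and `write_evidence` are hypothetical. The `{date}` format is not specified here, so the sketch assumes an ISO 8601 UTC date (`YYYY-MM-DD`). The JSON payload itself must follow `evidence/README.md`, which this sketch does not try to reproduce.

```bash
# Hypothetical helper: print the evidence path for a category.
# Assumption: {date} is an ISO 8601 UTC date (YYYY-MM-DD). An explicit date
# can be passed as the second argument; otherwise today's date is used.
evidence_path() {
  category="$1"
  day="${2:-$(date -u +%Y-%m-%d)}"
  printf 'evidence/%s/%s.json\n' "$category" "$day"
}

# Hypothetical helper: write a JSON payload read from stdin to the evidence
# path under a base directory, creating the category directory if needed.
# Prints the path that was written.
write_evidence() {
  base_dir="$1"
  target="$base_dir/$(evidence_path "$2" "${3:-}")"
  mkdir -p "$(dirname "$target")"
  cat > "$target"
  echo "$target"
}
```

A script could call `write_evidence "$REPO_ROOT" "$CATEGORY" < payload.json`, where `REPO_ROOT`, `CATEGORY`, and `payload.json` are placeholders, and leave committing to the `deliver` step.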
## Wallet Connection Helper

`connectWallet(page)` in `helpers/wallet.ts` handles the Playwright wallet
connection flow. Key behaviours:
- Detects auto-reconnect: if wagmi already reconnected from storage
  (`.connect-button--connected` visible within 1 s), returns immediately.
- Opens the connectors panel via `.connect-button--disconnected` (10 s
  timeout, because wagmi needs time to settle into the disconnected state
  after page load).
- Falls back to the mobile hamburger menu if the desktop button is not found.
## Adding a New Evaluator Script

1. Place the script in this directory. Use `#!/usr/bin/env bash` and `set -euo pipefail`.
2. Follow the exit code convention (0 / 1 / 2).
3. Accept configuration via environment variables, not positional args (the exception is `evaluate.sh`, which takes a PR number).
4. Write evidence to `evidence/{category}/{date}.json`.
5. Wire it into a formula TOML in `formulas/`; see [formulas/AGENTS.md](../../formulas/AGENTS.md) for the full walkthrough.
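The first three steps can be sketched as a skeleton like the one below. Everything specific in it is an assumption made for illustration: the variable names `RPC_URL`, `EVIDENCE_CATEGORY`, and `SIMULATE_FAILURE`, the helper names, and the placeholder gate check are not taken from any existing evaluator script. Evidence writing (step 4) is omitted.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings, read from the environment rather than positional
# args. The default RPC URL assumes the shared local port 8545.
RPC_URL="${RPC_URL:-http://localhost:8545}"
EVIDENCE_CATEGORY="${EVIDENCE_CATEGORY:-example}"

# Return 2 (infrastructure error) when a required tool is not on PATH.
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing dependency: $1" >&2
    return 2
  fi
}

# Placeholder gate check: a real script would run its scenario here.
# Returns 0 when the gate passes and 1 when it fails.
run_gate() {
  [ "${SIMULATE_FAILURE:-0}" = "0" ]
}

main() {
  require_tool date || return 2
  if run_gate; then
    echo "gate passed (rpc=${RPC_URL}, category=${EVIDENCE_CATEGORY})"
    return 0
  fi
  echo "gate failed" >&2
  return 1
}

# As the last command, main's return value becomes the script's exit code.
main "$@"
```

Keeping the exit code decisions in `main` and returning from helpers (instead of calling `exit` deep inside them) makes it easier to see at a glance which paths yield 0, 1, or 2.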