# Agent Brief: harb-evaluator
The evaluator runtime executes formula-defined pipelines. Scripts in this directory handle stack lifecycle, scenario execution, evidence collection, and the adversarial agent harness.
## Directory Layout
| File | Purpose |
|---|---|
| `evaluate.sh` | Holdout gate: worktree checkout → Docker stack → Playwright scenarios → teardown |
| `red-team.sh` | Adversarial agent runner: Anvil bootstrap → attack suite → Claude agent → evidence |
| `run-protocol.sh` | On-chain health snapshot (TVL, fees, positions, rebalances) via cast/forge |
| `run-resources.sh` | Infrastructure snapshot (disk, RAM, API budget, CI queue) via shell commands |
| `bootstrap-light.sh` | Lightweight Anvil bootstrap with contract deployment (used by `red-team.sh`) |
| `promote-attacks.sh` | Deduplicate and PR novel attack vectors discovered by the red-team agent |
| `export-attacks.py` | Extract `cast send` commands from agent stream log into `.jsonl` attack files |
| `red-team-program.md` | System prompt for the adversarial Claude agent |
| `holdout.config.ts` | Playwright config for holdout scenario execution |
| `helpers/` | TypeScript helpers: RPC, assertions, swap, stake, floor, market, reporting |
| `scenarios/` | Holdout scenario scripts and the passive-confidence suite |
## Exit Code Convention
All evaluator scripts follow the same three-code contract:
| Code | Meaning |
|---|---|
| 0 | Success / gate passed |
| 1 | Gate failed (scenario or attack found a problem) |
| 2 | Infrastructure error (stack down, missing dependency, RPC unreachable) |
Formulas and the orchestrator rely on these codes for routing — do not introduce additional exit codes without updating the formula TOML.
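As an illustration of the contract, a script could name the three codes once and route every outcome through a single helper. This is only a sketch: the variable names (`EXIT_OK`, `EXIT_GATE`, `EXIT_INFRA`), the `classify_result` function, and its two inputs are hypothetical and not taken from the evaluator scripts.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names for the three documented exit codes.
EXIT_OK=0      # success / gate passed
EXIT_GATE=1    # gate failed (scenario or attack found a problem)
EXIT_INFRA=2   # infrastructure error (stack down, missing dependency, ...)

# classify_result <stack_up: yes|no> <failed_count: integer>
# Prints the exit code the calling script should finish with.
classify_result() {
  local stack_up="$1" failed="$2"
  if [[ "$stack_up" != "yes" ]]; then
    # Infrastructure problems win: the gate never had a chance to run.
    echo "$EXIT_INFRA"
  elif (( failed > 0 )); then
    echo "$EXIT_GATE"
  else
    echo "$EXIT_OK"
  fi
}
```

A script built this way would end with something like `exit "$(classify_result "$stack_up" "$failed")"`, so no path can leak a code outside 0 / 1 / 2.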
## Stack Lifecycle
Heavy formulas (run-holdout, run-red-team, run-evolution) need a running
Anvil or full Docker stack. Port 8545 is shared — these formulas are mutually
exclusive and must not run concurrently.
- `evaluate.sh` manages Docker compose (`harb-eval-{pr}` project) with full teardown via shell trap.
- `red-team.sh` uses `bootstrap-light.sh` for a lightweight Anvil-only stack (no Docker). Cleanup is also trap-registered.
- `run-protocol.sh` and `run-resources.sh` are lightweight: no Anvil, no Docker.
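The trap-registered cleanup mentioned above follows a common bash pattern, sketched here under stated assumptions: `start_stack`, `cleanup`, and `STACK_PID` are hypothetical names, and a background `sleep` stands in for the real Anvil or Docker stack.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: PID of the background process that stands in for the stack.
STACK_PID=""

# Stops the stack if it is still running. Safe to call more than once.
cleanup() {
  if [[ -n "$STACK_PID" ]] && kill -0 "$STACK_PID" 2>/dev/null; then
    kill "$STACK_PID" 2>/dev/null || true
    wait "$STACK_PID" 2>/dev/null || true
  fi
  STACK_PID=""
}

# Placeholder for the real bootstrap step (not the actual stack command).
start_stack() {
  sleep 300 &
  STACK_PID=$!
}

# Registered before the stack starts, so every exit path (success, failure,
# or an error caught by `set -e`) releases the shared port.
trap cleanup EXIT
```

Registering the trap before starting the stack matters here: because port 8545 is shared, a script that dies without tearing down would block the next heavy formula.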
## Evidence Output
Every script writes its evidence file to evidence/{category}/{date}.json
conforming to the schema in evidence/README.md. The deliver step in each
formula handles committing and posting an issue comment.
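The path convention can be sketched as a pair of small helpers. Several things here are assumptions rather than facts from the evaluator: the function names, the `YYYY-MM-DD` date format, the category used in the usage note, and the JSON payload, which is a placeholder (the real schema lives in evidence/README.md and is not reproduced here).

```bash
#!/usr/bin/env bash
set -euo pipefail

# evidence_path <category> [day]
# Builds evidence/{category}/{date}.json relative to the current directory.
# Assumption: dates are written as YYYY-MM-DD; defaults to today in UTC.
evidence_path() {
  local category="$1" day="${2:-}"
  if [[ -z "$day" ]]; then
    day="$(date -u +%F)"
  fi
  echo "evidence/${category}/${day}.json"
}

# write_evidence <category> <json_string> [day]
# Creates the category directory if needed, writes the payload, and prints
# the path that was written.
write_evidence() {
  local category="$1" payload="$2" day="${3:-}"
  local path
  path="$(evidence_path "$category" "$day")"
  mkdir -p "$(dirname "$path")"
  printf '%s\n' "$payload" > "$path"
  echo "$path"
}
```

For example, `write_evidence example '{"placeholder":true}'` would write today's file under `evidence/example/`; committing and commenting stay with the formula's deliver step.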
## Adding a New Evaluator Script
- Place the script in this directory. Use `#!/usr/bin/env bash` and `set -euo pipefail`.
- Follow the exit code convention (0 / 1 / 2).
- Accept configuration via environment variables, not positional args (except `evaluate.sh`, which takes a PR number).
- Write evidence to `evidence/{category}/{date}.json`.
- Wire it into a formula TOML in `formulas/`; see formulas/AGENTS.md for the full walkthrough.
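A starting skeleton for the first three checklist items might look like the sketch below. The variable names (`RPC_URL`, `EVIDENCE_CATEGORY`), their defaults, and the helper functions are hypothetical; the only details taken from this brief are the shebang, `set -euo pipefail`, port 8545, and exit code 2 for a missing dependency.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Configuration comes from the environment, with defaults for local runs.
# Hypothetical variables; a real script defines its own.
load_config() {
  RPC_URL="${RPC_URL:-http://localhost:8545}"
  EVIDENCE_CATEGORY="${EVIDENCE_CATEGORY:-example}"
}

# require_cmd <name>
# Returns 2 (infrastructure error) when a dependency is not on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing dependency: $1" >&2
    return 2
  fi
}

# main: load config, verify dependencies, then run the script-specific check.
main() {
  load_config
  require_cmd bash || return 2
  # The script-specific check would go here and return 0 or 1.
  return 0
}
```

Ending the file with `main "$@"` as its last command lets the function's return value become the script's exit status, which keeps the 0 / 1 / 2 contract in one place.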