

Agent Brief: harb-evaluator

The evaluator runtime executes formula-defined pipelines. Scripts in this directory handle stack lifecycle, scenario execution, evidence collection, and the adversarial agent harness.

Directory Layout

| File | Purpose |
| --- | --- |
| `evaluate.sh` | Holdout gate: worktree checkout → Docker stack → Playwright scenarios → teardown |
| `red-team.sh` | Adversarial agent runner: Anvil bootstrap → attack suite → Claude agent → evidence |
| `run-protocol.sh` | On-chain health snapshot (TVL, fees, positions, rebalances) via cast/forge |
| `run-resources.sh` | Infrastructure snapshot (disk, RAM, API budget, CI queue) via shell commands |
| `bootstrap-light.sh` | Lightweight Anvil bootstrap with contract deployment (used by `red-team.sh`) |
| `promote-attacks.sh` | Deduplicate novel attack vectors discovered by the red-team agent and open PRs for them |
| `export-attacks.py` | Extract `cast send` commands from the agent stream log into `.jsonl` attack files |
| `red-team-program.md` | System prompt for the adversarial Claude agent |
| `holdout.config.ts` | Playwright config for holdout scenario execution |
| `helpers/` | TypeScript helpers: RPC, assertions, swap, stake, floor, market, reporting |
| `scenarios/` | Holdout scenario scripts and the passive-confidence suite |

Exit Code Convention

All evaluator scripts follow the same three-code contract:

| Code | Meaning |
| --- | --- |
| `0` | Success / gate passed |
| `1` | Gate failed (scenario or attack found a problem) |
| `2` | Infrastructure error (stack down, missing dependency, RPC unreachable) |

Formulas and the orchestrator route on these codes. Do not introduce additional exit codes without updating the formula TOML.
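A caller can treat the three codes as a small closed vocabulary. The sketch below is illustrative only: `run_and_capture` and `classify_exit` are hypothetical helper names, not functions from this repository, and the orchestrator's real routing lives in the formula TOML.

```bash
#!/usr/bin/env bash
set -euo pipefail

# The three-code contract from the table above.
EXIT_OK=0
EXIT_GATE_FAILED=1
EXIT_INFRA_ERROR=2

# Hypothetical helper: run a command and print its exit status.
# The command's own stdout is sent to stderr so only the status is captured,
# and `|| status=$?` keeps `set -e` from aborting on a non-zero exit.
run_and_capture() {
  local status=0
  "$@" >&2 || status=$?
  printf '%s\n' "$status"
}

# Hypothetical helper: map an exit status to a routing label.
classify_exit() {
  case "$1" in
    "$EXIT_OK")          echo "passed" ;;
    "$EXIT_GATE_FAILED") echo "gate-failed" ;;
    "$EXIT_INFRA_ERROR") echo "infra-error" ;;
    *)                   echo "unknown" ;;
  esac
}
```

For example, `classify_exit "$(run_and_capture ./run-protocol.sh)"` would yield one of the three labels, or `unknown` if a script leaked a code outside the contract.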

Stack Lifecycle

Heavy formulas (run-holdout, run-red-team, run-evolution) need a running Anvil or full Docker stack. All of them bind port 8545, so they are mutually exclusive and must not run concurrently.

- `evaluate.sh` manages Docker compose (`harb-eval-{pr}` project) with full teardown via shell trap.
- `red-team.sh` uses `bootstrap-light.sh` for a lightweight Anvil-only stack (no Docker). Cleanup is also trap-registered.
- `run-protocol.sh` and `run-resources.sh` are lightweight: no Anvil, no Docker.
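The two ideas in this section, exclusive use of port 8545 and trap-registered cleanup, can be sketched together. This is a minimal illustration under assumptions: the lock directory, `acquire_stack_lock`, `release_stack_lock`, `teardown_stack`, and `with_stack` are hypothetical names, and the actual scripts may enforce exclusivity differently or not at all.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lock location. `mkdir` is atomic, so only one caller can create it.
STACK_LOCK_DIR="${STACK_LOCK_DIR:-${TMPDIR:-/tmp}/harb-port-8545.lock}"

acquire_stack_lock() {
  if mkdir "$STACK_LOCK_DIR" 2>/dev/null; then
    return 0
  fi
  echo "port 8545 is already claimed by another formula" >&2
  return 2   # infrastructure error under the exit code convention
}

release_stack_lock() {
  rmdir "$STACK_LOCK_DIR" 2>/dev/null || true
}

# Placeholder for real cleanup (stopping Anvil, tearing down the compose project).
teardown_stack() {
  :
}

# Run a command while holding the lock. The EXIT trap is set inside a subshell,
# so teardown and lock release fire on success, failure, or early exit alike,
# and the command's exit status is passed through unchanged.
with_stack() {
  acquire_stack_lock || return 2
  (
    trap 'teardown_stack; release_stack_lock' EXIT
    "$@"
  )
}
```

Because the trap is scoped to the subshell, the caller's own traps are left alone, and a second `with_stack` call made while the first is still running returns `2` instead of colliding on the port.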

Evidence Output

Every script writes its evidence file to `evidence/{category}/{date}.json`, conforming to the schema in `evidence/README.md`. The deliver step in each formula handles committing and posting an issue comment.
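A script might resolve and write that path along the following lines. Everything here is a sketch: `evidence_path` and `write_evidence` are hypothetical helpers, `{date}` is assumed to be a UTC calendar date in `YYYY-MM-DD` form, and the payload is passed through untouched because its schema is defined in `evidence/README.md`, not here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Root of the evidence tree; overridable so the helper can be exercised in a scratch dir.
EVIDENCE_ROOT="${EVIDENCE_ROOT:-evidence}"

# Assumption: {date} is the current UTC date formatted as YYYY-MM-DD.
evidence_path() {
  local category="$1"
  printf '%s/%s/%s.json\n' "$EVIDENCE_ROOT" "$category" "$(date -u +%Y-%m-%d)"
}

# Write the payload to a temp file beside the target, then rename it into place,
# so a reader never observes a half-written evidence file.
write_evidence() {
  local category="$1" payload="$2"
  local target tmp
  target="$(evidence_path "$category")"
  mkdir -p "$(dirname "$target")"
  tmp="$(mktemp "${target}.XXXXXX")"
  printf '%s\n' "$payload" > "$tmp"
  mv "$tmp" "$target"
  printf '%s\n' "$target"
}
```

`write_evidence` prints the final path, which a caller could log or hand to a later step.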

Adding a New Evaluator Script

1. Place the script in this directory. Use `#!/usr/bin/env bash` and `set -euo pipefail`.
2. Follow the exit code convention (0 / 1 / 2).
3. Accept configuration via environment variables, not positional args (except `evaluate.sh`, which takes a PR number).
4. Write evidence to `evidence/{category}/{date}.json`.
5. Wire it into a formula TOML in `formulas/`. See `formulas/AGENTS.md` for the full walkthrough.
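The steps above can be sketched as a skeleton. All `EVAL_*` variable names, the `run_gate` placeholder, and the one-field evidence payload are assumptions made for illustration; a real script would use its own configuration names and the schema from `evidence/README.md`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Configuration comes from the environment, not positional args (hypothetical names).
: "${EVAL_CATEGORY:=example}"        # evidence category
: "${EVAL_REQUIRED_TOOLS:=bash}"     # space-separated tools that must be on PATH
: "${EVAL_EVIDENCE_ROOT:=evidence}"  # root of the evidence tree
: "${EVAL_CHECK_CMD:=true}"          # stand-in for the real gate logic

# A missing tool is an infrastructure error, not a gate failure.
check_dependencies() {
  local tool
  for tool in $EVAL_REQUIRED_TOOLS; do
    command -v "$tool" >/dev/null 2>&1 || {
      echo "missing dependency: $tool" >&2
      return 2
    }
  done
}

# Placeholder gate: replace with the script's real check.
run_gate() {
  bash -c "$EVAL_CHECK_CMD"
}

main() {
  check_dependencies || return 2

  local out_dir="$EVAL_EVIDENCE_ROOT/$EVAL_CATEGORY"
  mkdir -p "$out_dir" || return 2
  local out_file
  out_file="$out_dir/$(date -u +%Y-%m-%d).json"

  # Hypothetical payload shape; the real schema is in evidence/README.md.
  if run_gate; then
    printf '{"passed": true}\n' > "$out_file"
    return 0
  fi
  printf '{"passed": false}\n' > "$out_file"
  return 1
}
```

A standalone script would end with `main "$@"` as its last line, so the return value of `main` becomes the process exit code that the formula routes on.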
