- Updated holdout.config.ts to use HOLDOUT_SCENARIOS_DIR env var
- Modified evaluate.sh to clone harb-holdout-scenarios repo at runtime
- Deleted scripts/harb-evaluator/scenarios/ directory
- Added .holdout-scenarios/ to .gitignore
- Holdout scenarios are now cloned into .holdout-scenarios/ during evaluation
- This prevents the dev-agent from seeing the holdout test set
- evaluate.sh: add --ignore-scripts to npm install (prevents husky from
writing to the permanent repo's .git/hooks from the ephemeral worktree)
- evaluate.sh: change --silent to --quiet (errors still printed on failure)
- evaluate.sh: add `npx playwright install chromium` step so browser
binaries are present even when the cached revision doesn't match ^1.55.1
- evaluate.sh: set CI=true inline on the playwright invocation so
forbidOnly activates and accidental test.only() causes a gate failure
- holdout.config.ts: document that CI=true is supplied by evaluate.sh
- always-leave.spec.ts: add waitForReceipt() helper; replace fixed
waitForTimeout(2000) after eth_sendTransaction with proper receipt
polling so tx confirmation is not a timing assumption
- always-leave.spec.ts: log the caught error in the button-cycling
try/catch so contract reverts surface in the output
- always-leave.spec.ts: add console.log when connect button or connector
panel is not found to make silent-skip cases diagnosable
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
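The receipt polling that replaces the fixed waitForTimeout(2000) could look roughly like the sketch below. It is not the helper from always-leave.spec.ts: the RpcCall type, the parameter names, and the default timeout and interval are assumptions made for illustration. Only the eth_getTransactionReceipt method and its null-while-pending behaviour come from the standard Ethereum JSON-RPC interface.

```typescript
// Hypothetical JSON-RPC caller; the real spec presumably goes through its own
// injected wallet provider or an HTTP client instead.
type RpcCall = (method: string, params: unknown[]) => Promise<unknown>;

// Only the receipt fields this sketch inspects.
interface Receipt {
  status: string;      // "0x1" on success, "0x0" on revert
  blockNumber: string; // hex-encoded block number
}

// Polls eth_getTransactionReceipt until the transaction is mined or the
// deadline passes. Defaults are illustrative assumptions, not source values.
async function waitForReceipt(
  rpc: RpcCall,
  txHash: string,
  timeoutMs = 30_000,
  intervalMs = 500,
): Promise<Receipt> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // The node answers null while the transaction is still pending.
    const receipt = (await rpc('eth_getTransactionReceipt', [txHash])) as Receipt | null;
    if (receipt !== null) {
      if (receipt.status !== '0x1') {
        throw new Error(`Transaction ${txHash} reverted`);
      }
      return receipt;
    }
    // Sleep before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for receipt of ${txHash}`);
}
```

Compared with a fixed sleep, this returns as soon as the transaction is mined and fails with an explicit error on a revert or a timeout, so a slow block no longer passes or fails the test by accident.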
Replace shell-script scenario runner with Playwright. The evaluator now
runs `npx playwright test --config scripts/harb-evaluator/holdout.config.ts`
after booting the stack, using the existing tests/setup/ wallet-provider
and navigation infrastructure.
Changes:
- scripts/harb-evaluator/holdout.config.ts — new Playwright config pointing
to scenarios/, headless chromium, 5-min timeout per test
- scripts/harb-evaluator/scenarios/sovereign-exit/always-leave.spec.ts —
Playwright spec that buys KRK through the LocalSwapWidget then sells it
back via the injected wallet provider, asserting sovereign exit works
- scripts/harb-evaluator/evaluate.sh — adds root npm install step (needed
for npx playwright), exports STACK_* env aliases for getStackConfig(),
replaces shell-script loop with a single playwright test invocation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
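The settings these entries describe for holdout.config.ts can be summarised in a small sketch. It is not the file itself: buildHoldoutSettings, the Env type, and the fallback to 'scenarios' when HOLDOUT_SCENARIOS_DIR is unset are assumptions for illustration. The field names (testDir, timeout, forbidOnly, use.headless, use.browserName) are standard Playwright config options.

```typescript
// Environment passed in explicitly so the sketch does not depend on Node
// typings; in a real config this would be process.env.
type Env = Record<string, string | undefined>;

// Subset of Playwright config fields mentioned in the entries above.
interface HoldoutSettings {
  testDir: string;
  timeout: number;
  forbidOnly: boolean;
  use: { headless: boolean; browserName: 'chromium' };
}

// Hypothetical helper: derives the settings from the environment.
function buildHoldoutSettings(env: Env): HoldoutSettings {
  return {
    // Assumption: fall back to a local 'scenarios' directory if the variable
    // is unset; the real config may fail instead.
    testDir: env['HOLDOUT_SCENARIOS_DIR'] ?? 'scenarios',
    // 5-minute timeout per test, in milliseconds.
    timeout: 5 * 60 * 1000,
    // With CI=true, a stray test.only() fails the run instead of silently
    // narrowing it.
    forbidOnly: !!env['CI'],
    use: { headless: true, browserName: 'chromium' },
  };
}
```

In an actual config file the returned object would typically be passed to defineConfig from @playwright/test, with evaluate.sh supplying CI=true and HOLDOUT_SCENARIOS_DIR on the command line.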