Commit graph

11 commits

Author SHA1 Message Date
johba
36cda487e6 fix: forward attack_dir input to red-team.sh invocation in formula
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 15:54:41 +00:00
johba
52ba6b2f38 fix: run-attack-suite is spec-only — no implementation in red-team.sh (#1000)
Implement the attack catalogue loop (step 5a) in red-team.sh that was
previously a forward spec in the formula. The loop replays every *.jsonl
attack file through AttackRunner.s.sol with snapshot revert between files,
records LM total ETH before/after each attack, and injects results into
the adversarial agent prompt so it knows which strategies are already
catalogued.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 15:30:46 +00:00
johba
349bd2c2c6 fix: bootstrap-light.sh lacks Push3 candidate injection (#999)
Add CANDIDATE env var support to bootstrap-light.sh. When set to a
.push3 file path, the script:
1. Invokes push3-transpiler to regenerate OptimizerV3Push3.sol
2. Extracts the function body into OptimizerV3Push3Lib.sol
3. Deploys contracts normally via DeployLocal.sol
4. Deploys OptimizerV3 and upgrades the UUPS proxy via upgradeTo()

Also updates formulas/run-red-team.toml to reflect the implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 13:19:48 +00:00
johba
0edda8bb4b fix: evolution formula must commit results via PR before closing (#1047)
- Add `cleanup` step: removes per-generation candidate files and
  generation_*.jsonl records after they are aggregated into the evidence
  file, preventing disk exhaustion (cf. run #1025 at 91% usage).

- Rewrite `deliver` step with mandatory ordering:
  1. `git checkout -- .` to discard unrelated working-tree modifications
     before staging result files (evidence JSON, champion .push3, manifest).
  2. Commit to branch `evidence/evolution-run-{run_id}` (not directly to main).
  3. Push and create PR — if this fails, post an error comment and leave the
     issue OPEN; do not proceed to step 4.
  4. Post summary comment only after PR URL is confirmed, with mandatory
     link to the PR.

- Update `products.evidence_file` delivery to PR branch (was "commit to main").
- Update `products.issue_comment` to enforce ordering and non-close-on-failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:16:53 +00:00
johba
b616953313 fix: add missing shell scripts and fix contract interface in run-protocol
- Add scripts/harb-evaluator/run-resources.sh: collects disk, RAM,
  Anthropic API usage, and Woodpecker CI queue metrics
- Add scripts/harb-evaluator/run-protocol.sh: collects TVL, fees,
  position data, and rebalance events from LiquidityManager
- Fix run-protocol.toml: positions accessed via positions(uint8) not
  named getters (floorPosition/anchorPosition/discoveryPosition)
- Fix event signature: Recentered(int24,bool) not Recenter(int24,int24,int24)

Addresses review findings: missing implementation files and contract
interface mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:00:14 +00:00
johba
de014e9b13 fix: feat: implement evidence/resources and evidence/protocol logging (#1059)
- Add evidence/resources/ and evidence/protocol/ directories with .gitkeep
- Add schemas for resources/ and protocol/ to evidence/README.md
- Create formulas/run-resources.toml (sense formula: disk/RAM/API/CI metrics,
  daily cron 06:00 UTC, verdict: ok/warn/critical)
- Create formulas/run-protocol.toml (sense formula: TVL/fees/positions/
  rebalance frequency via LmTotalEth.s.sol + cast, daily cron 07:00 UTC,
  verdict: healthy/degraded/offline)
- Update STATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 19:39:23 +00:00
openhands
708a00a2f4 fix: Formula: run-evolution (optimizer pipeline) (#975)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 11:25:01 +00:00
openhands
152f6e0a40 fix: Formula: run-red-team (adversarial attack + discovery) (#976)
Address review feedback:
- Remove candidate input (Push3 transpilation not wired; documented in
  notes.candidate_injection as planned follow-up)
- Mark run-attack-suite step as status="planned" with run_attack_suite_gap note
- Update execution.invocation to only pass env vars red-team.sh actually reads
- Fix export-vectors args to include --eth-extracted and --eth-before flags
- Clarify export-vectors only runs when floor_broken (BROKE=true)
- Document tmp/red-team-snapshots.jsonl (AttackRunner replay side output)
- Add comment that {attack_type} in products.attack_vectors.path is
  runtime-computed by promote-attacks.sh, not a formula input
- Fix schema comment notation (§ → ##)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:04:56 +01:00
openhands
3564c4ad25 fix: Formula: run-red-team (adversarial attack + discovery) (#976)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:04:56 +01:00
openhands
d278954b44 fix: Formula: run-holdout (PR quality gate) (#977)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 10:58:15 +01:00
openhands
27f841927e fix: Formula: run-user-test (persona UX evaluation) (#978)
Add formulas/run-user-test.toml — a sense-only process definition for
persona-based UX evaluation. Defines 5 personas across 2 funnels
(passive-holder: tyler/alex/sarah; staker: priya/marcus), full stack
lifecycle (start → run → collect → stop → deliver), and the three
standard evidence delivery products (evidence JSON committed to main,
screenshots referenced in evidence, summary as issue comment).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 09:10:14 +00:00