Commit graph

92 commits

Author SHA1 Message Date
openhands
33123cfd1d fix: evaluate.sh detects docker compose vs docker-compose binary; red-team-sweep.sh does not (#964)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 18:57:36 +00:00
openhands
044f8d41f8 fix: EXIT trap omits container teardown on script interruption (#862) 2026-03-18 13:37:23 +00:00
openhands
13f406b5a9 fix: red-team.sh and AttackRunner.s.sol still use Base mainnet addresses (#939)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 07:33:54 +00:00
openhands
04388538b3 fix: bootstrap-light.sh missing validation guards for KRK, STAKE, OPT (#872)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 23:37:01 +00:00
openhands
53b3b995a6 fix: also update red-team.sh addresses to Base Sepolia (#873)
red-team.sh produces the stream JSONL that export-attacks.py parses, so
they must agree on addresses. Update SWAP_ROUTER and NPM in red-team.sh
to Base Sepolia and fix the invariant comment in export-attacks.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 23:05:26 +00:00
openhands
99b3a8bc80 fix: export-attacks.py SWAP_ROUTER_ADDR inconsistent with helpers/ (#873)
Update SWAP_ROUTER_ADDR and NPM_ADDR in export-attacks.py from Base
mainnet addresses to the correct Base Sepolia addresses, matching
helpers/market.ts and helpers/swap.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 22:39:25 +00:00
openhands
db76d648de fix: Bare `cd` at line 293 in main loop (pre-existing) (#927)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 18:47:14 +00:00
johba
415e94243e Merge pull request 'fix: MEMORY_FILE trim may discard DECREASED entries before 4c extraction (#875)' (#926) from fix/issue-875 into master 2026-03-17 19:27:09 +01:00
openhands
8b3fd340ac fix: MEMORY_FILE trim may discard DECREASED entries before 4c extraction (#875)
Address AI reviewer feedback on d1f75a7:

- Wrap cross_file append in try/except so a write failure never prevents
  the memory trim-write from running (bug fix)
- Stamp sweep_id on pre-trim exported entries using the SWEEP_ID env var;
  pass SWEEP_ID from red-team-sweep.sh so entries are attributable to a
  sweep run (data-consistency fix)
- Add inline comment explaining the 3-tuple dedup key (run, ts, strategy)
  and its relationship to step-4c's identity check (clarity nit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 17:58:41 +00:00
openhands
d1f75a790c fix: MEMORY_FILE trim may discard DECREASED entries before 4c extraction (#875)
Before trimming MEMORY_FILE to 50 entries, export any entries that would
be dropped (non-DECREASED entries outside the last 10) directly to
CROSS_PATTERNS_FILE. This ensures no entries are permanently lost before
red-team-sweep.sh step 4c reads the memory file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 17:30:47 +00:00
openhands
315b7777f8 fix: Bare `cd` in smoke test permanently changes shell working directory (#877)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 17:07:17 +00:00
openhands
e2554eb844 fix: `sleep 1` polling loop violates AGENTS.md 'never use fixed delays' principle (#878)
Replace fixed `sleep 1` in the container teardown poll loop with exponential
backoff (100ms → 200ms → … → 2000ms cap). The 30s hard timeout is preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 16:07:58 +00:00
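
The backoff schedule described in e2554eb844 can be sketched roughly as follows. This is an illustration in Python, not the shell code from the commit: `backoff_delays_ms`, `wait_until`, and the injectable `sleep`/`clock` parameters are hypothetical names, and only the 100 ms start, the doubling, the 2000 ms cap, and the 30 s hard timeout come from the commit message.

```python
import time
from typing import Callable, Iterator


def backoff_delays_ms(initial_ms: int = 100, cap_ms: int = 2000) -> Iterator[int]:
    """Yield 100, 200, 400, ... milliseconds, doubling until capped at cap_ms."""
    delay = initial_ms
    while True:
        yield delay
        delay = min(delay * 2, cap_ms)


def wait_until(done: Callable[[], bool],
               timeout: float = 30.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `done` with exponential backoff; return False once `timeout` elapses."""
    deadline = clock() + timeout
    for delay_ms in backoff_delays_ms():
        if done():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # hard timeout preserved regardless of backoff state
        # Never sleep past the deadline.
        sleep(min(delay_ms / 1000.0, remaining))
    return False  # not reached: the generator is infinite
```

Passing `sleep` and `clock` as parameters is only a convenience for exercising the loop without real waiting; a teardown script would call `wait_until(container_is_gone)` with the defaults, where `container_is_gone` is whatever check the caller supplies.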
openhands
07b117c906 fix: `compute_lm_total_eth` awk parser reads only the line immediately after `== Logs ==` (#879)
Replace getline-once approach with a forward-scan that skips blank lines
and warning lines after the marker, finding the first digit-only line.
2026-03-17 15:07:30 +00:00
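
The forward-scan idea in 07b117c906 might look like this when written in Python instead of awk. The function name `first_number_after_marker` and the sample input are hypothetical, and the sketch skips every non-numeric line after the marker, which is slightly broader than the "blank lines and warning lines" the commit mentions.

```python
import re
from typing import Iterable, Optional


def first_number_after_marker(lines: Iterable[str],
                              marker: str = "== Logs ==") -> Optional[int]:
    """Return the first digit-only line that follows `marker`, or None."""
    seen_marker = False
    for line in lines:
        text = line.strip()
        if not seen_marker:
            # Ignore everything until the marker line has been seen.
            seen_marker = marker in text
            continue
        # Keep scanning past blank lines, warnings, and other noise.
        if re.fullmatch(r"[0-9]+", text):
            return int(text)
    return None
```

A getline-once parser would return nothing for output where a blank line or a warning sits directly under the marker; the scan above keeps reading until it finds a value or runs out of input.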
openhands
f3fb1c3db0 fix: fix: red-team cross-pattern export records intermediate states as DECREASED (#852)
The extract_memory regex previously matched any "lm.?eth" mention,
including mid-execution "Total LM ETH: X wei" output lines produced by
the agent's cast check commands.  During a staking step these lines
reflect an intermediate chain state (ETH temporarily locked/moved)
rather than the final reverted state, causing strategies to be recorded
as DECREASED even when the runner confirmed ETH_SAFE.

Fix: narrow the capture to the structured `lm_eth_after: <value>`
label that the agent writes in its final RED-TEAM REPORT block.
Mid-execution total-ETH lines no longer match and cannot corrupt the
per-strategy result in memory or the cross-patterns file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 13:19:03 +00:00
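
A minimal sketch of the narrowed capture described in f3fb1c3db0, assuming the label appears at the start of a line and the value is a plain integer (wei). The regex, the `final_lm_eth` helper, and the choice to take the last match are assumptions made for illustration; the actual extract_memory code is not shown in this log.

```python
import re
from typing import Optional

# Matches only the structured label, not free-form "Total LM ETH: X wei" lines.
LM_ETH_AFTER = re.compile(r"^\s*lm_eth_after:\s*(\d+)", re.MULTILINE)


def final_lm_eth(agent_output: str) -> Optional[int]:
    """Return the value of the last `lm_eth_after:` label, or None if absent."""
    matches = LM_ETH_AFTER.findall(agent_output)
    return int(matches[-1]) if matches else None
```

With this pattern a mid-execution line such as `Total LM ETH: 5 wei` is ignored, so an intermediate chain state cannot be mistaken for the final result.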
openhands
fe3a3d7d94 fix: feat: persist red-team cross-patterns in repo for continuity across runs (#853)
- Move CROSS_PATTERNS_FILE from /tmp/red-team-cross-patterns.jsonl to
  tools/red-team/cross-patterns.jsonl (repo-tracked path)
- Remove the reset (> file) at sweep start so patterns accumulate across runs
- Generate a SWEEP_ID (sweep-YYYYMMDD-HHMMSS) at sweep start and stamp
  each new entry with sweep_id for traceability
- Deduplicate on (pattern, candidate, result): entries already present in
  the file are skipped; intra-batch duplicates are also suppressed
- Create tools/red-team/ directory with .gitkeep
- Add mkdir -p guards in both scripts so the directory is created on first run

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 12:39:39 +00:00
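
The sweep ID and deduplication rules from fe3a3d7d94 could be sketched as below. The helper names, the use of UTC, and the `HELD` result value in the usage example are assumptions; the `sweep-YYYYMMDD-HHMMSS` format and the `(pattern, candidate, result)` key come from the commit message.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

Entry = Dict[str, str]


def new_sweep_id(now: Optional[datetime] = None) -> str:
    """Format a sweep ID as sweep-YYYYMMDD-HHMMSS (UTC is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("sweep-%Y%m%d-%H%M%S")


def dedup_key(entry: Entry) -> Tuple[str, str, str]:
    return (entry.get("pattern", ""), entry.get("candidate", ""), entry.get("result", ""))


def select_new_entries(existing: List[Entry], batch: List[Entry], sweep_id: str) -> List[Entry]:
    """Return batch entries not already present, each stamped with sweep_id."""
    seen = {dedup_key(e) for e in existing}
    added = []
    for entry in batch:
        key = dedup_key(entry)
        if key in seen:
            continue  # already in the file, or an intra-batch duplicate
        seen.add(key)
        added.append({**entry, "sweep_id": sweep_id})
    return added
```

For example, if the file already holds `{"pattern": "a", "candidate": "c1", "result": "HELD"}`, a batch containing that entry once and a new `DECREASED` entry twice yields a single stamped `DECREASED` entry to append.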
openhands
a2f89968db fix: fix: red-team.sh V3_FACTORY hardcodes Base mainnet address instead of Sepolia (#854)
bootstrap-light.sh now extracts the Uniswap V3 pool address from
DeployLocal.sol deploy output and writes both Pool and V3Factory
(Base Sepolia: 0x4752ba5DBc23f44D87826276BF6Fd6b1C372aD24) into
deployments-local.json alongside the existing contract addresses.

red-team.sh now reads V3_FACTORY and POOL from deployments-local.json
instead of hardcoding the Base mainnet factory address
(0x33128a8fC17869897dcE68Ed026d694621f6FDfD), and removes the getPool()
RPC call that always failed with "contract does not have any code" on
the Sepolia fork.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 12:02:17 +00:00
openhands
91e4bdf926 fix: red-team-program.md taxRate naming inconsistency (pre-existing) (#835) 2026-03-16 09:46:55 +00:00
openhands
cb305b8c81 fix: MEMORY_FILE parent directory ($REPO_ROOT/tmp/) also not guaranteed to exist (#844)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 07:57:20 +00:00
openhands
8986154d8f fix: sleep 5 at teardown violates AGENTS.md engineering principles (#845) 2026-03-16 07:06:57 +00:00
openhands
ac2fa16e2e fix: ATTACKS_OUT directory not guaranteed to exist (#816) 2026-03-15 22:36:51 +00:00
openhands
ae3eb14833 fix: address review findings for sweep-results.tsv (#818)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 20:48:33 +00:00
openhands
3c6be7d86f fix: feat: structured sweep-results.tsv for red-team sweep (#818)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 20:20:13 +00:00
openhands
0d09f598d9 fix: Hardcoded TWAP/cooldown values not documented (#825)
Document MIN_RECENTER_INTERVAL (60 s, LiquidityManager.sol:61) and
PRICE_STABILITY_INTERVAL (300 s, PriceOracle.sol:14) in
docs/ARCHITECTURE.md and docs/PRODUCT-TRUTH.md so that agent-facing
and product-facing copy stays traceable to source constants.

Add an inline HTML comment in red-team-program.md next to the
hardcoded 60s/300s sentence pointing to the two source constants,
making drift detectable during code review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 19:51:52 +00:00
openhands
2293ece915 fix: 'Trigger recenter (account 2 only)' label contradicts public recenter comment (#826)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 19:17:16 +00:00
openhands
13d5b40564 fix: Kraiken.sol and Stake.sol absent from agent context across all runs (#829)
Inject Kraiken.sol (outstandingSupply, mint/burn mechanics) and Stake.sol
(snatch, withdrawal, KRK exclusion from floor denominator) into the red-team
agent prompt so agents can reason from actual source rather than guesses.

- red-team.sh: read SOL_KRAIKEN and SOL_STAKE from onchain/src/ alongside
  the other six contracts already injected
- red-team-program.md: add ### Kraiken.sol and ### Stake.sol sections in the
  Source Code reference block (after PriceOracle.sol)
- AGENTS.md: document the full list of injected contracts in a new
  "Red-team Agent Context" section; both files are now listed as in-scope

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 18:41:57 +00:00
openhands
012b31056e fix: refactor: extract red-team prompt to red-team-program.md (#819)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:54:33 +00:00
openhands
4d0390c4fa fix: address review findings for cross-candidate red-team sweep (#822)
- red-team-sweep.sh: reset CROSS_PATTERNS_FILE at sweep start to prevent
  stale patterns from prior invocations contaminating a fresh run
- red-team-sweep.sh: wrap pattern-extraction Python in set +e/set -e and
  capture output so log() prefix is applied; move memory truncation outside
  the if-block so it runs unconditionally even if Python fails
- red-team.sh: filter entries where candidate == current_candidate before
  grouping, removing self-referential cross-candidate evidence
- red-team.sh: skip entries with empty pattern key (both pattern and
  strategy fields empty) to prevent spurious bucket merging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:02:19 +00:00
openhands
9ee1429604 fix: feat: red-team sweep should seed each candidate with cross-candidate attack patterns (#822)
- red-team-sweep.sh: after each candidate completes, extract all memory
  entries into /tmp/red-team-cross-patterns.jsonl (append), then clear
  the raw memory file so the next candidate starts with a fresh state
- red-team.sh: define CROSS_PATTERNS_FILE; before building the prompt,
  read the cross-patterns file and generate a "Cross-Candidate
  Intelligence" section grouped by abstract op pattern — universal
  patterns (broke 2+ candidates), candidate-specific wins, and patterns
  that held everywhere — each annotated with optimizer profiles
- The new section is injected into the Claude prompt above the existing
  Previous Findings block, satisfying all acceptance criteria

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:30:54 +00:00
openhands
7950608179 fix: address review findings for red-team memory tracking (#820)
- make_pattern: replace text.find('stake')/find('unstake') with
  re.search(r'\bstake\b')/re.search(r'\bunstake\b') so 'stake' is never
  found as a substring of 'unstake' (bug #1)
- make_pattern: track first-occurrence position of each op and sort by
  position before building the sequence string, preserving actual
  execution order instead of a hardcoded canonical order (bug #2)
- insight capture: track insight_pri on the current dict; only overwrite
  stored insight when new match has strictly higher priority (lower index),
  preventing a late 'because...' clause from silently replacing an earlier
  'Key Insight:' capture (warning #3)
- run_num: compute max(run)+1 from JSON entries instead of wc -l so run
  numbers stay monotonically increasing after memory trim (info #4)
- red-team-sweep.sh: also set adaptive flag when any r37-r40 register has
  a variable-form assignment (r40 = uint256(someVar)), catching candidates
  where only one branch uses constants (warning #5)
- red-team-sweep.sh: remove unnecessary 'import sys as _sys' in except
  block; sys is already in scope (nit #6)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 15:54:01 +00:00
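
The first two fixes in 7950608179 (word-boundary matching and ordering by first occurrence) can be illustrated with a small sketch. The op vocabulary in `OPS`, the `" -> "` separator, and the lowercasing step are hypothetical; only the `\bstake\b` / `\bunstake\b` matching and the sort-by-position idea are taken from the commit message.

```python
import re

# Hypothetical op vocabulary; the real make_pattern may recognise other ops.
OPS = ("stake", "unstake", "swap", "recenter")


def make_pattern(text: str) -> str:
    """Build an op sequence ordered by where each op first appears in text."""
    lowered = text.lower()
    found = []
    for op in OPS:
        # \b keeps 'stake' from matching inside 'unstake'.
        match = re.search(rf"\b{re.escape(op)}\b", lowered)
        if match:
            found.append((match.start(), op))
    # Sort by first-occurrence position to preserve execution order.
    return " -> ".join(op for _, op in sorted(found))
```

With a substring search, "unstake everything" would also register a `stake` at an earlier position than the real one; the word-boundary search only sees the standalone word.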
openhands
e7c60edeb6 fix: feat: red-team memory should track candidate + abstract learnings (#820)
- Add CANDIDATE_NAME and OPTIMIZER_PROFILE env vars to red-team.sh
  (defaults to "unknown" for standalone runs)
- Update extract_memory Python: new fields candidate, optimizer_profile,
  pattern (abstract op sequence via make_pattern()), and improved insight
  extraction that also captures WHY explanations (because/since/due to)
- Update MEMORY_SECTION Python: entries now grouped by candidate;
  universal patterns (DECREASED across multiple candidates) surfaced first
- Update prompt: add "Current Attack Target" table with candidate/profile,
  optimizer parameter explanations (CI/AW/AS/DD behavioral impact),
  Rule 9 requiring pattern+insight per strategy, updated report format
  with Pattern/Insight fields and universal-pattern conclusion field
- Update red-team-sweep.sh: after inject, parse OptimizerV3Push3.sol for
  r40/r39/r38/r37 constants to build OPTIMIZER_PROFILE string; pass
  CANDIDATE_NAME and OPTIMIZER_PROFILE as env vars to red-team.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 15:23:43 +00:00
openhands
4779749f2b fix: feat: red-team agent should read LM and optimizer Solidity source (#821)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 14:18:10 +00:00
openhands
7d0473ade7 fix: fix: red-team prompt missing evm_increaseTime for TWAP-enforced recenter (#823)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 10:47:36 +00:00
johba
ff86b3691d chore: extract shared inject.sh, add red-team-sweep.sh (#806)
## What
- `tools/push3-transpiler/inject.sh` — shared transpile+inject logic used by both batch-eval and red-team-sweep
- `batch-eval.sh` — replaced inline 60-line Python block with `inject.sh` call
- `scripts/harb-evaluator/red-team-sweep.sh` — red-teams each kindergarten seed using existing `red-team.sh`, with random smoke test gate

## Why
Sweep script kept breaking because I rewrote the injection logic instead of reusing batch-eval's proven Python. Now there's one copy.

## Testing
- inject.sh tested manually on DO box with optimizer_v3 seed
- Smoke test picks random seed, injects + compiles before starting sweep

Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/806
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
2026-03-15 10:24:03 +01:00
openhands
7618309db5 fix: red-team.sh and export-attacks.py use Base Sepolia addresses labeled as mainnet (#794)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 06:48:16 +00:00
openhands
0e33d6cbba fix: DeployLocal.sol feeDest 0xf6a3... may have code on Base Sepolia fork (#760) 2026-03-14 20:58:34 +00:00
openhands
e9397891ed fix: remove setRecenterAccess from red-team.sh — recenter() is now public 2026-03-14 15:10:59 +00:00
openhands
dbf78de793 fix: bootstrap + red-team on forked networks
Bootstrap fixes:
- Idempotency check: skip if Kraiken already deployed on Anvil
- anvil_setCode to strip ERC-4337 code from deployer + feeDest
- DeployLocal.sol: feeDest derived from keccak256('harb.local.feeDest')

Red-team fixes:
- New bootstrap-light.sh: Anvil-only, ~30s deploy
- red-team.sh uses bootstrap-light instead of full docker compose
- anvil_setBalance for feeDest before impersonation
- forge --color never, path resolution, docker chown

Address fixes (all Base mainnet, in both FitnessEvaluator + AttackRunner):
- V3_FACTORY: 0x33128a8fC17869897dcE68Ed026d694621f6FDfD
- SWAP_ROUTER: 0x2626664c2603336E57B271c5C0b26F421741e481
- NPM_ADDR: 0x03a520b32C04BF3bEEf7BEb72E919cf822Ed34f1
2026-03-14 13:31:23 +00:00
johba
6ff8282a7e Merge pull request 'fix: Remove recenterAccess — make recenter() public with TWAP enforcement (#706)' (#713) from fix/issue-706 into master 2026-03-14 10:48:59 +01:00
openhands
52ed8ef233 fix: red-team.sh sudo strips FORK_URL before docker compose sees it (#729)
red-team.sh called bare `sudo docker compose up/down` which applies
env_reset and drops FORK_URL before anvil-entrypoint.sh can read it.
Change both calls to `sudo -E` so the caller's FORK_URL override is
propagated to docker-compose and into the anvil container.

Update ENVIRONMENT.md to reflect that a plain `FORK_URL=... bash
red-team.sh` invocation now works correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 08:30:49 +00:00
openhands
44df166b73 fix: Bare integer interpolation in agent-prompt heredoc at line 494 (#671)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 03:07:55 +00:00
openhands
cbab4c36da fix: NPM_ADDR may be Base Sepolia address in both files (#686)
Replace 0x27F971cb582BF9E50F397e4d29a5C7A34f11faA2 (Base Sepolia
NonfungiblePositionManager) with the correct Base mainnet address
0x03a520B32c04bf3beef7BEb72E919cF822Ed34F3 in all four files that
referenced it, and add an inline comment citing the chain and source.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 02:22:51 +00:00
openhands
1a410a30b7 fix: Remove recenterAccess — make recenter() public with TWAP enforcement (#706) 2026-03-13 22:32:53 +00:00
openhands
a18512a644 fix: Stale JSDoc in navigateToStakePage refers to '/stake' not '/app/stake' (#509) 2026-03-13 10:37:14 +00:00
openhands
659044e2d1 fix: claude subprocess not killed on INT/TERM in cleanup trap (#530)
Track CLAUDE_PID before launching the claude subprocess so cleanup()
can kill it before reverting Anvil state. Running claude via `&` +
`wait` lets the trap fire immediately on INT/TERM, killing the
subprocess and preventing it from making calls against an
already-reverted chain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 09:48:34 +00:00
openhands
2ae07e7a49 fix: $FLOOR_BEFORE/$FLOOR_AFTER unquoted inside python3 -c string (#531) 2026-03-13 08:28:26 +00:00
openhands
6924cb03f3 fix: Protocol Mechanics section in agent prompt still exposes ethPerToken formula (#550) 2026-03-13 07:47:35 +00:00
openhands
b902b89e3b fix: address review findings — CREATE2 guard, transition test, docs
- LiquidityManager.setFeeDestination: add CREATE2 bypass guard — also
  blocks re-assignment when the current feeDestination has since acquired
  bytecode (was a plain address when set, contract deployed to it later)
- LiquidityManager.setFeeDestination: expand NatSpec to document the
  EOA-mutability trade-off and the CREATE2 guard explicitly
- Test: add testSetFeeDestinationEOAToContract_Locks covering the
  realistic EOA→contract transition (the primary lock-activation path)
- red-team.sh: add comment that DEPLOYER_PK is Anvil account-0 and must
  only be used against a local ephemeral Anvil instance
- ARCHITECTURE.md: document feeDestination conditional-lock semantics and
  contrast with Kraiken's strictly set-once liquidityManager/stakingPool

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 17:13:50 +00:00
openhands
512640226b fix: fix: Conditional lock on feeDestination — lock when set to contract (#580) (#580)
- Add `feeDestinationLocked` bool to LiquidityManager
- Replace one-shot setter with conditional trapdoor: EOAs may be set
  repeatedly, but setting a contract address locks permanently
- Remove `AddressAlreadySet` error (superseded by the new lock mechanic)
- Replace fragile SLOT7 storage hack in red-team.sh with a proper
  `setFeeDestination()` call using the deployer key
- Update tests: replace AddressAlreadySet test with three new tests
  covering EOA multi-set, contract lock, and locked revert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 16:13:44 +00:00
johba
514a55a1ac Merge pull request 'fix: Backtesting: replay red-team attack sequences against optimizer candidates (#536)' (#565) from fix/issue-536 into master 2026-03-11 19:24:27 +01:00
openhands
58729b98b4 fix: fix: strip cast formatted annotations from red-team.sh (#577)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 10:19:14 +00:00