Commit graph

1192 commits

Author SHA1 Message Date
openhands
d34fe698ab ci: retrigger after infra failure 2026-03-16 08:09:12 +00:00
openhands
cb305b8c81 fix: MEMORY_FILE parent directory ($REPO_ROOT/tmp/) also not guaranteed to exist (#844)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 07:57:20 +00:00
johba
39793ec0df Merge pull request 'fix: sleep 5 at teardown violates AGENTS.md engineering principles (#845)' (#861) from fix/issue-845 into master 2026-03-16 08:49:06 +01:00
openhands
6e66bfd2f6 ci: retrigger after infra failure 2026-03-16 07:16:46 +00:00
openhands
8986154d8f fix: sleep 5 at teardown violates AGENTS.md engineering principles (#845) 2026-03-16 07:06:57 +00:00
johba
10ff61e6b5 Merge pull request 'fix: package.json missing 'type': 'module' inconsistent with AGENTS.md (#850)' (#855) from fix/issue-850 into master 2026-03-16 07:56:59 +01:00
openhands
3f24faba18 fix: package.json missing 'type': 'module' inconsistent with AGENTS.md (#850)
Update tsconfig.json to use NodeNext module system (fixes CJS/ESM conflict),
enable ts-node ESM mode, and add .js extensions to relative imports so the
built output and ts-node dev script both work correctly with "type":"module".
2026-03-16 06:35:05 +00:00
openhands
0c43054f42 fix: package.json missing 'type': 'module' inconsistent with AGENTS.md (#850) 2026-03-16 06:07:10 +00:00
johba
81501758ad Merge pull request 'fix: AttackRunner.s.sol NPM_ADDR last byte is 0xF1 but scripts use 0xF3 (#807)' (#851) from fix/issue-807 into master 2026-03-16 02:13:28 +01:00
openhands
fd912a2a69 fix: AttackRunner.s.sol NPM_ADDR last byte is 0xF1 but scripts use 0xF3 (#807)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 00:50:28 +00:00
johba
00349d9f45 Merge pull request 'fix: Body extraction stops at first shallow closing brace (#809)' (#849) from fix/issue-809 into master 2026-03-16 01:36:08 +01:00
openhands
34b016a190 fix: Body extraction stops at first shallow closing brace (#809)
Replace the `}` heuristic in inject.sh with a brace-depth counter:
start at depth=1 after the opening {, increment on {, decrement on },
stop when depth reaches 0. This correctly handles nested if/else blocks,
loops, and structs that close at 4-space indent inside calculateParams.

Also emit a non-zero exit with a descriptive message if EOF is reached
without finding the matching closing brace.

Add test_inject_extraction.sh covering simple bodies, nested if/else,
multi-level nesting, and the EOF-without-match error case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 00:21:06 +00:00
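The brace-depth counter described in 34b016a190 can be sketched roughly as follows. This is a minimal Python illustration, not the code in inject.sh: the function name `extract_body` and its arguments are hypothetical, and the sketch assumes braces never appear inside string literals or comments.

```python
def extract_body(text: str, open_index: int) -> str:
    """Return the text between the '{' at open_index and its matching '}'.

    Hypothetical helper: names and signature are illustrative only.
    """
    if text[open_index] != "{":
        raise ValueError("open_index must point at an opening brace")
    depth = 1  # start at depth 1 just after the opening brace
    for i in range(open_index + 1, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1          # entering a nested block
        elif ch == "}":
            depth -= 1          # leaving a block
            if depth == 0:      # this brace closes the outermost block
                return text[open_index + 1 : i]
    # Mirrors the "non-zero exit with a descriptive message" behaviour
    raise ValueError("reached EOF without finding the matching closing brace")


if __name__ == "__main__":
    src = "function f() {\n    if (a) {\n        x = 1;\n    }\n    return x;\n}\n"
    print(extract_body(src, src.index("{")))
```

Because the counter only stops at depth 0, the closing brace of the nested `if` at 4-space indent no longer ends the extraction early.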
johba
3e5cbea7f7 Merge pull request 'fix: mutate.test.ts: pre-existing isValid > stack underflow failure (#810)' (#848) from fix/issue-810 into master 2026-03-16 01:09:20 +01:00
openhands
6a55c37b20 fix: mutate.test.ts: pre-existing `isValid > stack underflow` failure (#810)
dpop/bpop silently returned '0'/'false' on stack underflow instead of
throwing, so isValid() never returned false for underflowing programs.
Make dpop and bpop throw an Error on underflow so the transpiler's
existing try/catch in isValid() correctly classifies such programs as
invalid. The output-extraction phase uses state.dStack.pop() directly
(not dpop) and is unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:49:55 +00:00
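The underflow fix in 6a55c37b20 follows a general pattern: a pop helper that raises instead of returning a default, so that a surrounding try/except can classify the program as invalid. The sketch below is a Python analogue under stated assumptions; the real transpiler is TypeScript, and the toy `ADD` instruction set, `StackUnderflow`, and `is_valid` are hypothetical stand-ins.

```python
class StackUnderflow(Exception):
    """Raised when a pop is attempted on an empty stack."""


def dpop(stack: list) -> int:
    # Raise rather than silently returning 0, so callers can detect underflow.
    if not stack:
        raise StackUnderflow("data stack underflow")
    return stack.pop()


def is_valid(program: list) -> bool:
    """Toy validity check: integer tokens push, 'ADD' pops two and pushes the sum."""
    stack = []
    try:
        for tok in program:
            if tok == "ADD":
                b = dpop(stack)
                a = dpop(stack)
                stack.append(a + b)
            else:
                stack.append(int(tok))
    except StackUnderflow:
        return False  # underflowing programs are classified as invalid
    return True


if __name__ == "__main__":
    print(is_valid(["1", "2", "ADD"]))  # enough operands
    print(is_valid(["1", "ADD"]))       # underflows on the second pop
```

If `dpop` returned a default value instead of raising, the second program would run to completion and be reported as valid, which is the failure mode the commit message describes.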
johba
3584c03261 Merge pull request 'fix: Fitness re-evaluation for fixed evo_run007_champion (#811)' (#846) from fix/issue-811 into master 2026-03-16 00:38:15 +01:00
openhands
79bcb81b81 fix: Fitness re-evaluation for fixed evo_run007_champion (#811)
Null out the stale fitness score (7116531284966772550194) for
evo_run007_champion.push3, which was recorded against the buggy
processExecIf interpreter (pre-#655 fix). Setting fitness to null
marks the entry for re-scoring by evaluate-seeds.sh once a valid
ANVIL_FORK_URL is available. Updated the note field to document why
the fitness was cleared.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:21:04 +00:00
johba
b4f549bf05 Merge pull request 'fix: ATTACKS_OUT directory not guaranteed to exist (#816)' (#843) from fix/issue-816 into master 2026-03-16 00:08:17 +01:00
openhands
ac2fa16e2e fix: ATTACKS_OUT directory not guaranteed to exist (#816) 2026-03-15 22:36:51 +00:00
johba
938c6d284e Merge pull request 'fix: Unclamped anchorWidth can overflow tick range — no upper-bound guard after MAX_ANCHOR_WIDTH removal (#783) (#817)' (#841) from fix/issue-817 into master 2026-03-15 23:26:50 +01:00
openhands
aa274fd8ed fix: address review findings for anchorWidth guard (#817)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 22:04:13 +00:00
openhands
a21cf398bf fix: Unclamped anchorWidth can overflow tick range — no upper-bound guard after MAX_ANCHOR_WIDTH removal (#783) (#817)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 21:34:33 +00:00
johba
cd4926b540 Merge pull request 'fix: feat: structured sweep-results.tsv for red-team sweep (#818)' (#840) from fix/issue-818 into master 2026-03-15 22:16:34 +01:00
openhands
ae3eb14833 fix: address review findings for sweep-results.tsv (#818)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 20:48:33 +00:00
openhands
3c6be7d86f fix: feat: structured sweep-results.tsv for red-team sweep (#818)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 20:20:13 +00:00
johba
2fc0ce2b60 Merge pull request 'fix: Hardcoded TWAP/cooldown values not documented (#825)' (#839) from fix/issue-825 into master 2026-03-15 21:14:57 +01:00
openhands
0d09f598d9 fix: Hardcoded TWAP/cooldown values not documented (#825)
Document MIN_RECENTER_INTERVAL (60 s, LiquidityManager.sol:61) and
PRICE_STABILITY_INTERVAL (300 s, PriceOracle.sol:14) in
docs/ARCHITECTURE.md and docs/PRODUCT-TRUTH.md so that agent-facing
and product-facing copy stays traceable to source constants.

Add an inline HTML comment in red-team-program.md next to the
hardcoded 60s/300s sentence pointing to the two source constants,
making drift detectable during code review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 19:51:52 +00:00
johba
baaca1c9b4 Merge pull request 'fix: 'Trigger recenter (account 2 only)' label contradicts public recenter comment (#826)' (#836) from fix/issue-826 into master 2026-03-15 20:45:14 +01:00
openhands
2293ece915 fix: 'Trigger recenter (account 2 only)' label contradicts public recenter comment (#826)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 19:17:16 +00:00
johba
aa3bc020d9 Merge pull request 'fix: Kraiken.sol and Stake.sol absent from agent context across all runs (#829)' (#834) from fix/issue-829 into master 2026-03-15 20:10:07 +01:00
openhands
13d5b40564 fix: Kraiken.sol and Stake.sol absent from agent context across all runs (#829)
Inject Kraiken.sol (outstandingSupply, mint/burn mechanics) and Stake.sol
(snatch, withdrawal, KRK exclusion from floor denominator) into the red-team
agent prompt so agents can reason from actual source rather than guesses.

- red-team.sh: read SOL_KRAIKEN and SOL_STAKE from onchain/src/ alongside
  the other six contracts already injected
- red-team-program.md: add ### Kraiken.sol and ### Stake.sol sections in the
  Source Code reference block (after PriceOracle.sol)
- AGENTS.md: document the full list of injected contracts in a new
  "Red-team Agent Context" section; both files are now listed as in-scope

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 18:41:57 +00:00
johba
682d55f00a Merge pull request 'fix: refactor: extract red-team prompt to red-team-program.md (#819)' (#833) from fix/issue-819 into master 2026-03-15 19:28:40 +01:00
openhands
012b31056e fix: refactor: extract red-team prompt to red-team-program.md (#819)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:54:33 +00:00
johba
55cfbeb291 Merge pull request 'fix: feat: red-team sweep should seed each candidate with cross-candidate attack patterns (#822)' (#832) from fix/issue-822 into master 2026-03-15 18:36:26 +01:00
johba
0122546f54 Merge pull request 'chore: add planner watermarks to all AGENTS.md files' (#831) from chore/agents-watermarks into master
Reviewed-on: https://codeberg.org/johba/harb/pulls/831
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
2026-03-15 18:25:43 +01:00
openhands
4d0390c4fa fix: address review findings for cross-candidate red-team sweep (#822)
- red-team-sweep.sh: reset CROSS_PATTERNS_FILE at sweep start to prevent
  stale patterns from prior invocations contaminating a fresh run
- red-team-sweep.sh: wrap pattern-extraction Python in set +e/set -e and
  capture output so log() prefix is applied; move memory truncation outside
  the if-block so it runs unconditionally even if Python fails
- red-team.sh: filter entries where candidate == current_candidate before
  grouping, removing self-referential cross-candidate evidence
- red-team.sh: skip entries with empty pattern key (both pattern and
  strategy fields empty) to prevent spurious bucket merging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:02:19 +00:00
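The two red-team.sh filters listed in 4d0390c4fa (drop self-referential entries, skip empty pattern keys) can be illustrated with a short Python sketch. The field names `candidate`, `pattern`, and `strategy` come from the commit message; the function name and the fallback from `pattern` to `strategy` are assumptions.

```python
from collections import defaultdict


def group_cross_patterns(entries: list, current_candidate: str) -> dict:
    """Group memory entries by pattern key, excluding the current candidate.

    Hypothetical helper; the key fallback order is an assumption.
    """
    groups = defaultdict(list)
    for entry in entries:
        # Evidence from the candidate under test is not cross-candidate evidence.
        if entry.get("candidate") == current_candidate:
            continue
        # Use the pattern if present, otherwise the strategy.
        key = entry.get("pattern") or entry.get("strategy") or ""
        if not key:
            continue  # both fields empty: skip to avoid merging into one bucket
        groups[key].append(entry)
    return dict(groups)


if __name__ == "__main__":
    entries = [
        {"candidate": "seed_a", "pattern": "swap -> recenter"},
        {"candidate": "seed_b", "pattern": "swap -> recenter"},
        {"candidate": "seed_c", "pattern": "", "strategy": ""},
    ]
    print(group_cross_patterns(entries, current_candidate="seed_a"))
```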
openhands
9a309634ed chore: add planner watermarks to all AGENTS.md files 2026-03-15 16:42:45 +00:00
openhands
9ee1429604 fix: feat: red-team sweep should seed each candidate with cross-candidate attack patterns (#822)
- red-team-sweep.sh: after each candidate completes, extract all memory
  entries into /tmp/red-team-cross-patterns.jsonl (append), then clear
  the raw memory file so the next candidate starts with a fresh state
- red-team.sh: define CROSS_PATTERNS_FILE; before building the prompt,
  read the cross-patterns file and generate a "Cross-Candidate
  Intelligence" section grouped by abstract op pattern — universal
  patterns (broke 2+ candidates), candidate-specific wins, and patterns
  that held everywhere — each annotated with optimizer profiles
- The new section is injected into the Claude prompt above the existing
  Previous Findings block, satisfying all acceptance criteria

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:30:54 +00:00
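One way to read the three buckets named in 9ee1429604 (universal patterns, candidate-specific wins, patterns that held everywhere) is as a classification over the set of candidates each pattern broke. The sketch below assumes each entry carries a `result` field whose value `"DECREASED"` marks a successful attack; that field name and the function name are hypothetical.

```python
from collections import defaultdict


def classify_patterns(entries: list) -> dict:
    """Split patterns into universal, candidate-specific, and held-everywhere."""
    seen = set()                 # every pattern that was tried
    broke = defaultdict(set)     # pattern -> candidates it broke
    for entry in entries:
        pattern = entry["pattern"]
        seen.add(pattern)
        if entry["result"] == "DECREASED":  # assumed marker for a successful attack
            broke[pattern].add(entry["candidate"])
    return {
        "universal": sorted(p for p in seen if len(broke[p]) >= 2),
        "candidate_specific": sorted(p for p in seen if len(broke[p]) == 1),
        "held_everywhere": sorted(p for p in seen if len(broke[p]) == 0),
    }


if __name__ == "__main__":
    entries = [
        {"candidate": "seed_a", "pattern": "swap -> recenter", "result": "DECREASED"},
        {"candidate": "seed_b", "pattern": "swap -> recenter", "result": "DECREASED"},
        {"candidate": "seed_a", "pattern": "stake -> unstake", "result": "HELD"},
    ]
    print(classify_patterns(entries))
```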
johba
bf1a735481 Merge pull request 'fix: feat: red-team memory should track candidate + abstract learnings (#820)' (#830) from fix/issue-820 into master 2026-03-15 17:17:33 +01:00
openhands
7950608179 fix: address review findings for red-team memory tracking (#820)
- make_pattern: replace text.find('stake')/find('unstake') with
  re.search(r'\bstake\b')/re.search(r'\bunstake\b') so 'stake' is never
  found as a substring of 'unstake' (bug #1)
- make_pattern: track first-occurrence position of each op and sort by
  position before building the sequence string, preserving actual
  execution order instead of a hardcoded canonical order (bug #2)
- insight capture: track insight_pri on the current dict; only overwrite
  stored insight when new match has strictly higher priority (lower index),
  preventing a late 'because...' clause from silently replacing an earlier
  'Key Insight:' capture (warning #3)
- run_num: compute max(run)+1 from JSON entries instead of wc -l so run
  numbers stay monotonically increasing after memory trim (info #4)
- red-team-sweep.sh: also set adaptive flag when any r37-r40 register has
  a variable-form assignment (r40 = uint256(someVar)), catching candidates
  where only one branch uses constants (warning #5)
- red-team-sweep.sh: remove unnecessary 'import sys as _sys' in except
  block; sys is already in scope (nit #6)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 15:54:01 +00:00
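Three of the review fixes in 7950608179 (word-boundary matching, ordering by first occurrence, and `max(run)+1` numbering) can be sketched together. This is a minimal illustration, not the repository's `make_pattern`: the `OPS` vocabulary, the `" -> "` separator, and `next_run_num` are assumptions.

```python
import re

# Hypothetical op vocabulary; the real list is not shown in the commit message.
OPS = ("stake", "unstake", "swap", "recenter")


def make_pattern(text: str) -> str:
    """Build an op-sequence string ordered by where each op first appears."""
    lowered = text.lower()
    found = []
    for op in OPS:
        # \b prevents 'stake' from matching inside 'unstake'.
        match = re.search(rf"\b{op}\b", lowered)
        if match:
            found.append((match.start(), op))
    # Sort by first-occurrence position, not by the order of OPS.
    return " -> ".join(op for _, op in sorted(found))


def next_run_num(entries: list) -> int:
    """Next run number: max(run) + 1, which stays monotonic after trimming."""
    return max((entry.get("run", 0) for entry in entries), default=0) + 1


if __name__ == "__main__":
    print(make_pattern("Unstake everything, then swap, then stake again"))
    print(next_run_num([{"run": 7}, {"run": 9}]))
```

For comparison, `"unstake only".find("stake")` returns 2, which is the substring false positive the first fix removes. A line count would also give 3 for the two-entry example above after a trim, whereas `max(run)+1` gives 10.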
openhands
e7c60edeb6 fix: feat: red-team memory should track candidate + abstract learnings (#820)
- Add CANDIDATE_NAME and OPTIMIZER_PROFILE env vars to red-team.sh
  (defaults to "unknown" for standalone runs)
- Update extract_memory Python: new fields candidate, optimizer_profile,
  pattern (abstract op sequence via make_pattern()), and improved insight
  extraction that also captures WHY explanations (because/since/due to)
- Update MEMORY_SECTION Python: entries now grouped by candidate;
  universal patterns (DECREASED across multiple candidates) surfaced first
- Update prompt: add "Current Attack Target" table with candidate/profile,
  optimizer parameter explanations (CI/AW/AS/DD behavioral impact),
  Rule 9 requiring pattern+insight per strategy, updated report format
  with Pattern/Insight fields and universal-pattern conclusion field
- Update red-team-sweep.sh: after inject, parse OptimizerV3Push3.sol for
  r40/r39/r38/r37 constants to build OPTIMIZER_PROFILE string; pass
  CANDIDATE_NAME and OPTIMIZER_PROFILE as env vars to red-team.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 15:23:43 +00:00
johba
7a09c16966 Merge pull request 'fix: txnBot AGENTS.md ENVIRONMENT enum is stale (#784)' (#815) from fix/issue-784 into master 2026-03-15 16:06:03 +01:00
johba
963c0d316a Merge pull request 'fix: feat: red-team agent should read LM and optimizer Solidity source (#821)' (#828) from fix/issue-821 into master 2026-03-15 15:57:21 +01:00
openhands
4779749f2b fix: feat: red-team agent should read LM and optimizer Solidity source (#821)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 14:18:10 +00:00
openhands
afae00ed9f fix: txnBot AGENTS.md ENVIRONMENT enum is stale (#784) 2026-03-15 14:11:58 +00:00
openhands
0f3399a73c fix: txnBot AGENTS.md ENVIRONMENT enum is stale (#784) 2026-03-15 14:11:45 +00:00
johba
504977941e Merge pull request 'fix: fix: red-team prompt missing evm_increaseTime for TWAP-enforced recenter (#823)' (#824) from fix/issue-823 into master 2026-03-15 12:36:49 +01:00
openhands
ff53625c9c fix: fix: red-team prompt missing evm_increaseTime for TWAP-enforced recenter (#823) 2026-03-15 10:47:47 +00:00
openhands
7d0473ade7 fix: fix: red-team prompt missing evm_increaseTime for TWAP-enforced recenter (#823)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 10:47:36 +00:00
johba
d6e5990802 Merge pull request 'fix: ThreePositionStrategy class comment still advertises 1-100% anchor width (#786)' (#813) from fix/issue-786 into master 2026-03-15 10:36:02 +01:00
johba
ff86b3691d chore: extract shared inject.sh, add red-team-sweep.sh (#806)
## What
- `tools/push3-transpiler/inject.sh` — shared transpile+inject logic used by both batch-eval and red-team-sweep
- `batch-eval.sh` — replaced inline 60-line Python block with `inject.sh` call
- `scripts/harb-evaluator/red-team-sweep.sh` — red-teams each kindergarten seed using existing `red-team.sh`, with random smoke test gate

## Why
Sweep script kept breaking because I rewrote the injection logic instead of reusing batch-eval's proven Python. Now there's one copy.

## Testing
- inject.sh tested manually on DO box with optimizer_v3 seed
- Smoke test picks random seed, injects + compiles before starting sweep

Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/806
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
2026-03-15 10:24:03 +01:00