Commit graph

15 commits

Author SHA1 Message Date
openhands
db76d648de fix: Bare \cd\ at line 293 in main loop (pre-existing) (#927)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 18:47:14 +00:00
johba
415e94243e Merge pull request 'fix: MEMORY_FILE trim may discard DECREASED entries before 4c extraction (#875)' (#926) from fix/issue-875 into master 2026-03-17 19:27:09 +01:00
openhands
8b3fd340ac fix: MEMORY_FILE trim may discard DECREASED entries before 4c extraction (#875)
Address AI reviewer feedback on d1f75a7:

- Wrap cross_file append in try/except so a write failure never prevents
  the memory trim-write from running (bug fix)
- Stamp sweep_id on pre-trim exported entries using the SWEEP_ID env var;
  pass SWEEP_ID from red-team-sweep.sh so entries are attributable to a
  sweep run (data-consistency fix)
- Add inline comment explaining the 3-tuple dedup key (run, ts, strategy)
  and its relationship to step-4c's identity check (clarity nit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 17:58:41 +00:00
openhands
315b7777f8 fix: Bare \cd\ in smoke test permanently changes shell working directory (#877)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 17:07:17 +00:00
openhands
e2554eb844 fix: \sleep 1\ polling loop violates AGENTS.md 'never use fixed delays' principle (#878)
Replace fixed \`sleep 1\` in the container teardown poll loop with exponential
backoff (100ms → 200ms → … → 2000ms cap). The 30s hard timeout is preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 16:07:58 +00:00
openhands
fe3a3d7d94 fix: feat: persist red-team cross-patterns in repo for continuity across runs (#853)
- Move CROSS_PATTERNS_FILE from /tmp/red-team-cross-patterns.jsonl to
  tools/red-team/cross-patterns.jsonl (repo-tracked path)
- Remove the reset (> file) at sweep start so patterns accumulate across runs
- Generate a SWEEP_ID (sweep-YYYYMMDD-HHMMSS) at sweep start and stamp
  each new entry with sweep_id for traceability
- Deduplicate on (pattern, candidate, result): entries already present in
  the file are skipped; intra-batch duplicates are also suppressed
- Create tools/red-team/ directory with .gitkeep
- Add mkdir -p guards in both scripts so the directory is created on first run

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 12:39:39 +00:00
openhands
8986154d8f fix: sleep 5 at teardown violates AGENTS.md engineering principles (#845) 2026-03-16 07:06:57 +00:00
openhands
ac2fa16e2e fix: ATTACKS_OUT directory not guaranteed to exist (#816) 2026-03-15 22:36:51 +00:00
openhands
ae3eb14833 fix: address review findings for sweep-results.tsv (#818)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 20:48:33 +00:00
openhands
3c6be7d86f fix: feat: structured sweep-results.tsv for red-team sweep (#818)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 20:20:13 +00:00
openhands
4d0390c4fa fix: address review findings for cross-candidate red-team sweep (#822)
- red-team-sweep.sh: reset CROSS_PATTERNS_FILE at sweep start to prevent
  stale patterns from prior invocations contaminating a fresh run
- red-team-sweep.sh: wrap pattern-extraction Python in set +e/set -e and
  capture output so log() prefix is applied; move memory truncation outside
  the if-block so it runs unconditionally even if Python fails
- red-team.sh: filter entries where candidate == current_candidate before
  grouping, removing self-referential cross-candidate evidence
- red-team.sh: skip entries with empty pattern key (both pattern and
  strategy fields empty) to prevent spurious bucket merging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:02:19 +00:00
openhands
9ee1429604 fix: feat: red-team sweep should seed each candidate with cross-candidate attack patterns (#822)
- red-team-sweep.sh: after each candidate completes, extract all memory
  entries into /tmp/red-team-cross-patterns.jsonl (append), then clear
  the raw memory file so the next candidate starts with a fresh state
- red-team.sh: define CROSS_PATTERNS_FILE; before building the prompt,
  read the cross-patterns file and generate a "Cross-Candidate
  Intelligence" section grouped by abstract op pattern — universal
  patterns (broke 2+ candidates), candidate-specific wins, and patterns
  that held everywhere — each annotated with optimizer profiles
- The new section is injected into the Claude prompt above the existing
  Previous Findings block, satisfying all acceptance criteria

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 16:30:54 +00:00
openhands
7950608179 fix: address review findings for red-team memory tracking (#820)
- make_pattern: replace text.find('stake')/find('unstake') with
  re.search(r'\bstake\b')/re.search(r'\bunstake\b') so 'stake' is never
  found as a substring of 'unstake' (bug #1)
- make_pattern: track first-occurrence position of each op and sort by
  position before building the sequence string, preserving actual
  execution order instead of a hardcoded canonical order (bug #2)
- insight capture: track insight_pri on the current dict; only overwrite
  stored insight when new match has strictly higher priority (lower index),
  preventing a late 'because...' clause from silently replacing an earlier
  'Key Insight:' capture (warning #3)
- run_num: compute max(run)+1 from JSON entries instead of wc -l so run
  numbers stay monotonically increasing after memory trim (info #4)
- red-team-sweep.sh: also set adaptive flag when any r37-r40 register has
  a variable-form assignment (r40 = uint256(someVar)), catching candidates
  where only one branch uses constants (warning #5)
- red-team-sweep.sh: remove unnecessary 'import sys as _sys' in except
  block; sys is already in scope (nit #6)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 15:54:01 +00:00
openhands
e7c60edeb6 fix: feat: red-team memory should track candidate + abstract learnings (#820)
- Add CANDIDATE_NAME and OPTIMIZER_PROFILE env vars to red-team.sh
  (defaults to "unknown" for standalone runs)
- Update extract_memory Python: new fields candidate, optimizer_profile,
  pattern (abstract op sequence via make_pattern()), and improved insight
  extraction that also captures WHY explanations (because/since/due to)
- Update MEMORY_SECTION Python: entries now grouped by candidate;
  universal patterns (DECREASED across multiple candidates) surfaced first
- Update prompt: add "Current Attack Target" table with candidate/profile,
  optimizer parameter explanations (CI/AW/AS/DD behavioral impact),
  Rule 9 requiring pattern+insight per strategy, updated report format
  with Pattern/Insight fields and universal-pattern conclusion field
- Update red-team-sweep.sh: after inject, parse OptimizerV3Push3.sol for
  r40/r39/r38/r37 constants to build OPTIMIZER_PROFILE string; pass
  CANDIDATE_NAME and OPTIMIZER_PROFILE as env vars to red-team.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 15:23:43 +00:00
johba
ff86b3691d chore: extract shared inject.sh, add red-team-sweep.sh (#806)
## What
- `tools/push3-transpiler/inject.sh` — shared transpile+inject logic used by both batch-eval and red-team-sweep
- `batch-eval.sh` — replaced inline 60-line Python block with `inject.sh` call
- `scripts/harb-evaluator/red-team-sweep.sh` — red-teams each kindergarten seed using existing `red-team.sh`, with random smoke test gate

## Why
Sweep script kept breaking because I rewrote the injection logic instead of reusing batch-eval's proven Python. Now there's one copy.

## Testing
- inject.sh tested manually on DO box with optimizer_v3 seed
- Smoke test picks random seed, injects + compiles before starting sweep

Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/806
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
2026-03-15 10:24:03 +01:00