feat: red-team memory should track candidate + abstract learnings (#820)

- Add CANDIDATE_NAME and OPTIMIZER_PROFILE env vars to red-team.sh
  (defaults to "unknown" for standalone runs)
- Update extract_memory Python: new fields candidate, optimizer_profile,
  pattern (abstract op sequence via make_pattern()), and improved insight
  extraction that also captures WHY explanations (because/since/due to)
- Update MEMORY_SECTION Python: entries now grouped by candidate;
  universal patterns (DECREASED across multiple candidates) surfaced first
- Update prompt: add "Current Attack Target" table with candidate/profile,
  optimizer parameter explanations (CI/AW/AS/DD behavioral impact),
  Rule 9 requiring pattern+insight per strategy, updated report format
  with Pattern/Insight fields and universal-pattern conclusion field
- Update red-team-sweep.sh: after inject, parse OptimizerV3Push3.sol for
  r40/r39/r38/r37 constants to build OPTIMIZER_PROFILE string; pass
  CANDIDATE_NAME and OPTIMIZER_PROFILE as env vars to red-team.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
openhands 2026-03-15 15:23:43 +00:00
parent 7a09c16966
commit e7c60edeb6
2 changed files with 147 additions and 14 deletions
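For context, each memory entry described above is written by `extract_memory` as one JSONL line. A sketch with made-up values (the field names match the diff; the candidate name, profile, and numbers are illustrative only):

```python
import json

# Illustrative memory entry — field names follow extract_memory; all values are hypothetical.
entry = {
    "run": 3,
    "ts": "2026-03-15T15:23:43+00:00",
    "candidate": "seed-axxx",                        # hypothetical seed name
    "optimizer_profile": "CI=5%, AW=60, AS=20%, DD=10%",
    "strategy": "buy-stake-recenter drain",
    "pattern": "buy → stake_all → recenter_multi → unstake → sell",
    "steps": "wrap ETH; buy KRK; stake all; recenter x2; unstake; sell",
    "lm_eth_before": 1000000000000000000000,         # 1000 ETH in wei
    "lm_eth_after": 998000000000000000000,           # 998 ETH in wei
    "delta_bps": -20,                                # (998 - 1000) / 1000 = -20 bps
    "result": "DECREASED",
    "insight": "because repeated recenters re-deploy anchor ETH into a range the attacker can sell into",
}
line = json.dumps(entry)
print(line)
```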


@@ -63,12 +63,45 @@ for seed_file in "${seeds[@]}"; do
  fi
  log "Injected into OptimizerV3.sol"

  # 1b. Extract optimizer profile from transpiler output (CI/AW/AS/DD constants)
  TRANSPILER_OUT="$REPO_ROOT/onchain/src/OptimizerV3Push3.sol"
  OPTIMIZER_PROFILE=$(python3 - "$TRANSPILER_OUT" <<'PYEOF'
import re, sys

try:
    with open(sys.argv[1]) as f:
        sol = f.read()
    ci_vals = set(re.findall(r'\br40\s*=\s*uint256\((\d+)\)', sol))
    aw_vals = set(re.findall(r'\br38\s*=\s*uint256\((\d+)\)', sol))
    as_vals = set(re.findall(r'\br39\s*=\s*uint256\((\d+)\)', sol))
    dd_vals = set(re.findall(r'\br37\s*=\s*uint256\((\d+)\)', sol))

    def fmt_pct(vals):
        pcts = sorted(set(round(int(v) * 100 / 1e18) for v in vals))
        return '/'.join(str(p) + '%' for p in pcts) if pcts else '?'

    def fmt_int(vals):
        ints = sorted(set(int(v) for v in vals))
        return '/'.join(str(v) for v in ints) if ints else '?'

    profile = f"CI={fmt_pct(ci_vals)}, AW={fmt_int(aw_vals)}, AS={fmt_pct(as_vals)}, DD={fmt_pct(dd_vals)}"
    if len(ci_vals) > 1 or len(aw_vals) > 1 or len(as_vals) > 1 or len(dd_vals) > 1:
        profile += ", adaptive"
    print(profile)
except Exception as e:
    # report the parse error on stderr and fall back to "unknown" on stdout
    print(f"unknown (parse error: {e})", file=sys.stderr)
    print("unknown")
PYEOF
  )
  log "Optimizer profile: $OPTIMIZER_PROFILE"
  # 2. Clear stale attack file from previous candidate
  rm -f "$REPO_ROOT/tmp/red-team-attacks.jsonl"

  # 3. Run red-team.sh (handles bootstrap + compile + deploy + attack)
  log "Running red-team.sh (timeout: ${TIMEOUT_PER}s)..."
  CLAUDE_TIMEOUT="$TIMEOUT_PER" CANDIDATE_NAME="$seed_name" OPTIMIZER_PROFILE="$OPTIMIZER_PROFILE" \
    timeout "$((TIMEOUT_PER + 120))" \
    bash "$SCRIPT_DIR/red-team.sh" 2>&1 | tee "/tmp/red-team-${seed_name}.log" || true

  # 4. Collect attacks
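To see what the profile parser in step 1b produces, here is a standalone sketch of its two formatting helpers applied to hypothetical constant values (percentages are 1e18-scaled fractions, as in the regexes above):

```python
# Standalone copies of the profile parser's formatting helpers.
# Sample inputs are hypothetical constant values pulled from a Solidity file.
def fmt_pct(vals):
    # 1e18-scaled fraction → whole percent, deduped and sorted
    pcts = sorted(set(round(int(v) * 100 / 1e18) for v in vals))
    return '/'.join(str(p) + '%' for p in pcts) if pcts else '?'

def fmt_int(vals):
    ints = sorted(set(int(v) for v in vals))
    return '/'.join(str(v) for v in ints) if ints else '?'

ci_vals = {"50000000000000000"}   # e.g. r40 = uint256(50000000000000000) → 5%
aw_vals = {"60", "120"}           # two distinct anchor widths → "adaptive" candidate
print(fmt_pct(ci_vals))   # 5%
print(fmt_int(aw_vals))   # 60/120
```

Multiple distinct values for any constant are exactly what triggers the `, adaptive` suffix in the parser.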


@@ -30,6 +30,10 @@ ATTACK_EXPORT="$REPORT_DIR/red-team-attacks.jsonl"
ATTACK_SNAPSHOTS="$REPORT_DIR/red-team-snapshots.jsonl"
DEPLOYMENTS="$REPO_ROOT/onchain/deployments-local.json"
# ── Candidate metadata (set by red-team-sweep.sh; defaults to unknown for standalone runs) ─
CANDIDATE_NAME="${CANDIDATE_NAME:-unknown}"
OPTIMIZER_PROFILE="${OPTIMIZER_PROFILE:-unknown}"
# ── Anvil accounts ─────────────────────────────────────────────────────────────
# Account 8 — adversary (10k ETH, 0 KRK)
ADV_PK=0xdbda1821b80551c9d65939329250298aa3472ba22feea921c0cf5d620ea67b97
@@ -193,7 +197,7 @@ extract_memory() {
    run_num=1
  fi
  python3 - "$stream_file" "$memory_file" "$run_num" "$LM_ETH_BEFORE" "$CANDIDATE_NAME" "$OPTIMIZER_PROFILE" <<'PYEOF'
import json, sys, re
from datetime import datetime, timezone
@@ -205,6 +209,35 @@ try:
except (ValueError, IndexError):
    print(" extract_memory: invalid lm_eth_before value, skipping", file=sys.stderr)
    sys.exit(0)
candidate = sys.argv[5] if len(sys.argv) > 5 else "unknown"
optimizer_profile = sys.argv[6] if len(sys.argv) > 6 else "unknown"
def make_pattern(strategy_name, steps_text):
    """Extract abstract op sequence: buy → stake → recenter → sell."""
    text = (strategy_name + " " + steps_text).lower()
    ops = []
    if "wrap" in text:
        ops.append("wrap")
    if "buy" in text:
        ops.append("buy")
    stake_pos = text.find("stake")
    unstake_pos = text.find("unstake")
    if stake_pos >= 0 and (unstake_pos < 0 or stake_pos < unstake_pos):
        ops.append("stake_all" if "all" in text[max(0, stake_pos-10):stake_pos+20] else "stake")
    recenters = len(re.findall(r"\brecenter\b", text))
    if recenters == 1:
        ops.append("recenter")
    elif recenters > 1:
        ops.append("recenter_multi")
    if unstake_pos >= 0:
        ops.append("unstake")
    if "sell" in text:
        ops.append("sell")
    if "add_lp" in text or ("mint" in text and ("lp" in text or "liquidity" in text)):
        ops.append("add_lp")
    if "remove_lp" in text or "decreaseliquidity" in text:
        ops.append("remove_lp")
    return " → ".join(ops) if ops else strategy_name[:60]
texts = []
with open(stream_file) as f:
@@ -243,11 +276,18 @@ for text in texts:
    if floor_matches:
        current["lm_eth_after"] = int(floor_matches[-1].group(1))

    # Capture insights — prefer explicit labels, then WHY explanations
    for ins_pat in [
        r"[Kk]ey [Ii]nsight:\s*(.+)",
        r"[Ii]nsight:\s*(.+)",
        r"[Ww][Hh][Yy][^:]*:\s*(.{30,})",
        r"(?:because|since|due to)\s+(.{30,})",
        r"(?:discovered|learned|realized)\s+(?:that\s+)?(.+)"
    ]:
        insight_match = re.search(ins_pat, text)
        if insight_match and len(insight_match.group(1)) > 20:
            current["insight"] = insight_match.group(1).strip()[:300]
            break

    # Capture step summaries
    if any(word in text.lower() for word in ["wrap", "buy", "sell", "stake", "recenter", "mint", "approve"]):
@@ -270,10 +310,14 @@ with open(memory_file, "a") as f:
        else:
            result = "HELD"
        pattern = make_pattern(s["strategy"], s["steps"])
        entry = {
            "run": run_num,
            "ts": ts,
            "candidate": candidate,
            "optimizer_profile": optimizer_profile,
            "strategy": s["strategy"][:100],
            "pattern": pattern[:150],
            "steps": s["steps"][:300].rstrip("; "),
            "lm_eth_before": lm_eth_before,
            "lm_eth_after": fa,
@@ -282,7 +326,7 @@ with open(memory_file, "a") as f:
            "insight": s["insight"][:300]
        }
        f.write(json.dumps(entry) + "\n")
        print(f" Recorded: {entry['strategy']} [{entry['candidate']}] → {result} ({delta_bps:+d} bps)")

if not strategies:
    print(" No strategies detected in stream output")
@@ -329,6 +373,7 @@ MEMORY_SECTION=""
if [[ -f "$MEMORY_FILE" && -s "$MEMORY_FILE" ]]; then
  MEMORY_SECTION=$(python3 - "$MEMORY_FILE" <<'PYEOF'
import json, sys
from collections import defaultdict

entries = []
with open(sys.argv[1]) as f:
    for line in f:
@@ -340,17 +385,47 @@ if not entries:
print('## Previous Findings (from earlier runs)')
print()
print('DO NOT repeat strategies marked HELD or INCREASED. Build on the insights.')
print('Distinguish optimizer-specific vulnerabilities from universal patterns.')
print('Try NEW combinations not yet attempted. Combine tools creatively.')
print()
# Cross-candidate: patterns that DECREASED in multiple distinct candidates
decreased = [e for e in entries if e.get('result') == 'DECREASED']
cross = defaultdict(set)
for e in decreased:
    key = e.get('pattern') or e.get('strategy', '')
    cross[key].add(e.get('candidate', 'unknown'))
universal = [(p, cands) for p, cands in cross.items() if len(cands) > 1]
if universal:
    print('### Universal Patterns (succeeded across multiple candidates)')
    for pat, cands in universal:
        print(f"- **{pat}** — worked on: {', '.join(sorted(cands))}")
    print()

# Group remaining entries by candidate
by_candidate = defaultdict(list)
for e in entries:
    by_candidate[e.get('candidate', 'unknown')].append(e)

for cand, cand_entries in sorted(by_candidate.items()):
    prof = next((e.get('optimizer_profile', '') for e in cand_entries
                 if e.get('optimizer_profile', '') not in ('', 'unknown')), '')
    print(f"### Candidate: {cand}")
    if prof:
        print(f"Profile: {prof}")
    print()
    for e in cand_entries:
        r = e.get('result', '?')
        emoji = '❌' if r == 'DECREASED' else '⬆️' if r == 'INCREASED' else '➡️'
        pat = e.get('pattern', '')
        print(f"#### Run {e.get('run','?')}: {e.get('strategy','?')} {emoji} {r}")
        if pat:
            print(f"Pattern: `{pat}`")
        print(f"Steps: {e.get('steps','?')}")
        print(f"Delta: {e.get('delta_bps',0)} bps")
        if e.get('insight'):
            print(f"**Insight:** {e['insight']}")
        print()
PYEOF
  )
fi

@@ -371,6 +446,21 @@ The metric is simple: if LM total ETH goes down, you win.
---
## Current Attack Target
| Field | Value |
|-------|-------|
| Candidate | ${CANDIDATE_NAME} |
| Optimizer Profile | ${OPTIMIZER_PROFILE} |
Use the optimizer profile to reason about this candidate's behavior:
- **CI** (concentration index %): higher → optimizer recenters more aggressively → more KRK minting opportunities
- **AW** (anchorWidth ticks): wider → liquidity spread over larger price range → less ETH per tick
- **AS** (anchorShare %): higher → more ETH locked in anchor position → different rebalancing behavior
- **DD** (discoveryDepth %): higher → more ETH in discovery position (above-price) → price-sensitive exposure
---
## Contract addresses (local Anvil)

| Contract | Address |
@@ -649,6 +739,11 @@ SNAP=\$(/home/debian/.foundry/bin/cast rpc anvil_snapshot --rpc-url http://local
6. If Previous Findings are provided, DO NOT repeat those strategies. Use their insights to design new approaches.
7. Prioritize untried COMBINATIONS: staking + LP, staking + recenter timing, LP + multi-step swaps, etc.
8. Start executing immediately. No lengthy planning — act, measure, iterate.
9. For EVERY strategy attempted, record:
- **Pattern**: abstract op sequence (e.g., "buy → stake_all → recenter_multi → unstake → sell")
- **Insight**: WHY this worked or failed, referencing the optimizer profile (${OPTIMIZER_PROFILE}).
For HELD/INCREASED: which mechanism defended the floor? How did CI/AW/AS/DD cause it?
For DECREASED: which parameter combination created the vulnerability? Is it universal or optimizer-specific?
---

@@ -661,12 +756,16 @@ After trying all strategies, output a clearly structured report:
\`\`\`
=== RED-TEAM REPORT ===
Candidate: ${CANDIDATE_NAME}
Optimizer Profile: ${OPTIMIZER_PROFILE}
lm_eth_before: <value> wei (total: free + positions)

STRATEGY 1: <name>
Pattern: <abstract op sequence e.g. "buy → recenter → sell">
Steps: <what you did>
lm_eth_after: <value> wei
Result: ETH_EXTRACTED / ETH_SAFE / ETH_GAINED
Insight: <WHY this worked/failed given the optimizer profile>

STRATEGY 2: ...
...
@@ -674,6 +773,7 @@ STRATEGY 2: ...
=== CONCLUSION ===
ETH extracted: YES / NO
Winning strategy: <describe if YES, else "None">
Universal pattern: <would this likely work on other candidates? Why or why not?>
lm_eth_before: ${LM_ETH_BEFORE} wei
lm_eth_after: <final value> wei
\`\`\`
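As a closing illustration of the Pattern field: a standalone copy of `make_pattern` from the extract_memory diff above, applied to a hypothetical strategy description (the strategy name and step text are made up):

```python
import re

# Standalone copy of make_pattern from the extract_memory diff.
def make_pattern(strategy_name, steps_text):
    """Extract abstract op sequence: buy → stake → recenter → sell."""
    text = (strategy_name + " " + steps_text).lower()
    ops = []
    if "wrap" in text:
        ops.append("wrap")
    if "buy" in text:
        ops.append("buy")
    stake_pos = text.find("stake")
    unstake_pos = text.find("unstake")
    # only count a plain "stake" if it occurs before any "unstake"
    if stake_pos >= 0 and (unstake_pos < 0 or stake_pos < unstake_pos):
        ops.append("stake_all" if "all" in text[max(0, stake_pos-10):stake_pos+20] else "stake")
    recenters = len(re.findall(r"\brecenter\b", text))
    if recenters == 1:
        ops.append("recenter")
    elif recenters > 1:
        ops.append("recenter_multi")
    if unstake_pos >= 0:
        ops.append("unstake")
    if "sell" in text:
        ops.append("sell")
    if "add_lp" in text or ("mint" in text and ("lp" in text or "liquidity" in text)):
        ops.append("add_lp")
    if "remove_lp" in text or "decreaseliquidity" in text:
        ops.append("remove_lp")
    return " → ".join(ops) if ops else strategy_name[:60]

# Hypothetical strategy text:
print(make_pattern("stake-all drain",
                   "wrap ETH; buy KRK; stake all; recenter; recenter; unstake; sell"))
# wrap → buy → stake_all → recenter_multi → unstake → sell
```

Two recenters collapse into `recenter_multi`, so superficially different strategies with the same op order compare equal across candidates, which is what the cross-candidate "universal pattern" grouping relies on.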