fix: feat: revm-based fitness evaluator for evolution at scale (#604)

Replace per-candidate Anvil+forge-script pipeline with in-process EVM
execution using Foundry's native revm backend, achieving 10-100× speedup
for evolutionary search at scale.

New files:
- onchain/test/FitnessEvaluator.t.sol: Forge test that forks Base once,
  deploys the full KRAIKEN stack, then for each candidate uses vm.etch to
  inject the compiled optimizer bytecode, UUPS-upgrades the proxy, runs all
  attack sequences with in-memory vm.snapshot/revertTo (no RPC overhead),
  and emits one {"candidate_id","fitness"} JSON line per candidate.
  Skips gracefully when BASE_RPC_URL is unset (CI-safe).
- tools/push3-evolution/revm-evaluator/batch-eval.sh: Wrapper that
  transpiles+compiles each candidate sequentially, writes a two-file
  manifest (ids.txt + bytecodes.txt), then invokes FitnessEvaluator.t.sol
  in a single forge test run and parses the score JSON from stdout.

Modified:
- tools/push3-evolution/evolve.sh: Adds EVAL_MODE env var (anvil|revm).
  When EVAL_MODE=revm, batch-scores every candidate in a generation with
  one batch-eval.sh call instead of N sequential fitness.sh processes;
  scores are looked up from the JSONL output in the per-candidate loop.
  Default remains EVAL_MODE=anvil for backward compatibility.

Key design decisions:
- Per-candidate Solidity compilation is unavoidable (each Push3 candidate
  produces different Solidity); the speedup is in the evaluation phase.
- vm.snapshot/revertTo in forge test are O(1) memory operations (true
  revm), not RPC calls; this is the core speedup vs Anvil.
- recenterAccess is set in bootstrap so TWAP stability checks are bypassed
  during attack sequences (mirrors the existing fitness.sh bootstrap).
- Test skips cleanly when BASE_RPC_URL is absent, keeping CI green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 11:54:41 +00:00
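
The commit message says that, in revm mode, evolve.sh looks up each candidate's score from the JSONL output. The following is a minimal sketch of such a lookup, not the code evolve.sh actually uses. The function name `lookup_fitness` is hypothetical, and the sketch assumes compact JSON lines of the documented form {"candidate_id":...,"fitness":...} with a non-negative integer fitness.

```shell
# Hypothetical helper (an assumption, not taken from evolve.sh): print the
# integer fitness recorded for one candidate in a scores JSONL file.
lookup_fitness() {
  # $1 = path to the JSONL file, $2 = candidate id
  # grep -F matches the quoted id literally, so "a_1" never matches "a_10".
  grep -F "\"candidate_id\":\"$2\"" "$1" \
    | sed -n 's/.*"fitness":\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# Example: build a two-line scores file and look up one candidate.
scores_file="$(mktemp)"
printf '%s\n' \
  '{"candidate_id":"candidate_000","fitness":123456789,"gas_used":15432}' \
  '{"candidate_id":"candidate_001","fitness":0,"error":"gas_over_limit"}' \
  > "$scores_file"

lookup_fitness "$scores_file" candidate_000   # prints 123456789
rm -f "$scores_file"
```

An unknown candidate id yields empty output, so a caller would have to decide how to score a missing line (for example, as 0).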
#!/usr/bin/env bash
# =============================================================================
# batch-eval.sh: revm-based batch fitness evaluator
#
# Replaces the per-candidate Anvil+forge-script pipeline with in-process EVM
# execution via Foundry's native revm backend (FitnessEvaluator.t.sol).
#
# Speedup: compiles each candidate once (unavoidable, since each candidate
# yields different Solidity), then runs ALL attack sequences in a single
# in-process forge test with O(1) memory snapshot/revert instead of RPC calls
# per attack.
#
# Usage:
#   ./tools/push3-evolution/revm-evaluator/batch-eval.sh \
#     [--output-dir /tmp/scores] \
#     candidate0.push3 candidate1.push3 ...
#
# Output (stdout):
#   One JSON object per candidate:
#     {"candidate_id":"candidate_000","fitness":123456789,"gas_used":15432}
#   Over-gas-limit candidates emit fitness:0 with "error":"gas_over_limit".
#   Downstream parsers use the "fitness" key; extra fields are ignored.
#
# Exit codes:
#   0  Success.
#   1  Candidate-level error (transpile/compile failed for at least one candidate).
#   2  Infrastructure error (missing tool, BASE_RPC_URL not set, forge test failed).
#
# Environment:
#   BASE_RPC_URL  Required. Base network RPC endpoint for forking.
#   ATTACKS_DIR   Optional. Path to *.jsonl attack files.
#                 (default: <repo>/onchain/script/backtesting/attacks)
#   OUTPUT_DIR    Optional. Directory to copy scores.jsonl into (--output-dir overrides).
# =============================================================================

set -euo pipefail

export PATH="${HOME}/.foundry/bin:${PATH}"

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../../.." && pwd)"
ONCHAIN_DIR="$REPO_ROOT/onchain"
TRANSPILER_DIR="$REPO_ROOT/tools/push3-transpiler"
TRANSPILER_OUT="$ONCHAIN_DIR/src/OptimizerV3Push3.sol"
# Use OptimizerV3 (inherits Optimizer → UUPS compatible, has getLiquidityParams)
# instead of standalone OptimizerV3Push3 which lacks UUPS hooks.
OPTIMIZERV3_SOL="$ONCHAIN_DIR/src/OptimizerV3.sol"
ARTIFACT_PATH="$ONCHAIN_DIR/out/OptimizerV3.sol/OptimizerV3.json"
DEFAULT_ATTACKS_DIR="$ONCHAIN_DIR/script/backtesting/attacks"

# =============================================================================
# Argument parsing
# =============================================================================

OUTPUT_DIR="${OUTPUT_DIR:-}"

declare -a PUSH3_FILES=()

while [[ $# -gt 0 ]]; do
  case $1 in
    --output-dir)
      # Guard against a missing value so `set -u` does not abort on an unbound "$2".
      [[ $# -ge 2 ]] || { echo "Option --output-dir requires a directory argument" >&2; exit 2; }
      OUTPUT_DIR="$2"; shift 2 ;;
    --*) echo "Unknown option: $1" >&2; exit 2 ;;
    *)   PUSH3_FILES+=("$1"); shift ;;
  esac
done

if [ "${#PUSH3_FILES[@]}" -eq 0 ]; then
  echo "Usage: $0 [--output-dir DIR] candidate1.push3 ..." >&2
  exit 2
fi

# =============================================================================
# Environment checks
# =============================================================================

BASE_RPC_URL="${BASE_RPC_URL:-}"
if [ -z "$BASE_RPC_URL" ]; then
  echo " [batch-eval] ERROR: BASE_RPC_URL env var required for Base network fork" >&2
  exit 2
fi

for _tool in forge node python3; do
  command -v "$_tool" &>/dev/null || { echo " [batch-eval] ERROR: $_tool not found in PATH" >&2; exit 2; }
done

# =============================================================================
# Helpers
# =============================================================================

log()   { echo " [batch-eval] $*" >&2; }
fail2() { echo " [batch-eval] ERROR: $*" >&2; exit 2; }

# =============================================================================
# Step 1: Ensure transpiler dependencies are installed
# =============================================================================

if [ ! -d "$TRANSPILER_DIR/node_modules" ]; then
  log "Installing transpiler dependencies…"
  (cd "$TRANSPILER_DIR" && npm install --silent) || fail2 "npm install in push3-transpiler failed"
fi

# =============================================================================
# Step 2: Transpile + compile each candidate, extract bytecodes into manifest
# =============================================================================

MANIFEST_DIR="$(mktemp -d)"

cleanup() {
  [ -d "${MANIFEST_DIR:-}" ] && rm -rf "$MANIFEST_DIR"
}
trap cleanup EXIT

IDS_FILE="$MANIFEST_DIR/ids.txt"
BYTECODES_FILE="$MANIFEST_DIR/bytecodes.txt"

: > "$IDS_FILE"
: > "$BYTECODES_FILE"

COMPILED_COUNT=0
FAILED_IDS=""
FAILED_SCORES=""

# Emit a fitness=0 JSON line for a candidate that failed at any pre-evaluation
# stage (transpile, compile, bytecode extraction), and track it.
skip_candidate() {
  local cid="$1" reason="$2"
  log "WARNING: $cid failed ($reason); scoring as 0"
  local line='{"candidate_id":"'"$cid"'","fitness":0,"error":"'"$reason"'"}'
  printf '%s\n' "$line"
  FAILED_SCORES="${FAILED_SCORES:+$FAILED_SCORES$'\n'}$line"
  FAILED_IDS="$FAILED_IDS $cid"
}

for PUSH3_FILE in "${PUSH3_FILES[@]}"; do
  PUSH3_FILE="$(cd "$(dirname "$PUSH3_FILE")" && pwd)/$(basename "$PUSH3_FILE")"
  CANDIDATE_ID="$(basename "$PUSH3_FILE" .push3)"

  # Transpile Push3 → Solidity, extract function body, inject into OptimizerV3.sol
  INJECT_SCRIPT="$REPO_ROOT/tools/push3-transpiler/inject.sh"
  if ! bash "$INJECT_SCRIPT" "$PUSH3_FILE" "$OPTIMIZERV3_SOL" >/dev/null 2>&1; then
    skip_candidate "$CANDIDATE_ID" "transpile_failed"
    continue
  fi

  # Compile (forge's incremental build skips unchanged files quickly)
  FORGE_EC=0
  (cd "$ONCHAIN_DIR" && forge build --silent) >/dev/null 2>&1 || FORGE_EC=$?

  if [ "$FORGE_EC" -ne 0 ]; then
    skip_candidate "$CANDIDATE_ID" "compile_failed"
    continue
  fi

  # Extract deployed bytecode from the artifact (adding a leading 0x if absent).
  # The heredoc body and its terminator must start at column 0.
  BYTECODE_HEX="$(python3 - "$ARTIFACT_PATH" <<'PYEOF'
import json, sys
with open(sys.argv[1]) as f:
    d = json.load(f)
bytecode = d["deployedBytecode"]["object"]
# Ensure 0x prefix
if not bytecode.startswith("0x"):
    bytecode = "0x" + bytecode
print(bytecode)
PYEOF
  )" || { skip_candidate "$CANDIDATE_ID" "bytecode_extract_failed"; continue; }

  if [ -z "$BYTECODE_HEX" ] || [ "$BYTECODE_HEX" = "0x" ]; then
    skip_candidate "$CANDIDATE_ID" "empty_bytecode"
    continue
  fi

  printf '%s\n' "$CANDIDATE_ID" >> "$IDS_FILE"
  printf '%s\n' "$BYTECODE_HEX" >> "$BYTECODES_FILE"
  COMPILED_COUNT=$((COMPILED_COUNT + 1))
  log "Compiled $CANDIDATE_ID"
done

if [ "$COMPILED_COUNT" -eq 0 ]; then
  fail2 "No candidates compiled successfully; aborting"
fi

log "Compiled $COMPILED_COUNT / ${#PUSH3_FILES[@]} candidates"

# =============================================================================
# Step 3: Run FitnessEvaluator.t.sol (in-process revm, all candidates at once)
# =============================================================================

ATTACKS_DIR="${ATTACKS_DIR:-$DEFAULT_ATTACKS_DIR}"

log "Running FitnessEvaluator.t.sol (in-process revm, fork: $BASE_RPC_URL)…"

FORGE_TEST_EC=0
FORGE_OUTPUT="$(
  cd "$ONCHAIN_DIR"
  BASE_RPC_URL="$BASE_RPC_URL" \
  FITNESS_MANIFEST_DIR="$MANIFEST_DIR" \
  ATTACKS_DIR="$ATTACKS_DIR" \
  forge test \
    --match-contract FitnessEvaluator \
    --match-test testBatchEvaluate \
    -vv \
    --no-match-path "NOT_A_REAL_PATH" \
    2>&1
)" || FORGE_TEST_EC=$?

if [ "$FORGE_TEST_EC" -ne 0 ]; then
  # Surface forge output on failure for diagnosis
  printf '%s\n' "$FORGE_OUTPUT" >&2
  fail2 "forge test failed (exit $FORGE_TEST_EC)"
fi

# =============================================================================
# Step 4: Extract and emit score JSON lines
#
# forge test -vv wraps console.log output with leading spaces and a "Logs:" header.
# We grep for lines containing the score JSON pattern and strip the indentation.
# =============================================================================

SCORES_JSONL="$(printf '%s\n' "$FORGE_OUTPUT" | grep -E '"candidate_id"' | sed 's/^[[:space:]]*//' || true)"

if [ -z "$SCORES_JSONL" ]; then
  printf '%s\n' "$FORGE_OUTPUT" >&2
  fail2 "No score lines found in forge test output"
fi
|
|
|
|
|
|
2026-03-17 06:09:18 +00:00
|
|
|
# Emit scores to stdout (failed candidates already emitted individually above)
|
fix: feat: revm-based fitness evaluator for evolution at scale (#604)
Replace per-candidate Anvil+forge-script pipeline with in-process EVM
execution using Foundry's native revm backend, achieving 10-100× speedup
for evolutionary search at scale.
New files:
- onchain/test/FitnessEvaluator.t.sol — Forge test that forks Base once,
deploys the full KRAIKEN stack, then for each candidate uses vm.etch to
inject the compiled optimizer bytecode, UUPS-upgrades the proxy, runs all
attack sequences with in-memory vm.snapshot/revertTo (no RPC overhead),
and emits one {"candidate_id","fitness"} JSON line per candidate.
Skips gracefully when BASE_RPC_URL is unset (CI-safe).
- tools/push3-evolution/revm-evaluator/batch-eval.sh — Wrapper that
transpiles+compiles each candidate sequentially, writes a two-file
manifest (ids.txt + bytecodes.txt), then invokes FitnessEvaluator.t.sol
in a single forge test run and parses the score JSON from stdout.
Modified:
- tools/push3-evolution/evolve.sh — Adds EVAL_MODE env var (anvil|revm).
When EVAL_MODE=revm, batch-scores every candidate in a generation with
one batch-eval.sh call instead of N sequential fitness.sh processes;
scores are looked up from the JSONL output in the per-candidate loop.
Default remains EVAL_MODE=anvil for backward compatibility.
Key design decisions:
- Per-candidate Solidity compilation is unavoidable (each Push3 candidate
produces different Solidity); the speedup is in the evaluation phase.
- vm.snapshot/revertTo in forge test are O(1) memory operations (true
revm), not RPC calls — this is the core speedup vs Anvil.
- recenterAccess is set in bootstrap so TWAP stability checks are bypassed
during attack sequences (mirrors the existing fitness.sh bootstrap).
- Test skips cleanly when BASE_RPC_URL is absent, keeping CI green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 11:54:41 +00:00
|
|
|
printf '%s\n' "$SCORES_JSONL"
|
|
|
|
|
|
2026-03-17 06:09:18 +00:00
|
|
|
# Optionally write to output directory (merge successful + failed scores)
if [ -n "$OUTPUT_DIR" ]; then
  mkdir -p "$OUTPUT_DIR"
  ALL_SCORES="${SCORES_JSONL}${FAILED_SCORES:+$'\n'$FAILED_SCORES}"
  printf '%s\n' "$ALL_SCORES" > "$OUTPUT_DIR/scores.jsonl"
  log "Scores written to $OUTPUT_DIR/scores.jsonl"
fi
# Warn and exit non-zero if any candidates were skipped (compile failures)
if [ -n "$FAILED_IDS" ]; then
  log "WARNING: the following candidates were skipped (compile failed): $FAILED_IDS"
  exit 1
fi
log "Done — scored $COMPILED_COUNT candidates"