- Change WARNING to explicitly state "legacy CID format ... migration not supported, skipping"
- Expand comment near the startswith('candidate_') guard to document the CID format
contract and explain why re-admission is intentionally out of scope (no surviving
generation_N.jsonl files from runs 1-6 exist in the repo)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pass seed basename into the admission Python block as argv[7]
- Add `note` field to every new evolved entry: "Evolved from <seed> (run<N> gen<G>)"
- Add migration comment noting entries admitted before this fix may have note: null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace unquoted heredoc (shell-injection path) with a temp file: the
shell loop now appends tab-separated filename/score lines to a temp
file, which is passed as a plain path argument to the Python manifest-
rewrite block. Python reads only file contents, never executes shell-
expanded strings.
- Add early abort on fitness.sh exit code 2 (infra error: Anvil down,
missing tool). Iterating past an infra failure produces no useful
results; aborting immediately surfaces the real problem.
- Remove unused `os` import from the manifest-rewrite Python block.
- Fix inaccurate comment in evolve.sh --diverse-seeds sampling: the pool
sampler does a flat random shuffle with no fitness weighting, so null-
fitness seeds are not "treated as 0"; they are sampled with the same
probability as any other seed.
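A minimal sketch of the temp-file handoff described in the first bullet, assuming one "filename<TAB>score" pair per line. The helper name `read_scores` and the sample filenames are hypothetical and are not taken from evolve.sh; the point is only that Python treats the file contents as data.

```python
import os
import tempfile


def read_scores(path):
    """Parse tab-separated "filename<TAB>score" lines into a dict."""
    scores = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            name, _, score = line.partition("\t")
            # Scores are wei-scale integers, so keep them as Python ints.
            scores[name] = int(score)
    return scores


# Simulate what a shell loop might append, including a hostile filename.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("run007_gen2_c005.push3\t7117000000000000000000\n")
    tmp.write("$(touch pwned).push3\t6000000000000000000000\n")
    tmp_path = tmp.name

scores = read_scores(tmp_path)
os.remove(tmp_path)
```

Because the hostile name is only ever read as a string, no shell expansion can take place on this path.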
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add evaluate-seeds.sh: standalone script that reads manifest.jsonl,
finds every entry with fitness: null, runs fitness.sh against each
seed file, and atomically writes results back to manifest.jsonl.
Supports --dry-run to preview without evaluating.
- Add comment to --diverse-seeds sampling in evolve.sh documenting that
null-fitness seeds are included with effective_fitness=0 and that
evaluate-seeds.sh should be run to score them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Define ZERO_RATED_FLAGS set near effective_fitness and check each flag
with any(...in flags...) instead of a single hard-coded substring test.
token_value_inflation behaviour is preserved; new flags can be added to
the set without touching the dispatch logic.
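A rough sketch of the flag check described above. Only the names ZERO_RATED_FLAGS and effective_fitness come from this message; the entry layout (a `fitness_flags` string and a `fitness` field that may be null) is an assumption based on the manifest entries mentioned elsewhere in this log.

```python
# Hypothetical reconstruction, not the code in evolve.sh.
ZERO_RATED_FLAGS = {"token_value_inflation"}


def effective_fitness(entry):
    """Return 0 for flagged or unscored entries, else the raw fitness."""
    flags = entry.get("fitness_flags") or ""
    # Substring test per flag, so new flags only need adding to the set.
    if any(flag in flags for flag in ZERO_RATED_FLAGS):
        return 0
    return int(entry.get("fitness") or 0)
```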
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace AW=250 (VERY AGGRESSIVE) with 100 and AW=150 (AGGRESSIVE) with 80
so neither value is silently clamped by LiquidityManager.MAX_ANCHOR_WIDTH=100.
Update header comment block to match the corrected values.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Stream evolve.sh output directly to stderr instead of buffering in a
command substitution; long runs (tens of minutes) are now visible live.
- Use an array (EVOLVE_ARGS) for evolve.sh arguments instead of an
unquoted DIVERSE_FLAG string variable.
- Abort the current run (continue to next loop iteration) when the patch
fails to apply, rather than silently running with wrong evaluation semantics.
- Fix notify() to pass the message via stdin to avoid SSH single-quote
interpolation breakage on messages containing special characters.
- Fix step comment/counter mismatch: "Step 7" comment now reads "Step 6"
to match the [6/7] log label for the summary-write step.
- Clarify in evolution.conf that GAS_LIMIT and ANCHOR_WIDTH_UNBOUNDED are
documentation-only (they document what evolution.patch does); editing
them has no runtime effect.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace `mktemp -d` with a fixed working directory `evolved/.work/` that
is wiped at startup. Stale `/tmp/tmp.*` directories from killed runs can
no longer interfere with batch-eval.sh path resolution. Run outputs are
already preserved in `evolved/run_NNN/` before the work dir is cleaned.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
evo_run007_champion: fitness 7.117e21, anchorWidth=153 (unbounded),
discoveryDepth=0. Simplified to single percentageStaked>88% threshold.
Evolved under IL crystallization attack pressure.
Recovered from reflog after rebase accident destroyed PRs #692, #699.
Balanced Adaptive (#688) was garbage collected — will be regenerated.
Kindergarten (#683) needs fresh implementation due to evolve.sh conflicts.
Closes #672, #675.
- Fix run_NNN scan regex: r'run(\d+)' → r'run_(\d+)' so it correctly
matches the underscore-separated directory names the script creates
(previously always resolved to 001, overwriting the same dir each run)
- Remove [in-progress] tag from STATE.md entry for #752
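The regex bug can be reproduced with a small sketch. The function name `next_run_dir` and its list-of-names input are hypothetical; the actual scan in evolve.sh may be structured differently.

```python
import re


def next_run_dir(existing_names, pattern):
    """Return the next run_NNN name given directory names and a scan regex."""
    highest = 0
    for name in existing_names:
        m = re.match(pattern, name)
        if m:
            highest = max(highest, int(m.group(1)))
    return f"run_{highest + 1:03d}"


existing = ["run_001", "run_002", "run_003"]
buggy = next_run_dir(existing, r"run(\d+)")   # "_" is not a digit: no match
fixed = next_run_dir(existing, r"run_(\d+)")  # matches the real dir names
```

With the old pattern nothing matches, so the highest index stays at 0 and every run resolves to `run_001`; the corrected pattern yields `run_004` here.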
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- --output now accepts a base dir (default: evolved/) instead of requiring
an explicit path each run
- On each invocation, scan base dir for existing run_NNN/ subdirectories,
find the highest N, and create run_(N+1)/ for this run's outputs
- All generation JSONL files, best.push3, diff.txt, and evolution.log are
written to the new run dir — previous runs are never overwritten
- Log header now shows both Base dir and Output (run dir) for clarity
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add fitness_flags="token_value_inflation" to evo_run004_champion in
manifest.jsonl so callers can detect the inflated value without
discarding the entry entirely.
- Add effective_fitness() helper in evolve.sh pool admission (step 5)
that returns 0 for any entry with a token_value_inflation flag,
preventing inflated scores from biasing the top-100 evolved pool
ranking or eviction decisions.
- Document in evolve.sh that raw fitness values are only comparable
within the same evaluation run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes #667
## Summary
Implemented persistent top-100 candidate pool in `tools/push3-evolution/evolve.sh`:
### Changes
**`--run-id <N>` flag** (line 96)
- Optional integer; auto-increments from highest `run` field in `manifest.jsonl` when omitted
- Zero-padded to 3 digits (`001`, `002`, …)
**Seeds pool constants** (after path canonicalization)
- `SEEDS_DIR` → `$SCRIPT_DIR/seeds/`
- `POOL_MANIFEST` → `seeds/manifest.jsonl`
- `ADMISSION_THRESHOLD` → `6000000000000000000000` (6e21 wei)
**`--diverse-seeds` mode** now has two paths:
1. **Pool mode** (pool non-empty): random-shuffles the pool and takes up to `POPULATION` candidates — real evolved diversity, not parametric clones
2. **Fallback** (pool empty): original `seed-gen-cli` parametric variant behavior
- Both paths fall back to mutating `--seed` to fill any shortfall
**Step 5 — End-of-run admission** (after the diff step):
1. Scans all `generation_*.jsonl` in `OUTPUT_DIR` for candidates with `fitness ≥ 6e21`
2. Maps `candidate_id` (e.g. `gen2_c005`) back to `.push3` files in `WORK_DIR` (which still exists at this point because cleanup only fires on EXIT)
3. Deduplicates by SHA-256 content hash against existing pool
4. Names new files `run{RUN_ID}_gen{N}_c{MMM}.push3`
5. Merges with existing pool, sorts by fitness descending, keeps top 100
6. Copies admitted files to `seeds/`, removes evicted evolved files (never hand-written), rewrites `manifest.jsonl`
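The merge in steps 1, 3 and 5 can be sketched roughly as below. This is an illustration under assumptions, not the admission block itself: the `admit` function and the entry layout (`name`, `fitness`, `content`) are hypothetical, and the protection of hand-written seeds from eviction is left out.

```python
import hashlib

ADMISSION_THRESHOLD = 6 * 10**21  # 6e21 wei, as in the constants above
POOL_LIMIT = 100


def admit(pool, candidates, limit=POOL_LIMIT):
    """Merge candidates into pool; return (kept, evicted).

    Each entry is a dict with "name", "fitness" (int) and "content" (bytes).
    """
    seen = {hashlib.sha256(e["content"]).hexdigest() for e in pool}
    merged = list(pool)
    for cand in candidates:
        if cand["fitness"] < ADMISSION_THRESHOLD:
            continue  # below the admission threshold
        digest = hashlib.sha256(cand["content"]).hexdigest()
        if digest in seen:
            continue  # identical program text is already in the pool
        seen.add(digest)
        merged.append(cand)
    # Highest fitness first; everything past the limit is evicted.
    merged.sort(key=lambda e: e["fitness"], reverse=True)
    return merged[:limit], merged[limit:]
```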
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/683
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
## Three bugs in evolve.sh
1. **Heredoc stdin conflict** — `py_stats()` used `<<PYEOF` heredoc which stole stdin from the pipe, so python never received score values → stats always `min=0 max=0 mean=0`
2. **Bash integer overflow** — global best comparison used `[ $MAX -gt $GLOBAL_BEST_FITNESS ]` which overflows on uint256 wei values (>9.2e18) → best always tracked as 0
3. **candidate_id mismatch** — evolve.sh looked up `gen0_c000` but batch-eval produces `candidate_000` (derived from filename) → score lookup always returned default 0
All 3 previous evolution runs (150+ candidates) reported all zeros despite batch-eval correctly scoring them at ~8.26e21 wei.
## Fix
- `py_stats`: heredoc → `python3 -c` inline
- Global best: bash `[ -gt ]` → `python3` big number comparison
- Score lookup: use `basename $CAND_FILE` instead of synthetic CID
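A small sketch of why the global-best comparison is done in Python rather than with a bash integer test. The helper name `is_new_best` is hypothetical; the sample score is the approximate value quoted above.

```python
INT64_MAX = 2**63 - 1  # upper bound of bash's signed 64-bit integers


def is_new_best(candidate, current_best):
    """Compare two decimal strings as arbitrary-precision integers."""
    return int(candidate) > int(current_best)


score = "8260000000000000000000"  # ~8.26e21 wei, as reported by batch-eval
overflows_bash = int(score) > INT64_MAX
improved = is_new_best(score, "0")
```

Converting to `int` before comparing also avoids the lexicographic trap where the string "9" sorts above "10".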
Co-authored-by: root <root@debian-g-2vcpu-8gb-ams3-01>
Reviewed-on: https://codeberg.org/johba/harb/pulls/665
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
- evolve.sh: fix fail-in-subshell bug — run seed-gen-cli as a direct
command so its exit code is checked by the parent shell and fail()
aborts the script correctly; redirect stderr to log file instead of
discarding it with 2>/dev/null
- seed-generator.ts: reorder enumerateVariants() to put
STAKED_THRESHOLDS outermost (192 entries/block) so that
selectVariants(6) with stride=192 covers all 6 staked% thresholds;
remove false doc claim about "first variant is current seed config";
add comments explaining CI=0n is intentional in all presets
- seed-gen-cli.ts: emit a stderr diagnostic when count exceeds the
1152-variant cap so the cap is visible rather than silently producing
fewer files than requested
- test: strengthen n=6 test to assert all STAKED_THRESHOLDS values are
represented in the selected variants
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add seed-generator.ts module and seed-gen-cli.ts CLI that produce
parametric Push3 variants for initial population seeding.
Variants systematically cover:
- Staked% thresholds: 80, 85, 88, 91, 94, 97
- Penalty thresholds: 30, 50, 70, 100
- Bull params: 4 presets (aggressive → mild)
- Bear params: 4 presets (standard → very mild)
- Tax distributions: exponential (seed), linear, sqrt
Total combination space: 6×4×4×4×3 = 1152 variants.
selectVariants(n) samples evenly so every axis is represented.
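The size of the combination space and a fixed-stride selection can be sketched in Python as follows. This is not the TypeScript in seed-generator.ts: the preset placeholders and the stride rule are assumptions for illustration, and with n=6 this sketch only varies the staked% axis.

```python
from itertools import product

STAKED_THRESHOLDS = [80, 85, 88, 91, 94, 97]
PENALTY_THRESHOLDS = [30, 50, 70, 100]
BULL_PRESETS = range(4)  # placeholder indices for the 4 bull presets
BEAR_PRESETS = range(4)  # placeholder indices for the 4 bear presets
TAX_DISTRIBUTIONS = ["exponential", "linear", "sqrt"]


def enumerate_variants():
    """Cartesian product with the staked% axis outermost (slowest-varying)."""
    return list(product(STAKED_THRESHOLDS, PENALTY_THRESHOLDS,
                        BULL_PRESETS, BEAR_PRESETS, TAX_DISTRIBUTIONS))


def select_variants(n):
    """Pick n variants at a fixed stride through the enumeration."""
    variants = enumerate_variants()
    if n <= 0:
        return []
    n = min(n, len(variants))
    stride = len(variants) // n
    return [variants[i * stride] for i in range(n)]
```

Under these assumptions, `select_variants(6)` steps through the 1152 variants in strides of 192, landing once in each staked% block.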
evolve.sh gains --diverse-seeds flag: when set, gen_0 is seeded with
parametric variants instead of N copies of the same mutated seed.
Remaining slots (if population > generated variants) fall back to
mutations of the base seed.
All generated programs pass transpiler stack validation (33 new tests).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Optimizer.sol: move CALCULATE_PARAMS_GAS_LIMIT constant to top of
contract (after error declaration) to avoid mid-contract placement.
Expand natspec with EIP-150 63/64 note: callers need ~203 175 gas to
deliver the full 200 000 budget to the inner staticcall.
- Optimizer.sol: add ret.length < 128 guard before abi.decode in
getLiquidityParams(). Malformed return data (truncated / wrong ABI)
from an evolved program now falls back to _bearDefaults() instead of
propagating an unhandled revert. The 128-byte minimum is the ABI
encoding of (uint256, uint256, uint24, uint256) — four 32-byte slots.
- Optimizer.sol: add cross-reference comment to _bearDefaults() noting
that its values must stay in sync with LiquidityManager.recenter()'s
catch block to prevent silent divergence.
- FitnessEvaluator.t.sol: add CALCULATE_PARAMS_GAS_LIMIT mirror constant
(must match Optimizer.sol). Disqualify candidates whose measured gas
exceeds the production cap with fitness=0 and error="gas_over_limit"
— prevents the pipeline from selecting programs that are functionally
dead on-chain (would always produce bear defaults in production).
- batch-eval.sh: update output format comment to document the gas_used
field and over-gas-limit error object added by this feature.
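The two numeric facts in the Optimizer.sol bullets can be checked with a short calculation. This is a Python sketch of the arithmetic only; the function names are hypothetical, and the gas figure ignores the cost of the call opcode itself.

```python
CALCULATE_PARAMS_GAS_LIMIT = 200_000
MIN_RETURN_BYTES = 4 * 32  # (uint256, uint256, uint24, uint256), one slot each


def max_forwarded(gas_available):
    """EIP-150: a call can forward at most all but 1/64 of remaining gas."""
    return gas_available - gas_available // 64


def return_data_ok(ret):
    """Python analogue of the ret.length < 128 guard before abi.decode."""
    return len(ret) >= MIN_RETURN_BYTES


forwarded = max_forwarded(203_175)
```

A caller holding only 200 000 gas could forward at most 196 875 of it, which is why the natspec asks for roughly 203 175.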
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Skip UUPS upgradeTo: etch + vm.store ERC1967 implementation slot directly
(OptimizerV3Push3 is standalone, no UUPS inheritance needed for evolution)
- Use deployedBytecode (runtime) instead of bytecode (creation) for vm.etch
- Inject transpiled body into OptimizerV3.sol (has getLiquidityParams via Optimizer)
instead of using standalone OptimizerV3Push3.sol
- Wrap buy/sell/stake/unstake in try/catch — attack ops should not abort the batch
- Add /tmp read to fs_permissions for batch-eval manifest files
- Bootstrap recenter returns bool instead of reverting (soft-fail per candidate)
Replace per-candidate Anvil+forge-script pipeline with in-process EVM
execution using Foundry's native revm backend, achieving 10-100× speedup
for evolutionary search at scale.
New files:
- onchain/test/FitnessEvaluator.t.sol — Forge test that forks Base once,
deploys the full KRAIKEN stack, then for each candidate uses vm.etch to
inject the compiled optimizer bytecode, UUPS-upgrades the proxy, runs all
attack sequences with in-memory vm.snapshot/revertTo (no RPC overhead),
and emits one {"candidate_id","fitness"} JSON line per candidate.
Skips gracefully when BASE_RPC_URL is unset (CI-safe).
- tools/push3-evolution/revm-evaluator/batch-eval.sh — Wrapper that
transpiles+compiles each candidate sequentially, writes a two-file
manifest (ids.txt + bytecodes.txt), then invokes FitnessEvaluator.t.sol
in a single forge test run and parses the score JSON from stdout.
Modified:
- tools/push3-evolution/evolve.sh — Adds EVAL_MODE env var (anvil|revm).
When EVAL_MODE=revm, batch-scores every candidate in a generation with
one batch-eval.sh call instead of N sequential fitness.sh processes;
scores are looked up from the JSONL output in the per-candidate loop.
Default remains EVAL_MODE=anvil for backward compatibility.
Key design decisions:
- Per-candidate Solidity compilation is unavoidable (each Push3 candidate
produces different Solidity); the speedup is in the evaluation phase.
- vm.snapshot/revertTo in forge test are O(1) memory operations (true
revm), not RPC calls — this is the core speedup vs Anvil.
- recenterAccess is set in bootstrap so TWAP stability checks are bypassed
during attack sequences (mirrors the existing fitness.sh bootstrap).
- Test skips cleanly when BASE_RPC_URL is absent, keeping CI green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add virtual to Optimizer.calculateParams() for UUPS override
- Create OptimizerV3.sol: UUPS-upgradeable optimizer with transpiled Push3 logic
- Update deploy-optimizer.sh to deploy OptimizerV3 instead of Optimizer
- Add ~/.foundry/bin to PATH in evolve.sh, fitness.sh, deploy-optimizer.sh
Address round-2 review findings:
- Move BASELINE_SNAP before deploy-optimizer.sh so cleanup fully reverts the
deploy on a shared Anvil; fixes nonce/address collision when a second
sequential evaluation reuses the same chain
- Revert deploy output to capture-and-suppress on success / surface on failure;
removes per-candidate stderr noise in evolution loop batch runs
- Fix cast rpc anvil_mine arg order to match all other cast rpc calls in script
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>