johba/harb

Author	SHA1	Message	Date
openhands	cb6e6708b6	fix: `llm`-origin entries in manifest have null fitness and no evaluation path (#724) - Add evaluate-seeds.sh: standalone script that reads manifest.jsonl, finds every entry with fitness: null, runs fitness.sh against each seed file, and atomically writes results back to manifest.jsonl. Supports --dry-run to preview without evaluating. - Add comment to --diverse-seeds sampling in evolve.sh documenting that null-fitness seeds are included with effective_fitness=0 and that evaluate-seeds.sh should be run to score them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-15 03:08:29 +00:00
openhands	273615cfed	fix: No generic flag dispatch: only `token_value_inflation` is ever zero-rated (#723) Define ZERO_RATED_FLAGS set near effective_fitness and check each flag with any(...in flags...) instead of a single hard-coded substring test. token_value_inflation behaviour is preserved; new flags can be added to the set without touching the dispatch logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-15 02:36:57 +00:00
openhands	ab40930812	fix: fitness.sh individual-scoring path still silences errors (#766)	2026-03-14 19:07:17 +00:00
openhands	f355974cc8	fix: fix: evolve.sh silences all batch-eval errors with 2>/dev/null (#749) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 16:51:04 +00:00
openhands	89a9d3e575	fix: fix: evolve.sh silences all batch-eval errors with 2>/dev/null (#749) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 16:27:09 +00:00
openhands	b168a05930	fix: fix: evolve.sh stale tmpdirs break subsequent runs (#750) Replace `mktemp -d` with a fixed working directory `evolved/.work/` that is wiped at startup. Stale `/tmp/tmp.*` directories from killed runs can no longer interfere with batch-eval.sh path resolution. Run outputs are already preserved in `evolved/run_NNN/` before the work dir is cleaned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 15:48:07 +00:00
openhands	cd86774ac8	fix: address review findings for #751 — STATE.md and script header docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 12:17:23 +00:00
openhands	83ab1683f5	fix: fix: EVAL_MODE defaults to anvil — should default to revm (#751) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 11:56:52 +00:00
openhands	266500fde1	fix: address review findings for #752 — regex and STATE.md cleanup - Fix run_NNN scan regex: r'run(\d+)' → r'run_(\d+)' so it correctly matches the underscore-separated directory names the script creates (previously always resolved to 001, overwriting the same dir each run) - Remove [in-progress] tag from STATE.md entry for #752 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 11:27:53 +00:00
openhands	b5bf53b010	fix: feat: evolve.sh auto-incrementing per-run results directory (#752) - --output now accepts a base dir (default: evolved/) instead of requiring an explicit path each run - On each invocation, scan base dir for existing run_NNN/ subdirectories, find the highest N, and create run_(N+1)/ for this run's outputs - All generation JSONL files, best.push3, diff.txt, and evolution.log are written to the new run dir — previous runs are never overwritten - Log header now shows both Base dir and Output (run dir) for clarity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 11:08:04 +00:00
openhands	b6c07b1d93	fix: generation_N.jsonl candidate_id format mismatch vs filenames (#669)	2026-03-14 04:27:59 +00:00
openhands	0aa819f168	fix: generation_N.jsonl candidate_id format mismatch vs filenames (#669)	2026-03-14 04:07:00 +00:00
openhands	c42a1ca768	fix: evo_run004_champion fitness inflated by token value (#670) (#704) - Add fitness_flags="token_value_inflation" to evo_run004_champion in manifest.jsonl so callers can detect the inflated value without discarding the entry entirely. - Add effective_fitness() helper in evolve.sh pool admission (step 5) that returns 0 for any entry with a token_value_inflation flag, preventing inflated scores from biasing the top-100 evolved pool ranking or eviction decisions. - Document in evolve.sh that raw fitness values are only comparable within the same evaluation run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-14 01:08:13 +00:00
openhands	b7d0b63ca1	fix: int(e.get('fitness', 0)) crashes on null-fitness manifest entries (#711)	2026-03-13 23:37:00 +00:00
johba	0c4cd23dfa	fix: feat: Seed kindergarten — persistent top-100 candidate pool (#667) (#683) Fixes #667 ## Changes ## Summary Implemented persistent top-100 candidate pool in `tools/push3-evolution/evolve.sh`: ### Changes `--run-id <N>` flag (line 96) - Optional integer; auto-increments from highest `run` field in `manifest.jsonl` when omitted - Zero-padded to 3 digits (`001`, `002`, …) Seeds pool constants (after path canonicalization) - `SEEDS_DIR` → `$SCRIPT_DIR/seeds/` - `POOL_MANIFEST` → `seeds/manifest.jsonl` - `ADMISSION_THRESHOLD` → `6000000000000000000000` (6e21 wei) `--diverse-seeds` mode now has two paths: 1. Pool mode (pool non-empty): random-shuffles the pool and takes up to `POPULATION` candidates — real evolved diversity, not parametric clones 2. Fallback (pool empty): original `seed-gen-cli` parametric variant behavior - Both paths fall back to mutating `--seed` to fill any shortfall Step 5 — End-of-run admission (after the diff step): 1. Scans all `generation_*.jsonl` in `OUTPUT_DIR` for candidates with `fitness ≥ 6e21` 2. Maps `candidate_id` (e.g. `gen2_c005`) back to `.push3` files in `WORK_DIR` (still exists since cleanup fires on EXIT) 3. Deduplicates by SHA-256 content hash against existing pool 4. Names new files `run{RUN_ID}_gen{N}_c{MMM}.push3` 5. Merges with existing pool, sorts by fitness descending, keeps top 100 6. Copies admitted files to `seeds/`, removes evicted evolved files (never hand-written), rewrites `manifest.jsonl` Co-authored-by: openhands <openhands@all-hands.dev> Reviewed-on: https://codeberg.org/johba/harb/pulls/683 Reviewed-by: review_bot <review_bot@noreply.codeberg.org>	2026-03-13 20:45:03 +01:00
johba	3f435f8459	fix: evolution scoring — 3 bugs made all candidates report fitness=0 (#665) ## Three bugs in evolve.sh 1. Heredoc stdin conflict — `py_stats()` used `<<PYEOF` heredoc which stole stdin from the pipe, so python never received score values → stats always `min=0 max=0 mean=0` 2. Bash integer overflow — global best comparison used `[ $MAX -gt $GLOBAL_BEST_FITNESS ]` which overflows on uint256 wei values (>9.2e18) → best always tracked as 0 3. candidate_id mismatch — evolve.sh looked up `gen0_c000` but batch-eval produces `candidate_000` (derived from filename) → score lookup always returned default 0 All 3 previous evolution runs (150+ candidates) reported all zeros despite batch-eval correctly scoring them at ~8.26e21 wei. ## Fix - `py_stats`: heredoc → `python3 -c` inline - Global best: bash `[ -gt ]` → `python3` big number comparison - Score lookup: use `basename $CAND_FILE` instead of synthetic CID Co-authored-by: root <root@debian-g-2vcpu-8gb-ams3-01> Reviewed-on: https://codeberg.org/johba/harb/pulls/665 Reviewed-by: review_bot <review_bot@noreply.codeberg.org>	2026-03-13 10:02:24 +01:00
openhands	89a2734bff	fix: address review findings for diverse seed population (#638) - evolve.sh: fix fail-in-subshell bug — run seed-gen-cli as a direct command so its exit code is checked by the parent shell and fail() aborts the script correctly; redirect stderr to log file instead of discarding it with 2>/dev/null - seed-generator.ts: reorder enumerateVariants() to put STAKED_THRESHOLDS outermost (192 entries/block) so that selectVariants(6) with stride=192 covers all 6 staked% thresholds; remove false doc claim about "first variant is current seed config"; add comments explaining CI=0n is intentional in all presets - seed-gen-cli.ts: emit a stderr diagnostic when count exceeds the 1152-variant cap so the cap is visible rather than silently producing fewer files than requested - test: strengthen n=6 test to assert all STAKED_THRESHOLDS values are represented in the selected variants Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-13 05:21:05 +00:00
openhands	850131b74f	fix: feat: Push3 evolution — diverse seed population (#638) Add seed-generator.ts module and seed-gen-cli.ts CLI that produce parametric Push3 variants for initial population seeding. Variants systematically cover: - Staked% thresholds: 80, 85, 88, 91, 94, 97 - Penalty thresholds: 30, 50, 70, 100 - Bull params: 4 presets (aggressive → mild) - Bear params: 4 presets (standard → very mild) - Tax distributions: exponential (seed), linear, sqrt Total combination space: 6×4×4×4×3 = 1152 variants. selectVariants(n) samples evenly so every axis is represented. evolve.sh gains --diverse-seeds flag: when set, gen_0 is seeded with parametric variants instead of N copies of the same mutated seed. Remaining slots (if population > generated variants) fall back to mutations of the base seed. All generated programs pass transpiler stack validation (33 new tests). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-13 04:48:04 +00:00
openhands	64f1af3041	fix: feat: Push3 evolution — elitism (top N survive unchanged) (#640) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 22:29:23 +00:00
openhands	26b8876691	fix: feat: revm-based fitness evaluator for evolution at scale (#604) Replace per-candidate Anvil+forge-script pipeline with in-process EVM execution using Foundry's native revm backend, achieving 10-100× speedup for evolutionary search at scale. New files: - onchain/test/FitnessEvaluator.t.sol — Forge test that forks Base once, deploys the full KRAIKEN stack, then for each candidate uses vm.etch to inject the compiled optimizer bytecode, UUPS-upgrades the proxy, runs all attack sequences with in-memory vm.snapshot/revertTo (no RPC overhead), and emits one {"candidate_id","fitness"} JSON line per candidate. Skips gracefully when BASE_RPC_URL is unset (CI-safe). - tools/push3-evolution/revm-evaluator/batch-eval.sh — Wrapper that transpiles+compiles each candidate sequentially, writes a two-file manifest (ids.txt + bytecodes.txt), then invokes FitnessEvaluator.t.sol in a single forge test run and parses the score JSON from stdout. Modified: - tools/push3-evolution/evolve.sh — Adds EVAL_MODE env var (anvil|revm). When EVAL_MODE=revm, batch-scores every candidate in a generation with one batch-eval.sh call instead of N sequential fitness.sh processes; scores are looked up from the JSONL output in the per-candidate loop. Default remains EVAL_MODE=anvil for backward compatibility. Key design decisions: - Per-candidate Solidity compilation is unavoidable (each Push3 candidate produces different Solidity); the speedup is in the evaluation phase. - vm.snapshot/revertTo in forge test are O(1) memory operations (true revm), not RPC calls — this is the core speedup vs Anvil. - recenterAccess is set in bootstrap so TWAP stability checks are bypassed during attack sequences (mirrors the existing fitness.sh bootstrap). - Test skips cleanly when BASE_RPC_URL is absent, keeping CI green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 11:54:41 +00:00
openhands	ade7e2033a	fix: Evolution pipeline UUPS upgrade + Foundry PATH (#593) - Add virtual to Optimizer.calculateParams() for UUPS override - Create OptimizerV3.sol: UUPS-upgradeable optimizer with transpiled Push3 logic - Update deploy-optimizer.sh to deploy OptimizerV3 instead of Optimizer - Add ~/.foundry/bin to PATH in evolve.sh, fitness.sh, deploy-optimizer.sh	2026-03-12 06:47:35 +00:00
openhands	0496c94681	fix: address review findings in evolve.sh (#546) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-11 22:06:18 +00:00
openhands	2ee7feb621	fix: address review findings in evolve.sh (#546) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-11 21:29:14 +00:00
openhands	547e8beae8	fix: Push3 evolution: selection loop orchestrator (#546) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-11 20:56:19 +00:00
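
Commits b7d0b63ca1, c42a1ca768, 273615cfed and 0c4cd23dfa together describe how pool entries are ranked: null fitness counts as 0, entries carrying a zero-rated flag count as 0, duplicates are dropped by content hash, and only the top 100 survive. The Python below is a minimal sketch of that logic, not the code in evolve.sh. The `sha256` key, the `merge_pool` name and the assumption that `fitness_flags` is a plain string are hypothetical; only `ZERO_RATED_FLAGS`, `effective_fitness` and the pool size of 100 come from the commit messages.

```python
# Assumption: flags are stored as one string per entry, as in
# fitness_flags="token_value_inflation" (commit c42a1ca768).
ZERO_RATED_FLAGS = {"token_value_inflation"}
POOL_SIZE = 100  # "keeps top 100" per commit 0c4cd23dfa


def effective_fitness(entry: dict) -> int:
    """Fitness used for ranking: 0 for null-fitness or zero-rated entries."""
    flags = entry.get("fitness_flags") or ""
    # Generic dispatch: any flag in the set zero-rates the entry.
    if any(flag in flags for flag in ZERO_RATED_FLAGS):
        return 0
    # `or 0` handles both a missing key and an explicit JSON null.
    # int(entry.get("fitness", 0)) would raise TypeError on null, because
    # the default only applies when the key is absent.
    return int(entry.get("fitness") or 0)


def merge_pool(existing: list[dict], candidates: list[dict]) -> list[dict]:
    """Merge candidates into the pool, skipping duplicate content hashes."""
    seen = {entry["sha256"] for entry in existing}  # hypothetical key name
    merged = list(existing)
    for cand in candidates:
        if cand["sha256"] not in seen:
            seen.add(cand["sha256"])
            merged.append(cand)
    # Python integers are arbitrary precision, so wei-scale values
    # (around 8e21) compare correctly here.
    merged.sort(key=effective_fitness, reverse=True)
    return merged[:POOL_SIZE]
```

Keeping the flag check inside `effective_fitness` means a flagged entry stays in the manifest with its raw score intact and is only demoted at ranking time.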

24 commits
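
Commits b5bf53b010 and 266500fde1 describe the per-run output directory: scan the base directory for `run_NNN/` subdirectories, take the highest N, and use `run_(N+1)/`. A possible sketch in Python follows. The function name `next_run_dir` is hypothetical, and using `fullmatch` with three-digit zero padding is an assumption based on the `run_NNN` naming; the actual script may match differently.

```python
import re
from pathlib import Path

# The underscore matters: per commit 266500fde1, r'run(\d+)' never matched
# the run_NNN names, so every run resolved to 001.
RUN_DIR_RE = re.compile(r"run_(\d+)")


def next_run_dir(base: Path) -> Path:
    """Return base/run_NNN, where NNN is one above the highest existing run."""
    highest = 0
    if base.is_dir():
        for child in base.iterdir():
            match = RUN_DIR_RE.fullmatch(child.name)
            if child.is_dir() and match:
                highest = max(highest, int(match.group(1)))
    return base / f"run_{highest + 1:03d}"
```

A caller would then create the directory with something like `next_run_dir(Path("evolved")).mkdir(parents=True)`, which fails rather than overwriting if the directory already exists.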