- #989: Quote $VARIANT_IDX and $NEXT_IDX in printf '%03d' calls in
evolve.sh (SC2086 — no behavior change, style consistency)
- #612: Already resolved by commit 79a2e2e (fitness.sh switched from
deployments-local.json to broadcast JSON, eliminating dead Kraiken/Stake reads)
- #945: Already resolved by commit 052ad7a (manifest.schema.json
fitness_flags description corrected to "Comma-separated")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renumber test_transpiler_clamping.sh tests from 5-14 to 6-15 to avoid
overlap with test_inject_extraction.sh Test 5 (#1017).
Items #1012 (ts-node→tsx) and #986 (CI using npm test) were already
resolved by prior commits.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review feedback: d.get('fitness_flags') without a default preserves the
null vs absent distinction mandated by the manifest schema (string | null).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two changes in evolve.sh pool-admission code:
1. Include `fitness_flags` from evaluator JSONL in the manifest entry dict
for newly admitted candidates (~line 866-874). Previously the field was
omitted, so downstream `effective_fitness()` could never zero-rate a new
candidate.
2. Use `effective_fitness(entry)` when appending new candidates to the
evolved ranking list (~line 907), so ZERO_RATED_FLAGS defence applies
at first admission — not only when re-ranking existing entries.
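
The two changes above can be pictured with a minimal Python sketch. It is not the code in evolve.sh: `build_entry`, `admit`, the `candidate_id` key, and the shape of the ranking list are hypothetical, and the body of `effective_fitness` is an assumption based on the related commits in this log.

```python
# Hypothetical sketch; only effective_fitness, ZERO_RATED_FLAGS and
# fitness_flags are names taken from the commit messages.
ZERO_RATED_FLAGS = {"token_value_inflation", "processExecIf_fix"}

def effective_fitness(entry):
    """Return 0 for entries carrying a zero-rated flag, else their fitness."""
    flags = entry.get("fitness_flags") or ""
    if any(flag.strip() in ZERO_RATED_FLAGS for flag in flags.split(",")):
        return 0
    return int(entry.get("fitness") or 0)

def build_entry(cid, result):
    """Build a manifest entry from one evaluator JSONL record (assumed shape)."""
    return {
        "candidate_id": cid,
        "fitness": result.get("fitness"),
        # Carry the flags over so effective_fitness() can see them.
        "fitness_flags": result.get("fitness_flags"),
    }

def admit(ranking, cid, result):
    """Append a new candidate ranked by effective, not raw, fitness."""
    entry = build_entry(cid, result)
    ranking.append((effective_fitness(entry), entry))
    ranking.sort(key=lambda pair: pair[0], reverse=True)
    return ranking
```

With this shape, a flagged candidate with a high raw score enters the ranking at 0 on first admission rather than only after a later re-rank.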
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When `git apply --check` passes but `git apply` itself fails, the code
now checks STOP_REQUESTED before continuing to the next iteration,
consistent with the check at the end of the main loop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Widen rootDir from "." to ".." and include push3-transpiler sources so
tsc can resolve the ../push3-transpiler/src imports that mutate.ts and
test files use.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add onchain/deployments-local.json to .gitignore so it is no longer tracked
- Remove the stale committed file from git
- Update fitness.sh to read LM address from forge broadcast JSON
(DeployLocal.sol's run-latest.json) instead of the potentially stale
deployments-local.json, matching the approach deploy-optimizer.sh already uses
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes #618
## Changes
Add stack depth validation in processExecIf() so asymmetric EXEC.IF branches (where one branch pushes more values than the other) throw an explicit error instead of silently padding with '0'. Error messages identify both branch depths for DYADIC and BOOLEAN stacks. Removed dead-code '0'/'false' fallbacks in buildAssignments and reconstruction. Updated existing unbalanced-branch tests to expect errors; added regression tests for error message content and BOOLEAN mismatch. All existing seed files (optimizer_v3.push3, optimizer_seed.push3) continue to transpile.
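
As an illustration of the balance check described above, here is a small Python sketch. The real logic lives in `processExecIf()` in the TypeScript transpiler; `check_branch_balance`, `validate_exec_if`, the dict-of-lists state, and the exact error wording are assumptions made for this example.

```python
# Illustrative only; function names and message text are hypothetical.
def check_branch_balance(stack_name, true_depth, false_depth):
    """Raise if the two EXEC.IF branches leave different stack depths."""
    if true_depth != false_depth:
        raise ValueError(
            f"EXEC.IF branches leave unbalanced {stack_name} stack: "
            f"true branch depth {true_depth}, false branch depth {false_depth}"
        )

def validate_exec_if(true_state, false_state):
    """Compare both typed stacks after each branch was evaluated separately."""
    for name in ("DYADIC", "BOOLEAN"):
        check_branch_balance(name, len(true_state[name]), len(false_state[name]))
```

Raising here replaces the earlier behaviour of padding the shorter branch with `'0'`, which hid the mismatch from the caller.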
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/1033
Reviewed-by: Disinto_bot <disinto_bot@noreply.codeberg.org>
Add TypeScript unit test suite for the Push3 transpiler using Node's
built-in test runner (node:test) with tsx. 47 tests across 12 suites
covering parser, stack underflow/overflow, EXEC.IF balanced/unbalanced/
nested branching, arithmetic, boolean ops, name binding, and integration.
Update CI to run `npm test` (which now includes unit tests + existing
bash tests) and scope transpiler-tests step to only trigger on changes
to tools/push3-transpiler/**.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
batch-eval.sh mutates OptimizerV3.sol by injecting Push3 candidates but
never restores it on exit. Add a backup/restore trap so the file is
always returned to its committed state, and add a CI step that fails
loudly if OptimizerV3.sol is left dirty after any pipeline step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Missed reference — deploy-optimizer.sh still called npx ts-node,
which would fail now that ts-node is removed from devDependencies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
Bundled dust cleanup for `push3-evolution/evolve.sh` subsystem:
- **#716**: Fix null-fitness crash in generation JSONL parsing — `int(d.get('fitness', 0))` → `int(d.get('fitness') or 0)` (avoids `TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'` when fitness is JSON `null`)
- **#944**: Add `processExecIf_fix` to `ZERO_RATED_FLAGS` so inflated scores from that flag are zero-rated during pool admission/eviction
- **#945**: `fitness_flags` is comma-separated in practice — update `manifest.schema.json` description from 'Space-separated' to 'Comma-separated' and use `flags.split(',')` in `effective_fitness` instead of substring match
- Fix pre-existing SC2086: quote `$i` in `printf` argument (ShellCheck)
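
The #716 and #945 fixes both come down to standard Python behaviour, which the following sketch reproduces in isolation. The record and the flag string `not_token_value_inflation` are made up for the demonstration.

```python
import json

# A generation record whose fitness is JSON null (hypothetical sample).
record = json.loads('{"candidate_id": "c1", "fitness": null}')

# dict.get's default only applies when the key is absent. A JSON null is
# present with value None, so int(None) raises TypeError.
try:
    int(record.get("fitness", 0))
    crashed = False
except TypeError:
    crashed = True

# `or 0` covers both the absent and the null case.
safe = int(record.get("fitness") or 0)

# Substring matching also hits any longer flag that contains the name;
# splitting on commas compares whole flags only.
flags = "not_token_value_inflation,other"
substring_hit = "token_value_inflation" in flags
exact_hit = "token_value_inflation" in flags.split(",")
```

Here `crashed` and `substring_hit` end up true, while `safe` is 0 and `exact_hit` is false.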
## Test plan
- [ ] ShellCheck passes on `tools/push3-evolution/evolve.sh`
- [ ] CI passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/987
Reviewed-by: Disinto_bot <disinto_bot@noreply.codeberg.org>
Add assertUint256Max1e18 validator in index.ts and apply it to the ci,
anchorShare, and discoveryDepth output literals. Programs emitting values
> 1e18 for these fields now fail with a clear transpiler-level error instead
of silently violating LiquidityManager invariants at runtime.
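
A rough Python rendering of the range check, for readers who want to see its shape. The actual validator is `assertUint256Max1e18` in index.ts; the Python function name, the exception type, and the message text below are assumptions, and only the upper bound mentioned above is modelled.

```python
MAX_1E18 = 10**18  # upper bound named in the commit message

def assert_uint256_max_1e18(field, value):
    """Reject output literals above 1e18 with an explicit error."""
    if value > MAX_1E18:
        raise ValueError(f"{field} literal {value} exceeds 1e18")
    return value

def validate_outputs(outputs):
    """Apply the check to the three bounded fields (dict shape is assumed)."""
    for field in ("ci", "anchorShare", "discoveryDepth"):
        assert_uint256_max_1e18(field, outputs[field])
    return outputs
```

Failing at transpile time keeps an over-range program from ever reaching the contract.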
Add tests 12-14 in test_transpiler_clamping.sh covering the over-range
rejection for each of the three fields.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AGENTS.md principle #1/#3 forbids fixed delays. When evolution.patch fails
the pre-flight --check, exit 1 lets the process supervisor handle restart
timing instead of a hardcoded sleep 300 busy-spin.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When git apply --check fails, the daemon now sleeps 300s before retrying,
preventing a tight busy loop that would hammer the git remote indefinitely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add llm_balanced.push3: arithmetic-only optimizer that keeps all
outputs in a balanced mid-range. anchorShare=40-60% (linear with
percentageStaked), anchorWidth=10-200 ticks (linear with taxRate),
discoveryDepth=30-50% (linear with percentageStaked), ci=0. No
EXEC.IF branches — all transitions via multiplication and division.
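
The ranges above can be read as plain linear interpolation. The sketch below is an assumption-heavy illustration, not the contents of llm_balanced.push3: it assumes both inputs are integers scaled so that `SCALE` means 100%, that each output rises with its input, and that outputs are expressed in percent or ticks.

```python
SCALE = 10**18  # assumed fixed-point scale for percentageStaked and taxRate

def llm_balanced(percentage_staked, tax_rate):
    """Branch-free linear mappings using only multiply, divide, and add."""
    return {
        "ci": 0,
        # 40% at nothing staked, 60% at everything staked
        "anchorShare": 40 + 20 * percentage_staked // SCALE,
        # 10 ticks at zero tax, 200 ticks at maximum tax
        "anchorWidth": 10 + 190 * tax_rate // SCALE,
        # 30% at nothing staked, 50% at everything staked
        "discoveryDepth": 30 + 20 * percentage_staked // SCALE,
    }
```

For example, inputs of half the scale give an anchorShare of 50, an anchorWidth of 105, and a discoveryDepth of 40.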
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 30-way threshold lookup in optimizer_seed.push3 generates enough
local variables to trigger "Stack too deep" without IR compilation.
Add via_ir = true to the minimal foundry.toml created in both test
scripts, matching the setting in onchain/foundry.toml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_transpiler_clamping.sh: add Test 11 that runs forge build on the
valid Solidity output from Test 6; fails if the transpiled contract
does not compile (regression guard for #900)
- test_inject_extraction.sh: add SCRIPT_DIR, then Test 5 that transpiles
optimizer_seed.push3 and runs forge build on the generated contract;
ensures the full push3→Solidity→compile pipeline stays green
- .woodpecker/ci.yml: add transpiler-tests step that installs npm deps
and runs both test scripts with forge on PATH
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clamp anchorWidth output with `% (2**24)` before the uint24 cast so that
large literal values (e.g. 1e18 from evolved constants) produce valid
Solidity instead of a compile-time overflow error.
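
To make the arithmetic concrete, here is a small Python sketch of the clamp. The helper names are hypothetical and the extra parentheses around the expression are an assumption; the transpiler itself emits Solidity from TypeScript.

```python
UINT24_MODULUS = 2**24  # uint24 holds values 0 .. 2**24 - 1

def clamp_anchor_width_expr(expr):
    """Wrap a Solidity expression so the uint24 cast always fits."""
    return f"uint24(({expr}) % (2**24))"

def clamp_anchor_width_value(value):
    """Numeric equivalent of the emitted modulo."""
    return value % UINT24_MODULUS
```

Note that a modulo wraps rather than saturates: `clamp_anchor_width_value(10**18)` is 6553600, not the uint24 maximum. The goal stated above is only that the generated Solidity compiles.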
Add test_transpiler_clamping.sh (Test 5) verifying that a Push3 program
outputting 1e18 for anchorWidth generates `uint24(... % (2**24))` and not
the raw overflowing literal. Update package.json to run both test suites.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add skip_candidate() helper that emits fitness=0 JSON to stdout and
tracks the failed score for the output-dir file, satisfying the
downstream scorer's expectation of one JSON line per candidate.
- Unify all failure paths (transpile, forge build, bytecode extract,
empty bytecode) through skip_candidate() with a distinct error key.
- Log message now reads "WARNING: <id> compile failed — scoring as 0"
as required by the acceptance criteria.
- Output-dir scores.jsonl now merges successful + failed scores so the
file is complete even when some candidates fail to compile.
- All-candidates-fail path (COMPILED_COUNT=0) still exits 2 (no viable
population); true infra errors (missing tool, bad RPC) unchanged.
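
The failure path can be sketched in Python as follows. This is an illustration of the one-JSON-line-per-candidate contract, not the shell helper itself: the `candidate_id` and `error` field names, the module-level list, and `merged_scores` are assumptions.

```python
import json

failed_scores = []  # lines to merge into the output-dir scores file later

def skip_candidate(cid, error_key):
    """Emit a fitness=0 record for a candidate that could not be scored."""
    line = json.dumps({"candidate_id": cid, "fitness": 0, "error": error_key})
    print(line)                 # the downstream scorer reads stdout
    failed_scores.append(line)  # remembered for the merged file
    return line

def merged_scores(successful_lines):
    """Combine successful and failed lines so the file is complete."""
    return list(successful_lines) + failed_scores
```

Routing every failure through one helper keeps the record shape identical regardless of which build stage failed.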
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move CROSS_PATTERNS_FILE from /tmp/red-team-cross-patterns.jsonl to
tools/red-team/cross-patterns.jsonl (repo-tracked path)
- Remove the reset (> file) at sweep start so patterns accumulate across runs
- Generate a SWEEP_ID (sweep-YYYYMMDD-HHMMSS) at sweep start and stamp
each new entry with sweep_id for traceability
- Deduplicate on (pattern, candidate, result): entries already present in
the file are skipped; intra-batch duplicates are also suppressed
- Create tools/red-team/ directory with .gitkeep
- Add mkdir -p guards in both scripts so the directory is created on first run
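
The stamping and deduplication rules above could look like this in Python. It is a sketch under assumptions: the function names are hypothetical, and each entry is assumed to be a JSON object with `pattern`, `candidate`, and `result` keys.

```python
import json
from datetime import datetime

def make_sweep_id(now=None):
    """Build an id of the form sweep-YYYYMMDD-HHMMSS."""
    now = now or datetime.now()
    return now.strftime("sweep-%Y%m%d-%H%M%S")

def new_pattern_lines(existing_lines, new_entries, sweep_id):
    """Return JSONL lines for entries not already in the file or the batch."""
    seen = set()
    for line in existing_lines:
        entry = json.loads(line)
        seen.add((entry["pattern"], entry["candidate"], entry["result"]))
    lines = []
    for entry in new_entries:
        key = (entry["pattern"], entry["candidate"], entry["result"])
        if key in seen:
            continue  # already recorded, in the file or earlier in this batch
        seen.add(key)
        lines.append(json.dumps({**entry, "sweep_id": sweep_id}))
    return lines
```

Adding each new key to `seen` as it is emitted is what suppresses duplicates within a single batch.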
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend the patch to also replace the NatSpec comments above MAX_ANCHOR_WIDTH,
which became misleading after switching to type(uint24).max. The old comments
claimed overflow-safety ("fits in int24"); the new comments document that the
production cap is 1233, that values above 123358 overflow int24 and revert,
and that this is tolerable in the evolution context where reverts score zero
fitness. The patch now correctly updates both the constant and its documentation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regenerate evolution.patch from the current ThreePositionStrategy.sol.
The old patch had a corrupt hunk header (@@ -33,7 +33,7 @@ claiming 7 lines
but only supplying 4) and placeholder index hashes (0000000..0000000),
causing `git apply` to reject it with "corrupt patch". MAX_ANCHOR_WIDTH
still exists in the file at value 1233; the patch correctly overrides it
to type(uint24).max for unbounded evolution runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update tsconfig.json to use NodeNext module system (fixes CJS/ESM conflict),
enable ts-node ESM mode, and add .js extensions to relative imports so the
built output and ts-node dev script both work correctly with "type":"module".
Replace the `}` heuristic in inject.sh with a brace-depth counter:
start at depth=1 after the opening {, increment on {, decrement on },
stop when depth reaches 0. This correctly handles nested if/else blocks,
loops, and structs that close at 4-space indent inside calculateParams.
Also emit a non-zero exit with a descriptive message if EOF is reached
without finding the matching closing brace.
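
The counter described above can be sketched in a few lines of Python. This is an illustration of the technique, not the code in inject.sh; `extract_body` and its arguments are hypothetical, and the sketch does not special-case braces inside strings or comments.

```python
def extract_body(source, open_index):
    """Return the text between the brace at open_index and its match."""
    if source[open_index] != "{":
        raise ValueError("open_index must point at an opening brace")
    depth = 1  # we are just inside the opening brace
    for i in range(open_index + 1, len(source)):
        ch = source[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return source[open_index + 1:i]
    # Mirrors the non-zero exit: the body never closed.
    raise ValueError("reached EOF without finding the matching closing brace")
```

Because only the depth matters, nested blocks that close at any indent level no longer end the extraction early.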
Add test_inject_extraction.sh covering simple bodies, nested if/else,
multi-level nesting, and the EOF-without-match error case.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dpop/bpop silently returned '0'/'false' on stack underflow instead of
throwing, so isValid() never returned false for underflowing programs.
Make dpop and bpop throw an Error on underflow so the transpiler's
existing try/catch in isValid() correctly classifies such programs as
invalid. The output-extraction phase uses state.dStack.pop() directly
(not dpop) and is unaffected.
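
The interaction between a throwing pop and the validity check can be shown with a toy interpreter. Everything below is a hypothetical Python miniature (a single `ADD` instruction and integer literals), not the transpiler's TypeScript.

```python
def dpop(stack):
    """Pop a value, raising instead of returning a silent default."""
    if not stack:
        raise IndexError("DYADIC stack underflow")
    return stack.pop()

def run(program):
    """Evaluate a toy program: integers push, "ADD" pops two and pushes."""
    stack = []
    for token in program:
        if token == "ADD":
            right = dpop(stack)
            left = dpop(stack)
            stack.append(left + right)
        else:
            stack.append(int(token))
    return stack

def is_valid(program):
    """A program is valid only if it runs without raising."""
    try:
        run(program)
    except Exception:
        return False
    return True
```

With a silent `'0'` fallback, `is_valid([1, "ADD"])` would have been true; with the raising `dpop` it is false.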
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Null out the stale fitness score (7116531284966772550194) for
evo_run007_champion.push3, which was recorded against the buggy
processExecIf interpreter (pre-#655 fix). Setting fitness to null
marks the entry for re-scoring by evaluate-seeds.sh once a valid
ANVIL_FORK_URL is available. Updated the note field to document why
the fitness was cleared.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What
- `tools/push3-transpiler/inject.sh` — shared transpile+inject logic used by both batch-eval and red-team-sweep
- `batch-eval.sh` — replaced inline 60-line Python block with `inject.sh` call
- `scripts/harb-evaluator/red-team-sweep.sh` — red-teams each kindergarten seed using existing `red-team.sh`, with random smoke test gate
## Why
Sweep script kept breaking because I rewrote the injection logic instead of reusing batch-eval's proven Python. Now there's one copy.
## Testing
- inject.sh tested manually on DO box with optimizer_v3 seed
- Smoke test picks random seed, injects + compiles before starting sweep
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/806
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
- Change WARNING to explicitly state "legacy CID format ... migration not supported, skipping"
- Expand comment near the startswith('candidate_') guard to document the CID format
contract and explain why re-admission is intentionally out of scope (no surviving
generation_N.jsonl files from runs 1-6 exist in the repo)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pass seed basename into the admission Python block as argv[7]
- Add `note` field to every new evolved entry: "Evolved from <seed> (run<N> gen<G>)"
- Add migration comment noting entries admitted before this fix may have note: null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace unquoted heredoc (shell-injection path) with a temp file: the
shell loop now appends tab-separated filename/score lines to a temp
file, which is passed as a plain path argument to the Python manifest-
rewrite block. Python reads only file contents, never executes shell-
expanded strings.
- Add early abort on fitness.sh exit code 2 (infra error: Anvil down,
missing tool). Iterating past an infra failure produces no useful
results; aborting immediately surfaces the real problem.
- Remove unused `os` import from the manifest-rewrite Python block.
- Fix inaccurate comment in evolve.sh --diverse-seeds sampling: the pool
sampler does a flat random shuffle with no fitness weighting; null-
fitness seeds are not "treated as 0" — they are sampled with equal
probability to any other seed.
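
The temp-file hand-off in the first bullet can be sketched as below. The function names are hypothetical and scores are assumed to be integers; the point is that the Python side treats each line purely as data.

```python
def parse_scores(lines):
    """Parse tab-separated filename/score lines into a dict."""
    scores = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        filename, _, score = line.partition("\t")
        scores[filename] = int(score)
    return scores

def read_scores(path):
    """Read the temp file whose path was passed as a plain argument."""
    with open(path, encoding="utf-8") as handle:
        return parse_scores(handle)
```

A filename containing shell metacharacters is stored as an ordinary dictionary key, since nothing in this path is ever evaluated by a shell.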
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add evaluate-seeds.sh: standalone script that reads manifest.jsonl,
finds every entry with fitness: null, runs fitness.sh against each
seed file, and atomically writes results back to manifest.jsonl.
Supports --dry-run to preview without evaluating.
- Add comment to --diverse-seeds sampling in evolve.sh documenting that
null-fitness seeds are included with effective_fitness=0 and that
evaluate-seeds.sh should be run to score them.
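
One common way to get the atomic write-back described above is to write a sibling temp file and rename it over the original. The sketch below assumes that approach; the commit does not say how evaluate-seeds.sh does it, and `rewrite_manifest` and the `file` key are hypothetical.

```python
import json
import os
import tempfile

def rewrite_manifest(path, scores):
    """Fill in null fitness values, then replace the manifest atomically."""
    with open(path, encoding="utf-8") as handle:
        entries = [json.loads(line) for line in handle if line.strip()]
    for entry in entries:
        if entry.get("fitness") is None and entry.get("file") in scores:
            entry["fitness"] = scores[entry["file"]]
    # The temp file must be on the same filesystem for the rename to be atomic.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for entry in entries:
                out.write(json.dumps(entry) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers of the manifest then see either the old file or the complete new one, never a partially written file.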
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Define ZERO_RATED_FLAGS set near effective_fitness and check each flag
with any(...in flags...) instead of a single hard-coded substring test.
token_value_inflation behaviour is preserved; new flags can be added to
the set without touching the dispatch logic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace AW=250 (VERY AGGRESSIVE) with 100 and AW=150 (AGGRESSIVE) with 80
so neither value is silently clamped by LiquidityManager.MAX_ANCHOR_WIDTH=100.
Update header comment block to match the corrected values.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>