johba/harb

Author	SHA1	Message	Date
johba	ff86b3691d	chore: extract shared inject.sh, add red-team-sweep.sh (#806) ## What - `tools/push3-transpiler/inject.sh` — shared transpile+inject logic used by both batch-eval and red-team-sweep - `batch-eval.sh` — replaced inline 60-line Python block with `inject.sh` call - `scripts/harb-evaluator/red-team-sweep.sh` — red-teams each kindergarten seed using existing `red-team.sh`, with random smoke test gate ## Why Sweep script kept breaking because I rewrote the injection logic instead of reusing batch-eval's proven Python. Now there's one copy. ## Testing - inject.sh tested manually on DO box with optimizer_v3 seed - Smoke test picks random seed, injects + compiles before starting sweep Co-authored-by: openhands <openhands@all-hands.dev> Reviewed-on: https://codeberg.org/johba/harb/pulls/806 Reviewed-by: review_bot <review_bot@noreply.codeberg.org>	2026-03-15 10:24:03 +01:00
openhands	17c904aaa3	fix: batch-eval.sh MANIFEST_DIR (mktemp -d) has no cleanup trap (#763)	2026-03-14 19:46:50 +00:00
openhands	958b8cfaa0	fix: batch-eval.sh header comment claims wrong candidate_id format (#668)	2026-03-14 03:36:43 +00:00
openhands	5d369cfab6	fix: address review findings for gas-limit fitness pressure (#637) - Optimizer.sol: move CALCULATE_PARAMS_GAS_LIMIT constant to top of contract (after error declaration) to avoid mid-contract placement. Expand natspec with EIP-150 63/64 note: callers need ~203 175 gas to deliver the full 200 000 budget to the inner staticcall. - Optimizer.sol: add ret.length < 128 guard before abi.decode in getLiquidityParams(). Malformed return data (truncated / wrong ABI) from an evolved program now falls back to _bearDefaults() instead of propagating an unhandled revert. The 128-byte minimum is the ABI encoding of (uint256, uint256, uint24, uint256) — four 32-byte slots. - Optimizer.sol: add cross-reference comment to _bearDefaults() noting that its values must stay in sync with LiquidityManager.recenter()'s catch block to prevent silent divergence. - FitnessEvaluator.t.sol: add CALCULATE_PARAMS_GAS_LIMIT mirror constant (must match Optimizer.sol). Disqualify candidates whose measured gas exceeds the production cap with fitness=0 and error="gas_over_limit" — prevents the pipeline from selecting programs that are functionally dead on-chain (would always produce bear defaults in production). - batch-eval.sh: update output format comment to document the gas_used field and over-gas-limit error object added by this feature. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-13 01:05:37 +00:00
openhands	87bb5859e2	fix: revm evaluator — UUPS bypass, deployedBytecode, graceful attack ops - Skip UUPS upgradeTo: etch + vm.store ERC1967 implementation slot directly (OptimizerV3Push3 is standalone, no UUPS inheritance needed for evolution) - Use deployedBytecode (runtime) instead of bytecode (creation) for vm.etch - Inject transpiled body into OptimizerV3.sol (has getLiquidityParams via Optimizer) instead of using standalone OptimizerV3Push3.sol - Wrap buy/sell/stake/unstake in try/catch — attack ops should not abort the batch - Add /tmp read to fs_permissions for batch-eval manifest files - Bootstrap recenter returns bool instead of reverting (soft-fail per candidate)	2026-03-12 19:54:58 +00:00
openhands	26b8876691	fix: feat: revm-based fitness evaluator for evolution at scale (#604) Replace per-candidate Anvil+forge-script pipeline with in-process EVM execution using Foundry's native revm backend, achieving 10-100× speedup for evolutionary search at scale. New files: - onchain/test/FitnessEvaluator.t.sol — Forge test that forks Base once, deploys the full KRAIKEN stack, then for each candidate uses vm.etch to inject the compiled optimizer bytecode, UUPS-upgrades the proxy, runs all attack sequences with in-memory vm.snapshot/revertTo (no RPC overhead), and emits one {"candidate_id","fitness"} JSON line per candidate. Skips gracefully when BASE_RPC_URL is unset (CI-safe). - tools/push3-evolution/revm-evaluator/batch-eval.sh — Wrapper that transpiles+compiles each candidate sequentially, writes a two-file manifest (ids.txt + bytecodes.txt), then invokes FitnessEvaluator.t.sol in a single forge test run and parses the score JSON from stdout. Modified: - tools/push3-evolution/evolve.sh — Adds EVAL_MODE env var (anvil|revm). When EVAL_MODE=revm, batch-scores every candidate in a generation with one batch-eval.sh call instead of N sequential fitness.sh processes; scores are looked up from the JSONL output in the per-candidate loop. Default remains EVAL_MODE=anvil for backward compatibility. Key design decisions: - Per-candidate Solidity compilation is unavoidable (each Push3 candidate produces different Solidity); the speedup is in the evaluation phase. - vm.snapshot/revertTo in forge test are O(1) memory operations (true revm), not RPC calls — this is the core speedup vs Anvil. - recenterAccess is set in bootstrap so TWAP stability checks are bypassed during attack sequences (mirrors the existing fitness.sh bootstrap). - Test skips cleanly when BASE_RPC_URL is absent, keeping CI green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 11:54:41 +00:00
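
Commit 5d369cfab6 cites the EIP-150 63/64 rule when it says callers need roughly 203 175 gas to hand the full 200 000 budget to the inner staticcall. The arithmetic can be checked with a small sketch. The function names below are illustrative and do not come from the repository, and the calculation assumes only the 63/64 forwarding cap, ignoring the cost of the STATICCALL opcode itself and any surrounding instructions.

```python
def max_forwardable(gas_available: int) -> int:
    """Gas a call may forward under EIP-150: all but 1/64 of what is available."""
    return gas_available - gas_available // 64


def min_gas_to_forward(budget: int) -> int:
    """Smallest available gas at the call site that can forward `budget` gas.

    Hypothetical helper: ignores the call opcode's own cost.
    """
    gas = budget * 64 // 63  # close starting estimate
    # Adjust upward if the integer estimate falls short.
    while max_forwardable(gas) < budget:
        gas += 1
    # Adjust downward while a smaller amount still suffices.
    while gas > 0 and max_forwardable(gas - 1) >= budget:
        gas -= 1
    return gas


if __name__ == "__main__":
    print(min_gas_to_forward(200_000))
```

Under these assumptions the script prints 203174, which agrees with the approximate figure in the commit message once the call's own overhead is added on top.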

6 commits
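
The commits above describe batch-eval.sh parsing one JSON line per candidate from forge test stdout, with a `gas_used` field and a `fitness=0`, `error="gas_over_limit"` record for candidates over the gas cap. A minimal sketch of such a parser follows. The exact record layout is an assumption pieced together from the commit messages, and `parse_scores` is a hypothetical name, not a function in the repository.

```python
import json


def parse_scores(stdout_text: str) -> dict:
    """Collect candidate_id -> fitness from mixed test output.

    Assumed record shape (not confirmed by the repository):
      {"candidate_id": "...", "fitness": 123, "gas_used": 45678}
      {"candidate_id": "...", "fitness": 0, "error": "gas_over_limit"}
    """
    scores = {}
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip ordinary log lines from the test runner
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if not isinstance(record, dict) or "candidate_id" not in record:
            continue
        # Treat any record carrying an error as disqualified.
        fitness = 0 if record.get("error") else record.get("fitness", 0)
        scores[record["candidate_id"]] = fitness
    return scores


if __name__ == "__main__":
    sample = "\n".join([
        "Running 1 test for test/FitnessEvaluator.t.sol",
        '{"candidate_id": "gen0_c000", "fitness": 1200, "gas_used": 90000}',
        '{"candidate_id": "gen0_c001", "fitness": 0, "error": "gas_over_limit"}',
    ])
    print(parse_scores(sample))
```

The candidate ids and numbers in the sample are made up for illustration. Skipping non-JSON lines keeps the parser tolerant of whatever else the test runner writes to stdout.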