# Evidence Directory

Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas
(evolution, red-team, holdout, user-test) write structured JSON here.

## Purpose

- **Planner input**: the planner reads these files to decide next actions
  (e.g. "last red-team showed IL vulnerability → trigger evolution").
- **Diffable history**: `git log evidence/` shows how metrics change over time.
- **Permanent record**: separate from `tmp/` which is ephemeral.

## Directory Layout

```
evidence/
  evolution/
    YYYY-MM-DD.json          # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json          # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json    # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json          # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json          # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json          # TVL, accumulated fees, position count, rebalance frequency
```

## Delivery Pattern

Every formula follows the same three-step pattern:

1. **Evidence file** → committed to `evidence/` on main
2. **Git artifacts** (new code, attack vectors, evolved programs) → PR
3. **Human summary** → issue comment with key metrics + link to evidence file

---

## Fee-Income Calculation Model

This section documents how `delta_bps` values in red-team and holdout evidence files
are derived, so that recorded values can be independently verified.

### Measurement tool

`delta_bps` is computed from two snapshots of **LM total ETH** taken by
[`onchain/script/LmTotalEth.s.sol`](../onchain/script/LmTotalEth.s.sol):

```
lm_total_eth = lm.balance                 (free ETH)
             + WETH.balanceOf(lm)         (free WETH)
             + Σ positionEthPrincipal(stage)   for stage ∈ {FLOOR, ANCHOR, DISCOVERY}
```

Each position's ETH principal is calculated via `LiquidityAmounts.getAmountsForLiquidity`
at the pool's current `sqrtPriceX96`. Only the WETH side of each position is summed;
the KRK side is excluded.

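The sum above can be sketched in Python as follows. This is a minimal illustration, not a port of the Solidity script: the `LmSnapshot` type, its field names, and the example amounts are hypothetical, and the per-position ETH principals are assumed to have been computed already (for instance via `LiquidityAmounts.getAmountsForLiquidity`).

```python
from dataclasses import dataclass
from typing import Mapping

# Stage names taken from the formula above.
STAGES = ("FLOOR", "ANCHOR", "DISCOVERY")
ETH = 10**18  # wei per ETH


@dataclass(frozen=True)
class LmSnapshot:
    """Hypothetical container for one LM reading; all amounts are in wei."""
    free_eth_wei: int
    free_weth_wei: int
    position_eth_principal_wei: Mapping[str, int]  # stage name -> WETH-side principal


def lm_total_eth(snapshot: LmSnapshot) -> int:
    """Free ETH + free WETH + WETH-side principal of all three positions."""
    missing = [s for s in STAGES if s not in snapshot.position_eth_principal_wei]
    if missing:
        raise ValueError(f"missing position principal for: {missing}")
    principal = sum(snapshot.position_eth_principal_wei[s] for s in STAGES)
    return snapshot.free_eth_wei + snapshot.free_weth_wei + principal


# Illustrative amounts only; they are not taken from any evidence file.
example = LmSnapshot(
    free_eth_wei=1 * ETH,
    free_weth_wei=2 * ETH,
    position_eth_principal_wei={
        "FLOOR": 700 * ETH,
        "ANCHOR": 200 * ETH,
        "DISCOVERY": 97 * ETH,
    },
)
print(lm_total_eth(example) // ETH)  # 1000
```

Keeping every amount as an integer number of wei avoids floating-point rounding until the final basis-point conversion.
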
### What is and is not counted

| Counted | Not counted |
|---------|-------------|
| Free native ETH on the LM contract | KRK balance (free or in positions) |
| Free WETH (ERC-20) on the LM contract | Uncollected fees still inside Uni V3 positions |
| ETH-side principal of all 3 positions | KRK fees transferred to `feeDestination` |

**Key consequence:** Uncollected fees accrued inside Uniswap V3 positions are invisible
to `LmTotalEth` until a `recenter()` call executes `pool.burn` + `pool.collect`, which
converts them into free WETH on the LM contract (or transfers them to `feeDestination`).
A `recenter()` between the two snapshots materializes these fees into the measurement.

### `delta_bps` formula

```
delta_bps = (lm_eth_after − lm_eth_before) / lm_eth_before × 10_000
```

Where `lm_eth_before` and `lm_eth_after` are `LmTotalEth` readings taken before and
after the attack sequence. Each attack is snapshot-isolated (Anvil snapshot → execute →
measure → revert), so per-attack `delta_bps` values are independent.

### Components that drive `delta_bps`

A round-trip trade (buy KRK with ETH, then sell KRK back for ETH) through the LM's
dominant positions produces a positive `delta_bps` from three sources:

1. **Pool fee income (1% per leg).** The WETH/KRK pool charges a 1% fee
   (`FEE = 10_000` in `LiquidityManager.sol`). On a simple round trip this contributes
   ~2% of volume. However, fees accrue as uncollected position fees and only become
   visible after `recenter()` materializes them. If no recenter occurs between
   snapshots, fee income is partially hidden (reflected only indirectly through
   reduced trade output).

2. **Concentrated-liquidity slippage.** The LM's three-position strategy concentrates
   most liquidity in narrow tick ranges. Trades that exceed the depth of a position
   range push through progressively thinner liquidity, causing super-linear slippage.
   The attacker receives fewer tokens per unit of input on each marginal unit.
   This slippage transfers value to the LM's positions as increased ETH principal.

3. **Recenter repositioning gain.** When `recenter()` is called between trade legs:
   - All three positions are burned and fees collected.
   - New positions are minted at the current price.
   - Any accumulated fees (WETH portion) become free WETH and are redeployed as new
     position liquidity. KRK fees are sent to `feeDestination`.
   - The repositioned liquidity changes the tick ranges the next trade interacts with.

### Why `delta_bps` is non-linear

A naive estimate of `delta_bps ≈ volume × 1% × 2 legs / lm_eth_before × 10_000`
underestimates the actual value for large trades because:

- **Slippage dominates at high volume.** When trade volume approaches or exceeds the
  ETH depth of the active positions, the price moves through the entire concentrated
  range and into thin or empty ticks. The slippage loss to the attacker (= gain to the
  LM) grows super-linearly with volume.
- **Multi-recenter compounding.** Strategies that call `recenter()` between sub-trades
  materialize intermediate fees and reposition liquidity at a new price. Subsequent
  trades pay fees at the new tick ranges, compounding the total fee capture.
- **KRK fee exclusion.** KRK fees collected during `recenter()` are transferred to
  `feeDestination` and excluded from `LmTotalEth`. This means the measurement captures
  the ETH-side gain but not the KRK-side gain, so `delta_bps` understates total
  protocol revenue.

### Fee destination behaviour

When `feeDestination` is `address(0)` or `address(this)` (the LM contract itself), fees
are **not** transferred out; they remain as deployable liquidity on the LM. In this
configuration, materialized WETH fees increase `lm_total_eth` directly.

When `feeDestination` is an external address, WETH fees are transferred out and do
**not** contribute to `lm_total_eth`. The red-team test environment uses
`feeDestination = address(this)` so that fee income is fully reflected in `delta_bps`.

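The `delta_bps` formula and the naive fee-only estimate described above can be sketched in Python as follows. This is an illustrative sketch rather than the pipeline's code: the function names and the `fee_bps` / `legs` parameters are hypothetical, and the amounts in the usage lines are made up.

```python
ETH = 10**18  # wei per ETH


def delta_bps(lm_eth_before_wei: int, lm_eth_after_wei: int) -> float:
    """Change in LM total ETH between two readings, in basis points."""
    if lm_eth_before_wei <= 0:
        raise ValueError("lm_eth_before_wei must be positive")
    return (lm_eth_after_wei - lm_eth_before_wei) * 10_000 / lm_eth_before_wei


def naive_delta_bps(
    volume_wei: int,
    lm_eth_before_wei: int,
    fee_bps: int = 100,  # 1% pool fee expressed in basis points
    legs: int = 2,       # buy leg + sell leg
) -> float:
    """Fee-only estimate: volume x fee x legs / lm_eth_before, in basis points.

    Ignores concentrated-liquidity slippage and recenter effects, so it is
    expected to underestimate the measured value for large trades.
    """
    if lm_eth_before_wei <= 0:
        raise ValueError("lm_eth_before_wei must be positive")
    return volume_wei * fee_bps * legs / lm_eth_before_wei


# Illustrative amounts only.
print(delta_bps(1000 * ETH, 1050 * ETH))       # 500.0
print(naive_delta_bps(100 * ETH, 1000 * ETH))  # 20.0
```

Comparing the two values for a recorded attack gives a quick sense of how much of the measured gain is not explained by pool fees alone.
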
### Worked example

Using `attacks[1]` from `evidence/red-team/2026-03-20.json`:

> **"Buy → Recenter → Sell (800 ETH round trip)"**, `delta_bps: 1179`

**Given:**

- `lm_eth_before` = 999,999,999,999,999,999,998 wei ≈ 1000 ETH
- Trade volume = 800 ETH (buy leg) + equivalent KRK sell leg
- Pool fee rate = 1% per swap
- `feeDestination = address(this)` (fees stay in LM)

**Step-by-step derivation:**

1. **Buy leg (800 ETH → KRK):** The 800 ETH buy pushes the price ~4000 ticks into the
   concentrated positions. The pool charges 1% (≈8 ETH in fees accruing to positions).
   Because liquidity is concentrated, the price moves far, and the attacker receives
   significantly fewer KRK than a constant-product AMM would give. After the buy,
   position ETH principal increases (price moved up = more ETH value in range).

2. **Recenter:** Positions are burned, collecting all accrued fees. New positions are
   minted at the new (higher) price. The ~8 ETH in WETH fees plus the ETH-side
   principal become redeployable liquidity.

3. **Sell leg (KRK → ETH):** The attacker sells all acquired KRK back through the newly
   positioned liquidity. Another 1% fee applies. Because the attacker received fewer
   KRK than 800 ETH worth (due to buy-leg slippage), the sell leg returns significantly
   less than 800 ETH. The price drops back but the LM retains the slippage differential.

4. **Result:** `lm_eth_after ≈ 1000 + 117.9 ≈ 1117.9 ETH`.

   ```
   delta_bps = (1117.9 − 1000) / 1000 × 10_000 = 1179 bps
   ```

The ~117.9 ETH gain comes from: 1% fees on both legs (~16 ETH) **plus** ~102 ETH in
concentrated-liquidity slippage loss by the attacker. The slippage component dominates
because 800 ETH far exceeds the depth of the anchor/discovery positions, pushing the
trade through increasingly thin liquidity.

**Cross-check (why the naive formula fails):**

```
naive = 800 × 0.01 × 2 / 1000 × 10_000 = 160 bps   (actual: 1179 bps)
```

The naive estimate assumes uniform liquidity (the only cost to the attacker is the fee
rate). The roughly 7× difference is due to concentrated-liquidity slippage on a trade
that exceeds position depth.

---

## Schema: `evolution/YYYY-MM-DD.json`

Records one optimizer evolution run.

```json
{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved" | "no_improvement"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `run_params` | object | Input parameters used |
| `generation_stats` | array | Per-generation fitness summary |
| `best_fitness` | number | Best fitness score achieved (a smaller loss for the LM is better, i.e. values closer to zero in the example above) |
| `champion_file` | string | Repo-relative path to winning optimizer |
| `champion_commit` | string | Git commit SHA of the champion (if promoted) |
| `verdict` | string | `"improved"` or `"no_improvement"` |

---

## Schema: `red-team/YYYY-MM-DD.json`

Records one adversarial red-team run against a candidate optimizer.

```json
{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "verdict": "floor_broken" | "floor_held",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED" | "HELD" | "INCREASED",
      "delta_bps": -15,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `candidate` | string | Optimizer under test |
| `candidate_commit` | string | Git commit SHA of the optimizer under test |
| `optimizer_profile` | string | Named profile / push3 variant |
| `lm_eth_before` | integer (wei) | LM total ETH at start |
| `lm_eth_after` | integer (wei) | LM total ETH at end |
| `eth_extracted` | integer (wei) | `lm_eth_before - lm_eth_after` (0 if floor held) |
| `floor_held` | boolean | `true` if no ETH was extracted |
| `verdict` | string | `"floor_held"` or `"floor_broken"` |
| `attacks[].strategy` | string | Human-readable strategy name |
| `attacks[].pattern` | string | Abstract op sequence (e.g. `wrap → buy → stake`) |
| `attacks[].result` | string | `"DECREASED"`, `"HELD"`, or `"INCREASED"` |
| `attacks[].delta_bps` | integer | LM ETH change in basis points |
| `attacks[].insight` | string | Key finding from this strategy |

### Snapshot-Isolation Methodology

All red-team runs use **snapshot isolation** as the standard methodology. This ensures
that each attack is evaluated independently against the same initial state, rather than
against a cumulative balance modified by prior attacks.

**How it works:**

1. Before the first attack, the test runner records the initial `lm_eth_before` value
   and takes an Anvil snapshot via the `anvil_snapshot` RPC method.
2. Each attack executes against this snapshot: run the attack, measure `lm_eth_after`,
   compute `delta_bps`, then revert to the snapshot via the `anvil_revert` RPC method.
3. The next attack begins from the exact same chain state as the previous one.

**Field semantics under snapshot isolation:**

| Field | Semantics |
|-------|-----------|
| `lm_eth_before` | LM total ETH at the shared initial snapshot, identical for every attack in the run |
| `lm_eth_after` | LM total ETH measured after this specific attack, before reverting |
| `attacks[].delta_bps` | Change relative to the shared `lm_eth_before`, not relative to any prior attack |

**Key implications:**

- `lm_eth_before` and `lm_eth_after` reflect **per-attack state**, not cumulative
  historical balance. Each attack sees the same starting ETH.
- Attack results are independent and order-insensitive: reordering attacks does not
  change any individual `delta_bps` value.

---

## Schema: `holdout/YYYY-MM-DD-prNNN.json`

Records a holdout quality gate evaluation for a specific PR.

```json
{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "pass" | "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `pr` | integer | PR number being evaluated |
| `candidate_commit` | string | Commit SHA under test |
| `scenarios` | array | One entry per holdout scenario |
| `scenarios[].name` | string | Scenario identifier |
| `scenarios[].passed` | boolean | Whether LM ETH held or improved |
| `scenarios[].lm_eth_delta_bps` | integer | LM ETH change in basis points |
| `scenarios[].notes` | string | Free-text notes on failure mode |
| `scenarios_passed` | integer | Count of passing scenarios |
| `scenarios_total` | integer | Total scenarios run |
| `gate_passed` | boolean | `true` if all required scenarios passed |
| `verdict` | string | `"pass"` or `"fail"` |
| `blocking_scenarios` | array of strings | Scenario names that caused failure |

---

## Schema: `user-test/YYYY-MM-DD.json`

Records a UX evaluation run across simulated personas.

```json
{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "pass" | "fail"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `personas` | array | One entry per simulated persona |
| `personas[].name` | string | Persona identifier |
| `personas[].task` | string | Task the persona attempted |
| `personas[].completed` | boolean | Whether the task was completed |
| `personas[].friction_points` | array of strings | UX issues encountered |
| `personas[].screenshot_refs` | array of strings | Repo-relative paths to screenshots |
| `personas[].notes` | string | Free-text observations |
| `personas_completed` | integer | Count of personas who completed their task |
| `personas_total` | integer | Total personas evaluated |
| `critical_friction_points` | array of strings | Friction points that blocked task completion |
| `verdict` | string | `"pass"` if all personas completed, `"fail"` otherwise |

---

## Schema: `resources/YYYY-MM-DD.json`

Records one infrastructure resource snapshot.

```json
{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "ok" | "warn" | "critical"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `disk.used_bytes` | integer | Bytes used on the primary volume |
| `disk.total_bytes` | integer | Total bytes on the primary volume |
| `disk.used_pct` | number | Percentage of disk used |
| `ram.used_bytes` | integer | Bytes of RAM in use |
| `ram.total_bytes` | integer | Total bytes of RAM |
| `ram.used_pct` | number | Percentage of RAM used |
| `api.anthropic_calls_24h` | integer | Anthropic API calls in the past 24 hours |
| `api.anthropic_budget_usd_used` | number | USD spent against the Anthropic budget |
| `api.anthropic_budget_usd_limit` | number | Configured Anthropic budget ceiling in USD |
| `api.anthropic_budget_pct` | number | Percentage of budget consumed |
| `ci.woodpecker_queue_depth` | integer | Number of jobs waiting in the Woodpecker CI queue |
| `ci.woodpecker_running` | integer | Number of Woodpecker jobs currently running |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"ok"` (all metrics normal), `"warn"` (≥80% on any dimension), or `"critical"` (≥95% on any dimension) |

---

## Schema: `protocol/YYYY-MM-DD.json`

Records one on-chain protocol health snapshot.

```json
{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy" | "degraded" | "offline"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `block_number` | integer | Block number at time of snapshot |
| `tvl_eth` | string (wei) | Total value locked across all LM positions in wei |
| `tvl_eth_formatted` | string | TVL formatted in ETH (2 dp) |
| `accumulated_fees_eth` | string (wei) | Fees accumulated by the LiquidityManager in wei |
| `accumulated_fees_eth_formatted` | string | Fees formatted in ETH (3 dp) |
| `position_count` | integer | Number of active Uniswap V3 positions (expected: 3) |
| `positions` | array | One entry per active position |
| `positions[].name` | string | Position label: `"floor"`, `"anchor"`, or `"discovery"` |
| `positions[].tick_lower` | integer | Lower tick boundary |
| `positions[].tick_upper` | integer | Upper tick boundary |
| `positions[].liquidity` | string | Liquidity amount in the position (wei-scale integer) |
| `rebalance_count_24h` | integer | Number of `recenter()` calls in the past 24 hours |
| `last_rebalance_block` | integer | Block number of the most recent `recenter()` call |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"healthy"` (positions active, TVL > 0), `"degraded"` (position_count < 3 or rebalance stalled), or `"offline"` (TVL = 0 or contract unreachable) |
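
The `verdict` rules in the `resources` and `protocol` tables above can be sketched in Python as follows. This is a hedged sketch, not the pipeline's implementation: the function names and the `contract_reachable` parameter are hypothetical, the three `*_pct` fields are assumed to be the "dimensions" for the resources verdict, and "rebalance stalled" is assumed to mean no `recenter()` call in the past 24 hours.

```python
def resources_verdict(record: dict) -> str:
    """Derive "ok" / "warn" / "critical" from a resources record."""
    # Assumption: the dimensions are the three percentage fields.
    worst = max(
        record["disk"]["used_pct"],
        record["ram"]["used_pct"],
        record["api"]["anthropic_budget_pct"],
    )
    if worst >= 95:
        return "critical"
    if worst >= 80:
        return "warn"
    return "ok"


def protocol_verdict(record: dict, contract_reachable: bool = True) -> str:
    """Derive "healthy" / "degraded" / "offline" from a protocol record."""
    # tvl_eth is stored as a decimal string in wei, so parse it as an int.
    if not contract_reachable or int(record["tvl_eth"]) == 0:
        return "offline"
    # Assumption: "stalled" means zero recenter() calls in the last 24 hours.
    if record["position_count"] < 3 or record["rebalance_count_24h"] == 0:
        return "degraded"
    return "healthy"


# Values copied from the example records above.
resources_example = {
    "disk": {"used_pct": 80.0},
    "ram": {"used_pct": 37.5},
    "api": {"anthropic_budget_pct": 9.7},
}
protocol_example = {
    "tvl_eth": "1234567890000000000000",
    "position_count": 3,
    "rebalance_count_24h": 4,
}
print(resources_verdict(resources_example))  # warn
print(protocol_verdict(protocol_example))    # healthy
```

Checking the most severe condition first keeps each verdict unambiguous when several conditions hold at once.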