Evidence Directory

Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas (evolution, red-team, holdout, user-test) write structured JSON here.

Purpose

  • Planner input — the planner reads these files to decide next actions (e.g. "last red-team showed IL vulnerability → trigger evolution").
  • Diffable history: git log evidence/ shows how metrics change over time.
  • Permanent record — separate from tmp/ which is ephemeral.

Directory Layout

evidence/
  evolution/
    YYYY-MM-DD.json       # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json       # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json       # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json       # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json       # TVL, accumulated fees, position count, rebalance frequency
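The naming scheme above can be sketched as a small path helper. This is an illustration only: the function name `evidence_path` is hypothetical, and it assumes the PR number in holdout file names is written without zero padding, which the layout (`prNNN`) does not specify.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Process directories listed in the layout above.
PROCESSES = {"evolution", "red-team", "holdout", "user-test", "resources", "protocol"}


def evidence_path(process: str, run_date: date, pr: Optional[int] = None) -> PurePosixPath:
    """Build the repo-relative evidence file path for one run (hypothetical helper)."""
    if process not in PROCESSES:
        raise ValueError(f"unknown evidence process: {process}")
    stem = run_date.isoformat()  # YYYY-MM-DD
    if process == "holdout":
        if pr is None:
            raise ValueError("holdout evidence files are keyed by PR number")
        stem = f"{stem}-pr{pr}"  # assumption: no zero padding of the PR number
    return PurePosixPath("evidence") / process / f"{stem}.json"
```

For example, `evidence_path("red-team", date(2026, 3, 20))` yields `evidence/red-team/2026-03-20.json`, the file used in the worked example further down.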

Delivery Pattern

Every formula follows the same three-step pattern:

  1. Evidence file → committed to evidence/ on main
  2. Git artifacts (new code, attack vectors, evolved programs) → PR
  3. Human summary → issue comment with key metrics + link to evidence file

Fee-Income Calculation Model

This section documents how delta_bps values in red-team and holdout evidence files are derived, so that recorded values can be independently verified.

Measurement tool

delta_bps is computed from two snapshots of LM total ETH taken by onchain/script/LmTotalEth.s.sol:

lm_total_eth = lm.balance (free ETH)
             + WETH.balanceOf(lm) (free WETH)
             + Σ positionEthPrincipal(stage) for stage ∈ {FLOOR, ANCHOR, DISCOVERY}

Each position's ETH principal is calculated via LiquidityAmounts.getAmountsForLiquidity at the pool's current sqrtPriceX96. Only the WETH side of each position is summed; the KRK side is excluded.
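As a rough off-chain illustration of the sum above, the following Python sketch mirrors the integer arithmetic of Uniswap V3's `LiquidityAmounts.getAmountsForLiquidity`. It is not the code in `LmTotalEth.s.sol`: the `Position` type, the function names, and the `weth_is_token0` flag are assumptions made for this example, and the actual token ordering depends on the deployed pool.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Q96 = 1 << 96  # fixed-point scale of sqrtPriceX96


@dataclass
class Position:
    sqrt_lower_x96: int  # sqrt price at the lower tick (Q64.96)
    sqrt_upper_x96: int  # sqrt price at the upper tick (Q64.96)
    liquidity: int


def amounts_for_liquidity(sqrt_price_x96: int, pos: Position) -> Tuple[int, int]:
    """Return (amount0, amount1) held by a position at the given pool price."""
    lo, hi, liq = pos.sqrt_lower_x96, pos.sqrt_upper_x96, pos.liquidity
    p = min(max(sqrt_price_x96, lo), hi)  # clamp the price into the range
    amount0 = (liq << 96) * (hi - p) // hi // p  # zero once price >= upper bound
    amount1 = liq * (p - lo) // Q96              # zero while price <= lower bound
    return amount0, amount1


def lm_total_eth(native_eth: int, free_weth: int, positions: Iterable[Position],
                 sqrt_price_x96: int, weth_is_token0: bool) -> int:
    """Free ETH + free WETH + WETH-side principal of every position (all in wei)."""
    side = 0 if weth_is_token0 else 1  # assumption: caller knows the token order
    principal = sum(amounts_for_liquidity(sqrt_price_x96, p)[side] for p in positions)
    return native_eth + free_weth + principal
```

Only the WETH side of each position enters the sum, which matches the exclusion of the KRK side described above.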

What is and is not counted

| Counted | Not counted |
| --- | --- |
| Free native ETH on the LM contract | KRK balance (free or in positions) |
| Free WETH (ERC-20) on the LM contract | Uncollected fees still inside Uni V3 positions |
| ETH-side principal of all 3 positions | KRK fees transferred to feeDestination |

Key consequence: Uncollected fees accrued inside Uniswap V3 positions are invisible to LmTotalEth until a recenter() call executes pool.burn + pool.collect, which converts them into free WETH on the LM contract (or transfers them to feeDestination). A recenter() between the two snapshots materializes these fees into the measurement.

delta_bps formula

delta_bps = (lm_eth_after - lm_eth_before) / lm_eth_before × 10_000

Where lm_eth_before and lm_eth_after are LmTotalEth readings taken before and after the attack sequence. Each attack is snapshot-isolated (Anvil snapshot → execute → measure → revert), so per-attack delta_bps values are independent.
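A minimal Python sketch of this calculation, useful for re-checking a recorded value from the two wei readings. Rounding to the nearest integer is an assumption; the schema only states that `delta_bps` is an integer.

```python
def delta_bps(lm_eth_before: int, lm_eth_after: int) -> int:
    """Change in LM total ETH between two LmTotalEth readings, in basis points."""
    if lm_eth_before <= 0:
        raise ValueError("lm_eth_before must be positive")
    # Assumption: round to the nearest integer basis point.
    return round((lm_eth_after - lm_eth_before) * 10_000 / lm_eth_before)
```

A reading that drops from 1000 ETH to 998.5 ETH gives `delta_bps(10**21, 9985 * 10**17) == -15`.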

Components that drive delta_bps

A round-trip trade (buy KRK with ETH, then sell KRK back for ETH) through the LM's dominant positions produces a positive delta_bps from three sources:

  1. Pool fee income (1% per leg). The WETH/KRK pool charges a 1% fee (FEE = 10_000 in LiquidityManager.sol). On a simple round trip this contributes ~2% of volume. However, fees accrue as uncollected position fees and only become visible after recenter() materializes them. If no recenter occurs between snapshots, fee income is partially hidden (reflected only indirectly through reduced trade output).

  2. Concentrated-liquidity slippage. The LM's three-position strategy concentrates most liquidity in narrow tick ranges. Trades that exceed the depth of a position range push through progressively thinner liquidity, causing super-linear slippage. The attacker receives fewer tokens per unit of input on each marginal unit. This slippage transfers value to the LM's positions as increased ETH principal.

  3. Recenter repositioning gain. When recenter() is called between trade legs:

    • All three positions are burned and fees collected.
    • New positions are minted at the current price.
    • Any accumulated fees (WETH portion) become free WETH and are redeployed as new position liquidity. KRK fees are sent to feeDestination.
    • The repositioned liquidity changes the tick ranges the next trade interacts with.

Why delta_bps is non-linear

A naive estimate of delta_bps ≈ volume × 1% × 2 legs / lm_eth_before × 10_000 underestimates the actual value for large trades because:

  • Slippage dominates at high volume. When trade volume approaches or exceeds the ETH depth of the active positions, the price moves through the entire concentrated range and into thin or empty ticks. The slippage loss to the attacker (= gain to the LM) grows super-linearly with volume.
  • Multi-recenter compounding. Strategies that call recenter() between sub-trades materialize intermediate fees and reposition liquidity at a new price. Subsequent trades pay fees at the new tick ranges, compounding the total fee capture.
  • KRK fee exclusion. KRK fees collected during recenter() are transferred to feeDestination and excluded from LmTotalEth. This means the measurement captures the ETH-side gain but not the KRK-side gain — delta_bps understates total protocol revenue.
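The first point can be illustrated with the standard constant-liquidity formulas for a single Uniswap V3 range. The sketch below is a deliberately simplified model, not the measurement used in the evidence files: it covers one buy leg only, ignores fees and range boundaries, assumes ETH is the quote token (price expressed in ETH per KRK), and the function name is hypothetical.

```python
from typing import Tuple


def buy_price_impact(liquidity: float, sqrt_price: float, eth_in: float) -> Tuple[float, float]:
    """Fee-free buy inside one range: return (tokens_out, shortfall_vs_spot_in_eth)."""
    new_sqrt_price = sqrt_price + eth_in / liquidity  # ETH in moves sqrt(P) linearly
    tokens_out = liquidity * (1 / sqrt_price - 1 / new_sqrt_price)
    value_at_spot = tokens_out * sqrt_price ** 2      # tokens valued at the pre-trade price
    return tokens_out, eth_in - value_at_spot
```

In this model the shortfall equals `eth_in**2 / (liquidity * sqrt_price + eth_in)`, so doubling the trade size more than doubles the shortfall. Thinner liquidity (a smaller `liquidity` value) makes the effect stronger, which is consistent with the behaviour described for trades that run past the concentrated ranges.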

Fee destination behaviour

When feeDestination is address(0) or address(this) (the LM contract itself), fees are not transferred out — they remain as deployable liquidity on the LM. In this configuration, materialized WETH fees increase lm_total_eth directly. When feeDestination is an external address, WETH fees are transferred out and do not contribute to lm_total_eth. The red-team test environment uses feeDestination = address(this) so that fee income is fully reflected in delta_bps.
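The rule above reduces to a two-way address comparison. A small sketch, with a hypothetical function name and addresses passed as hex strings:

```python
ZERO_ADDRESS = "0x" + "0" * 40


def weth_fees_stay_on_lm(fee_destination: str, lm_address: str) -> bool:
    """True when materialized WETH fees remain on the LM and count toward lm_total_eth."""
    dest = fee_destination.lower()  # compare case-insensitively (checksummed input)
    return dest == ZERO_ADDRESS or dest == lm_address.lower()
```

When this returns `False`, WETH fees leave the LM on `recenter()` and the recorded `delta_bps` reflects only principal changes.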

Worked example

Using attacks[1] from evidence/red-team/2026-03-20.json:

"Buy → Recenter → Sell (800 ETH round trip)", delta_bps: 1179

Given:

  • lm_eth_before = 999,999,999,999,999,999,998 wei ≈ 1000 ETH
  • Trade volume = 800 ETH (buy leg) + equivalent KRK sell leg
  • Pool fee rate = 1% per swap
  • feeDestination = address(this) (fees stay in LM)

Step-by-step derivation:

  1. Buy leg (800 ETH → KRK): The 800 ETH buy pushes the price ~4000 ticks into the concentrated positions. The pool charges 1% (≈8 ETH in fees accruing to positions). Because liquidity is concentrated, the price moves far — the attacker receives significantly fewer KRK than a constant-product AMM would give. After the buy, position ETH principal increases (price moved up = more ETH value in range).

  2. Recenter: Positions are burned, collecting all accrued fees. New positions are minted at the new (higher) price. The ~8 ETH in WETH fees plus the ETH-side principal become redeployable liquidity.

  3. Sell leg (KRK → ETH): The attacker sells all acquired KRK back through the newly positioned liquidity. Another 1% fee applies. Because the attacker received fewer KRK than 800 ETH worth (due to buy-leg slippage), the sell leg returns significantly less than 800 ETH. The price drops back but the LM retains the slippage differential.

  4. Result: lm_eth_after ≈ 1000 + 117.9 ≈ 1117.9 ETH.

    delta_bps = (1117.9 - 1000) / 1000 × 10_000 = 1179 bps
    

    The ~117.9 ETH gain comes from: 1% fees on both legs (~16 ETH) plus ~102 ETH in concentrated-liquidity slippage loss by the attacker. The slippage component dominates because 800 ETH far exceeds the depth of the anchor/discovery positions, pushing the trade through increasingly thin liquidity.

Cross-check — why naive formula fails:

naive = 800 × 0.01 × 2 / 1000 × 10_000 = 160 bps   (actual: 1179 bps)

The naive estimate assumes uniform liquidity (constant slippage = fee rate only). The 7× difference is entirely due to concentrated-liquidity slippage on a trade that exceeds position depth.
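The cross-check can be reproduced in a few lines of Python. The helper name is hypothetical; the inputs are the figures from the worked example, with the 1% fee expressed as 100 bps so the arithmetic stays exact.

```python
def naive_fee_bps(volume_eth: float, lm_eth_before: float,
                  fee_bps: int = 100, legs: int = 2) -> float:
    """Fee-only estimate: volume x fee rate x legs, relative to lm_eth_before, in bps."""
    return volume_eth * fee_bps * legs / lm_eth_before


naive = naive_fee_bps(volume_eth=800, lm_eth_before=1000)
ratio = 1179 / naive  # recorded delta_bps divided by the fee-only estimate
print(f"naive={naive:.0f} bps, recorded=1179 bps, ratio={ratio:.2f}x")
```

The ratio works out to about 7.4, which is the gap the paragraph above attributes to slippage.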


Schema: evolution/YYYY-MM-DD.json

Records one optimizer evolution run.

{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved" | "no_improvement"
}

| Field | Type | Description |
| --- | --- | --- |
| date | string (ISO) | Date of the run |
| run_params | object | Input parameters used |
| generation_stats | array | Per-generation fitness summary |
| best_fitness | number | Best fitness score achieved (a lower loss for the LM is better; in the example above, values closer to zero rank as better) |
| champion_file | string | Repo-relative path to winning optimizer |
| champion_commit | string | Git commit SHA of the champion (if promoted) |
| verdict | string | "improved" or "no_improvement" |

Schema: red-team/YYYY-MM-DD.json

Records one adversarial red-team run against a candidate optimizer.

{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "methodology": "Each attack is snapshot-isolated: Anvil snapshot before, execute, measure, revert.",
  "verdict": "floor_broken" | "floor_held",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED" | "HELD" | "INCREASED",
      "delta_bps": -150,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| date | string (ISO) | Date of the run |
| candidate | string | Optimizer under test |
| candidate_commit | string | Git commit SHA of the optimizer under test |
| optimizer_profile | string | Named profile / push3 variant |
| lm_eth_before | integer (wei) | LM total ETH at start |
| lm_eth_after | integer (wei) | LM total ETH at end |
| eth_extracted | integer (wei) | lm_eth_before - lm_eth_after (0 if floor held) |
| floor_held | boolean | true if no ETH was extracted |
| methodology | string | How the red-team run was conducted (e.g. snapshot-isolation procedure, measurement tool, revert strategy). Free-text; should be detailed enough to reproduce the run independently |
| verdict | string | "floor_held" or "floor_broken" |
| attacks[].strategy | string | Human-readable strategy name |
| attacks[].pattern | string | Abstract op sequence (e.g. wrap → buy → stake) |
| attacks[].result | string | "DECREASED", "HELD", or "INCREASED" |
| attacks[].delta_bps | integer | LM ETH change in basis points |
| attacks[].insight | string | Key finding from this strategy |
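The derived fields in this table can be recomputed from the two wei readings, which is handy when checking a record for internal consistency. This is a sketch with hypothetical helper names. Mapping `attacks[].result` to the sign of `delta_bps` is an assumption; the schema does not spell that rule out.

```python
from typing import Any, Dict


def floor_fields(lm_eth_before: int, lm_eth_after: int) -> Dict[str, Any]:
    """Recompute eth_extracted, floor_held and verdict from the wei readings."""
    eth_extracted = max(0, lm_eth_before - lm_eth_after)  # 0 if the floor held
    floor_held = eth_extracted == 0
    return {
        "eth_extracted": eth_extracted,
        "floor_held": floor_held,
        "verdict": "floor_held" if floor_held else "floor_broken",
    }


def attack_result(delta_bps: int) -> str:
    """Assumed mapping from the sign of delta_bps to attacks[].result."""
    if delta_bps < 0:
        return "DECREASED"
    return "INCREASED" if delta_bps > 0 else "HELD"
```

With the example values above (1000 ETH before, 998.5 ETH after), `floor_fields` returns 1.5 ETH extracted, `floor_held: False`, and the verdict `"floor_broken"`.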

Snapshot-Isolation Methodology

All red-team runs use snapshot isolation as the standard methodology. This ensures that each attack is evaluated independently against the same initial state, rather than against a cumulative balance modified by prior attacks.

How it works:

  1. Before the first attack, the test runner records the initial lm_eth_before value and takes an Anvil snapshot via the anvil_snapshot RPC method.
  2. Each attack executes against this snapshot: run the attack, measure lm_eth_after, compute delta_bps, then revert to the snapshot via the anvil_revert RPC method.
  3. The next attack begins from the exact same chain state as the previous one.

Field semantics under snapshot isolation:

| Field | Semantics |
| --- | --- |
| lm_eth_before | LM total ETH at the shared initial snapshot; identical for every attack in the run |
| lm_eth_after | LM total ETH measured after this specific attack, before reverting |
| attacks[].delta_bps | Change relative to the shared lm_eth_before, not relative to any prior attack |

Key implications:

  • lm_eth_before and lm_eth_after reflect per-attack state, not cumulative historical balance. Each attack sees the same starting ETH.
  • Attack results are independent and order-insensitive — reordering attacks does not change any individual delta_bps value.
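The loop described above can be sketched in Python as follows. This is not the project's test runner: `run_isolated`, the `rpc` callable (any function that sends a JSON-RPC method plus params to Anvil and returns the result), and `read_lm_total_eth` are hypothetical names. The sketch re-takes the snapshot after every revert, on the assumption that reverting consumes the snapshot id.

```python
from typing import Any, Callable, Dict, List

Rpc = Callable[[str, List[Any]], Any]


def run_isolated(rpc: Rpc,
                 attacks: Dict[str, Callable[[], None]],
                 read_lm_total_eth: Callable[[], int]) -> Dict[str, int]:
    """Run each attack from the same chain state and return delta_bps per attack."""
    lm_eth_before = read_lm_total_eth()       # shared baseline for every attack
    snapshot_id = rpc("anvil_snapshot", [])
    results: Dict[str, int] = {}
    for name, attack in attacks.items():
        attack()                              # execute the attack sequence
        lm_eth_after = read_lm_total_eth()    # measure before reverting
        results[name] = round((lm_eth_after - lm_eth_before) * 10_000 / lm_eth_before)
        rpc("anvil_revert", [snapshot_id])    # restore the initial state
        # Assumption: the revert consumed the snapshot id, so take a fresh one.
        snapshot_id = rpc("anvil_snapshot", [])
    return results
```

Because every attack starts from the same baseline, reordering the `attacks` dictionary changes only the order of the results, not their values.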

Schema: holdout/YYYY-MM-DD-prNNN.json

Records a holdout quality gate evaluation for a specific PR.

{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "pass" | "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}

| Field | Type | Description |
| --- | --- | --- |
| date | string (ISO) | Date of evaluation |
| pr | integer | PR number being evaluated |
| candidate_commit | string | Commit SHA under test |
| scenarios | array | One entry per holdout scenario |
| scenarios[].name | string | Scenario identifier |
| scenarios[].passed | boolean | Whether LM ETH held or improved |
| scenarios[].lm_eth_delta_bps | integer | LM ETH change in basis points |
| scenarios[].notes | string | Free-text notes on failure mode |
| scenarios_passed | integer | Count of passing scenarios |
| scenarios_total | integer | Total scenarios run |
| gate_passed | boolean | true if all required scenarios passed |
| verdict | string | "pass" or "fail" |
| blocking_scenarios | array of strings | Scenario names that caused failure |
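The summary fields can be derived from the `scenarios` array. The sketch below uses a hypothetical function name and assumes that every listed scenario is required, since the schema does not describe how optional scenarios would be marked.

```python
from typing import Any, Dict, List


def summarize_holdout(scenarios: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive the gate summary fields from per-scenario results."""
    blocking = [s["name"] for s in scenarios if not s["passed"]]
    gate_passed = not blocking  # assumption: every scenario is required
    return {
        "scenarios_passed": len(scenarios) - len(blocking),
        "scenarios_total": len(scenarios),
        "gate_passed": gate_passed,
        "verdict": "pass" if gate_passed else "fail",
        "blocking_scenarios": blocking,
    }
```

Applied to the two scenarios shown in the example, this yields one pass, one failure, and `blocking_scenarios: ["flash_buy_exploit"]`.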

Schema: user-test/YYYY-MM-DD.json

Records a UX evaluation run across simulated personas.

{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "pass" | "fail"
}

| Field | Type | Description |
| --- | --- | --- |
| date | string (ISO) | Date of evaluation |
| personas | array | One entry per simulated persona |
| personas[].name | string | Persona identifier |
| personas[].task | string | Task the persona attempted |
| personas[].completed | boolean | Whether the task was completed |
| personas[].friction_points | array of strings | UX issues encountered |
| personas[].screenshot_refs | array of strings | Repo-relative paths to screenshots |
| personas[].notes | string | Free-text observations |
| personas_completed | integer | Count of personas who completed their task |
| personas_total | integer | Total personas evaluated |
| critical_friction_points | array of strings | Friction points that blocked task completion |
| verdict | string | "pass" if all personas completed, "fail" otherwise |
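The counts and the verdict follow directly from the `personas` array, as in this sketch (hypothetical function name). `critical_friction_points` is left out on purpose: deciding which friction points actually blocked a task needs human or agent judgment and cannot be derived from the other fields.

```python
from typing import Any, Dict, List


def summarize_user_test(personas: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive personas_completed, personas_total and verdict."""
    completed = sum(1 for p in personas if p["completed"])
    return {
        "personas_completed": completed,
        "personas_total": len(personas),
        "verdict": "pass" if completed == len(personas) else "fail",
    }
```

For the two personas in the example (one completed, one abandoned), the verdict is `"fail"`.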

Schema: resources/YYYY-MM-DD.json

Records one infrastructure resource snapshot.

{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "ok" | "warn" | "critical"
}

| Field | Type | Description |
| --- | --- | --- |
| date | string (ISO) | Date of the snapshot |
| disk.used_bytes | integer | Bytes used on the primary volume |
| disk.total_bytes | integer | Total bytes on the primary volume |
| disk.used_pct | number | Percentage of disk used |
| ram.used_bytes | integer | Bytes of RAM in use |
| ram.total_bytes | integer | Total bytes of RAM |
| ram.used_pct | number | Percentage of RAM used |
| api.anthropic_calls_24h | integer | Anthropic API calls in the past 24 hours |
| api.anthropic_budget_usd_used | number | USD spent against the Anthropic budget |
| api.anthropic_budget_usd_limit | number | Configured Anthropic budget ceiling in USD |
| api.anthropic_budget_pct | number | Percentage of budget consumed |
| ci.woodpecker_queue_depth | integer | Number of jobs waiting in the Woodpecker CI queue |
| ci.woodpecker_running | integer | Number of Woodpecker jobs currently running |
| staleness_threshold_days | integer | Maximum age in days before this record is considered stale (always 1) |
| verdict | string | "ok" (all metrics normal), "warn" (≥80% on any dimension), or "critical" (≥95% on any dimension) |
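The verdict thresholds can be expressed as a short function. Which fields count as a "dimension" is an assumption here: the sketch uses the three percentage fields (disk, RAM, API budget) and ignores the CI queue counters, which have no percentage form.

```python
from typing import Any, Dict


def resources_verdict(record: Dict[str, Any]) -> str:
    """Classify a resources record by its highest utilisation percentage."""
    worst = max(
        record["disk"]["used_pct"],
        record["ram"]["used_pct"],
        record["api"]["anthropic_budget_pct"],
    )
    if worst >= 95:
        return "critical"
    if worst >= 80:
        return "warn"
    return "ok"
```

The example record above has `disk.used_pct` at exactly 80.0, so it classifies as `"warn"`.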

Schema: protocol/YYYY-MM-DD.json

Records one on-chain protocol health snapshot.

{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy" | "degraded" | "offline"
}

| Field | Type | Description |
| --- | --- | --- |
| date | string (ISO) | Date of the snapshot |
| block_number | integer | Block number at time of snapshot |
| tvl_eth | string (wei) | Total value locked across all LM positions in wei |
| tvl_eth_formatted | string | TVL formatted in ETH (2 dp) |
| accumulated_fees_eth | string (wei) | Fees accumulated by the LiquidityManager in wei |
| accumulated_fees_eth_formatted | string | Fees formatted in ETH (3 dp) |
| position_count | integer | Number of active Uniswap V3 positions (expected: 3) |
| positions | array | One entry per active position |
| positions[].name | string | Position label: "floor", "anchor", or "discovery" |
| positions[].tick_lower | integer | Lower tick boundary |
| positions[].tick_upper | integer | Upper tick boundary |
| positions[].liquidity | string | Liquidity amount in the position (wei-scale integer) |
| rebalance_count_24h | integer | Number of recenter() calls in the past 24 hours |
| last_rebalance_block | integer | Block number of the most recent recenter() call |
| staleness_threshold_days | integer | Maximum age in days before this record is considered stale (always 1) |
| verdict | string | "healthy" (positions active, TVL > 0), "degraded" (position_count < 3 or rebalance stalled), or "offline" (TVL = 0 or contract unreachable) |
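The formatted fields and the verdict can be sketched as below. The function names are hypothetical. Three assumptions go beyond the schema: "offline" takes precedence over "degraded", an unreachable contract is modelled as `tvl_wei=None`, and the caller decides whether rebalancing has stalled (the schema gives no threshold for that).

```python
from decimal import Decimal
from typing import Optional

WEI_PER_ETH = Decimal(10) ** 18


def format_eth(wei: str, places: int) -> str:
    """Format a wei string as ETH with a fixed number of decimal places."""
    return f"{Decimal(wei) / WEI_PER_ETH:.{places}f}"


def protocol_verdict(tvl_wei: Optional[int], position_count: int,
                     rebalance_stalled: bool) -> str:
    """Classify a protocol snapshot; tvl_wei=None models an unreachable contract."""
    if tvl_wei is None or tvl_wei == 0:
        return "offline"
    if position_count < 3 or rebalance_stalled:
        return "degraded"
    return "healthy"
```

With the example values, `format_eth("1234567890000000000000", 2)` gives `"1234.57"` and `format_eth("12345678900000000", 3)` gives `"0.012"`, matching the `*_formatted` fields shown above.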