johba/harb

README.md

Evidence Directory

Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas (evolution, red-team, holdout, user-test) write structured JSON here.

Purpose

Planner input — the planner reads these files to decide next actions (e.g. "last red-team showed IL vulnerability → trigger evolution").
Diffable history — git log evidence/ shows how metrics change over time.
Permanent record — separate from tmp/ which is ephemeral.

Directory Layout

evidence/
  evolution/
    YYYY-MM-DD.json       # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json       # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json       # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json       # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json       # TVL, accumulated fees, position count, rebalance frequency
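Because every file name starts with an ISO date, a consumer such as the planner could locate the most recent record for a formula by sorting file names. The following is a minimal sketch under that assumption; the helper name `latest_evidence` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Optional


def latest_evidence(root: Path, formula: str) -> Optional[Path]:
    """Return the newest JSON file under root/formula, or None if there is none.

    File names start with YYYY-MM-DD, so lexicographic order matches
    chronological order. Holdout files that share a date are ordered by
    their -prNNN suffix as plain strings (an assumption of this sketch).
    """
    files = sorted((root / formula).glob("*.json"), key=lambda p: p.name)
    return files[-1] if files else None
```

For example, `latest_evidence(Path("evidence"), "red-team")` would return the path of the latest red-team record if one exists.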

Delivery Pattern

Every formula follows the same three-step pattern:

1. Evidence file → committed to evidence/ on main
2. Git artifacts (new code, attack vectors, evolved programs) → PR
3. Human summary → issue comment with key metrics + link to evidence file

Fee-Income Calculation Model

This section documents how delta_bps values in red-team and holdout evidence files are derived, so that recorded values can be independently verified.

Measurement tool

delta_bps is computed from two snapshots of the LiquidityManager's (LM) total ETH, taken by onchain/script/LmTotalEth.s.sol:

lm_total_eth = lm.balance (free ETH)
             + WETH.balanceOf(lm) (free WETH)
             + Σ positionEthPrincipal(stage) for stage ∈ {FLOOR, ANCHOR, DISCOVERY}

Each position's ETH principal is calculated via LiquidityAmounts.getAmountsForLiquidity at the pool's current sqrtPriceX96. Only the WETH side of each position is summed; the KRK side is excluded.
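The sum above can be restated off-chain as a small sketch. It assumes the three position principals have already been computed (on-chain this is done by the Solidity script); the `LmSnapshot` type and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict

STAGES = ("FLOOR", "ANCHOR", "DISCOVERY")


@dataclass(frozen=True)
class LmSnapshot:
    """Hypothetical container for the inputs of one LM total ETH reading (all in wei)."""
    native_eth_wei: int                         # lm.balance
    weth_wei: int                               # WETH.balanceOf(lm)
    position_eth_principal_wei: Dict[str, int]  # WETH-side principal per stage


def lm_total_eth(snapshot: LmSnapshot) -> int:
    """Free ETH + free WETH + WETH-side principal of the three positions."""
    positions = sum(snapshot.position_eth_principal_wei[stage] for stage in STAGES)
    return snapshot.native_eth_wei + snapshot.weth_wei + positions
```

KRK balances and uncollected fees do not appear anywhere in the inputs, which mirrors what the measurement excludes.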

What is and is not counted

Counted	Not counted
Free native ETH on the LM contract	KRK balance (free or in positions)
Free WETH (ERC-20) on the LM contract	Uncollected fees still inside Uni V3 positions
ETH-side principal of all 3 positions	KRK fees transferred to `feeDestination`

Key consequence: Uncollected fees accrued inside Uniswap V3 positions are invisible to LmTotalEth until a recenter() call executes pool.burn + pool.collect, which converts them into free WETH on the LM contract (or transfers them to feeDestination). A recenter() between the two snapshots materializes these fees into the measurement.

`delta_bps` formula

delta_bps = (lm_eth_after − lm_eth_before) / lm_eth_before × 10_000

Where lm_eth_before and lm_eth_after are LmTotalEth readings taken before and after the attack sequence. Each attack is snapshot-isolated (Anvil snapshot → execute → measure → revert), so per-attack delta_bps values are independent.
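As a sketch, the formula can be evaluated on raw wei integers without floating point. The rounding rule (truncation toward zero) is an assumption of this example; the source does not state how fractional basis points are rounded.

```python
from fractions import Fraction


def delta_bps(lm_eth_before: int, lm_eth_after: int) -> int:
    """Change in LM total ETH, in basis points of the starting balance."""
    if lm_eth_before <= 0:
        raise ValueError("lm_eth_before must be positive")
    # Exact rational arithmetic on wei integers avoids float rounding.
    exact = Fraction(lm_eth_after - lm_eth_before, lm_eth_before) * 10_000
    # Assumption: truncate toward zero; the source does not specify rounding.
    return int(exact)
```

With a starting balance of 1000 ETH, a gain of 117.9 ETH evaluates to 1179 and a loss of 1.5 ETH evaluates to -15.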

Components that drive `delta_bps`

A round-trip trade (buy KRK with ETH, then sell KRK back for ETH) through the LM's dominant positions produces a positive delta_bps from three sources:

1. Pool fee income (1% per leg). The WETH/KRK pool charges a 1% fee (FEE = 10_000 in LiquidityManager.sol). On a simple round trip this contributes ~2% of volume. However, fees accrue as uncollected position fees and only become visible after recenter() materializes them. If no recenter occurs between snapshots, fee income is partially hidden (reflected only indirectly through reduced trade output).
2. Concentrated-liquidity slippage. The LM's three-position strategy concentrates most liquidity in narrow tick ranges. Trades that exceed the depth of a position range push through progressively thinner liquidity, causing super-linear slippage. The attacker receives fewer tokens per unit of input on each marginal unit. This slippage transfers value to the LM's positions as increased ETH principal.
3. Recenter repositioning gain. When recenter() is called between trade legs:
   - All three positions are burned and fees collected.
   - New positions are minted at the current price.
   - Any accumulated fees (WETH portion) become free WETH and are redeployed as new position liquidity. KRK fees are sent to feeDestination.
   - The repositioned liquidity changes the tick ranges the next trade interacts with.

Why `delta_bps` is non-linear

A naive estimate of delta_bps ≈ volume × 1% × 2 legs / lm_eth_before × 10_000 underestimates the actual value for large trades because:

Slippage dominates at high volume. When trade volume approaches or exceeds the ETH depth of the active positions, the price moves through the entire concentrated range and into thin or empty ticks. The slippage loss to the attacker (= gain to the LM) grows super-linearly with volume.
Multi-recenter compounding. Strategies that call recenter() between sub-trades materialize intermediate fees and reposition liquidity at a new price. Subsequent trades pay fees at the new tick ranges, compounding the total fee capture.
KRK fee exclusion. KRK fees collected during recenter() are transferred to feeDestination and excluded from LmTotalEth. This means the measurement captures the ETH-side gain but not the KRK-side gain — delta_bps understates total protocol revenue.

Fee destination behaviour

When feeDestination is address(0) or address(this) (the LM contract itself), fees are not transferred out — they remain as deployable liquidity on the LM. In this configuration, materialized WETH fees increase lm_total_eth directly. When feeDestination is an external address, WETH fees are transferred out and do not contribute to lm_total_eth. The red-team test environment uses feeDestination = address(this) so that fee income is fully reflected in delta_bps.

Worked example

Using attacks[1] from evidence/red-team/2026-03-20.json:

"Buy → Recenter → Sell (800 ETH round trip)" — delta_bps: 1179

Given:

lm_eth_before = 999,999,999,999,999,999,998 wei ≈ 1000 ETH
Trade volume = 800 ETH (buy leg) + equivalent KRK sell leg
Pool fee rate = 1% per swap
feeDestination = address(this) (fees stay in LM)

Step-by-step derivation:

1. Buy leg (800 ETH → KRK): The 800 ETH buy pushes the price ~4000 ticks into the concentrated positions. The pool charges 1% (≈8 ETH in fees accruing to positions). Because liquidity is concentrated, the price moves far, so the attacker receives significantly fewer KRK than a constant-product AMM would give. After the buy, position ETH principal increases (price moved up = more ETH value in range).
2. Recenter: Positions are burned, collecting all accrued fees. New positions are minted at the new (higher) price. The ~8 ETH in WETH fees plus the ETH-side principal become redeployable liquidity.
3. Sell leg (KRK → ETH): The attacker sells all acquired KRK back through the newly positioned liquidity. Another 1% fee applies. Because the attacker received fewer KRK than 800 ETH worth (due to buy-leg slippage), the sell leg returns significantly less than 800 ETH. The price drops back but the LM retains the slippage differential.
4. Result: lm_eth_after ≈ 1000 + 117.9 ≈ 1117.9 ETH.
   ```
   delta_bps = (1117.9 − 1000) / 1000 × 10_000 = 1179 bps
   ```
   The ~117.9 ETH gain comes from: 1% fees on both legs (~16 ETH) plus ~102 ETH in concentrated-liquidity slippage loss by the attacker. The slippage component dominates because 800 ETH far exceeds the depth of the anchor/discovery positions, pushing the trade through increasingly thin liquidity.

Cross-check — why naive formula fails:

naive = 800 × 0.01 × 2 / 1000 × 10_000 = 160 bps   (actual: 1179 bps)

The naive estimate assumes uniform liquidity (constant slippage = fee rate only). The roughly 7× gap (1179 / 160 ≈ 7.4) is entirely due to concentrated-liquidity slippage on a trade that exceeds position depth.
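The arithmetic of the worked example can be re-derived with exact fractions. This sketch only restates the numbers given above. Like the text, it treats the sell-leg fee as roughly equal to the buy-leg fee, which is an approximation. The function name `naive_delta_bps` is hypothetical.

```python
from fractions import Fraction


def naive_delta_bps(volume_eth: Fraction, lm_eth_before: Fraction,
                    fee_rate: Fraction = Fraction(1, 100), legs: int = 2) -> Fraction:
    """Fee-only estimate: volume x fee rate x legs, as basis points of the LM balance."""
    return volume_eth * fee_rate * legs / lm_eth_before * 10_000


lm_before_eth = Fraction(1000)   # ~1000 ETH starting balance
volume_eth = Fraction(800)       # buy-leg volume
actual_bps = 1179                # recorded delta_bps for this attack

naive_bps = naive_delta_bps(volume_eth, lm_before_eth)    # fee-only estimate
gain_eth = Fraction(actual_bps, 10_000) * lm_before_eth   # total LM gain in ETH
fee_eth = volume_eth * Fraction(1, 100) * 2                # ~1% on each leg
slippage_eth = gain_eth - fee_eth                          # remainder attributed to slippage
ratio = Fraction(actual_bps) / naive_bps                   # actual vs. naive

print(float(naive_bps), float(gain_eth), float(fee_eth), float(slippage_eth), float(ratio))
```

The remainder after subtracting the fee estimate is the amount the text attributes to concentrated-liquidity slippage.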

Schema: `evolution/YYYY-MM-DD.json`

Records one optimizer evolution run.

{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved" | "no_improvement"
}

Field	Type	Description
`date`	string (ISO)	Date of the run
`run_params`	object	Input parameters used
`generation_stats`	array	Per-generation fitness summary
`best_fitness`	number	Best fitness score achieved across all generations (higher is better, i.e. a smaller loss for the LM)
`champion_file`	string	Repo-relative path to winning optimizer
`champion_commit`	string	Git commit SHA of the champion (if promoted)
`verdict`	string	`"improved"` or `"no_improvement"`

Schema: `red-team/YYYY-MM-DD.json`

Records one adversarial red-team run against a candidate optimizer.

{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "methodology": "Each attack is snapshot-isolated: Anvil snapshot before, execute, measure, revert.",
  "verdict": "floor_broken" | "floor_held",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED" | "HELD" | "INCREASED",
      "delta_bps": -150,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}

Field	Type	Description
`date`	string (ISO)	Date of the run
`candidate`	string	Optimizer under test
`candidate_commit`	string	Git commit SHA of the optimizer under test
`optimizer_profile`	string	Named profile / push3 variant
`lm_eth_before`	integer (wei)	LM total ETH at start
`lm_eth_after`	integer (wei)	LM total ETH at end
`eth_extracted`	integer (wei)	`lm_eth_before - lm_eth_after` (0 if floor held)
`floor_held`	boolean	`true` if no ETH was extracted
`methodology`	string	How the red-team run was conducted (e.g. snapshot-isolation procedure, measurement tool, revert strategy). Free-text; should be detailed enough to reproduce the run independently
`verdict`	string	`"floor_held"` or `"floor_broken"`
`attacks[].strategy`	string	Human-readable strategy name
`attacks[].pattern`	string	Abstract op sequence (e.g. `wrap → buy → stake`)
`attacks[].result`	string	`"DECREASED"`, `"HELD"`, or `"INCREASED"`
`attacks[].delta_bps`	integer	LM ETH change in basis points
`attacks[].insight`	string	Key finding from this strategy
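The relationships between the top-level fields in the table can be checked mechanically. The following is a sketch of such a consistency check, not a validator shipped with the repository; `check_red_team` is a hypothetical name. Note that the wei values exceed 2^53, so a parser that maps JSON numbers to 64-bit floats may lose precision. Python's `json` module parses them as arbitrary-precision integers.

```python
from typing import Any, Dict, List

ALLOWED_RESULTS = {"DECREASED", "HELD", "INCREASED"}


def check_red_team(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems: List[str] = []
    before, after = record["lm_eth_before"], record["lm_eth_after"]

    # eth_extracted is lm_eth_before - lm_eth_after, or 0 if the floor held.
    expected_extracted = max(before - after, 0)
    if record["eth_extracted"] != expected_extracted:
        problems.append(f"eth_extracted should be {expected_extracted}")

    # floor_held is true only if no ETH was extracted.
    if record["floor_held"] != (record["eth_extracted"] == 0):
        problems.append("floor_held does not match eth_extracted")

    expected_verdict = "floor_held" if record["floor_held"] else "floor_broken"
    if record["verdict"] != expected_verdict:
        problems.append(f"verdict should be {expected_verdict}")

    for i, attack in enumerate(record["attacks"]):
        if attack["result"] not in ALLOWED_RESULTS:
            problems.append(f"attacks[{i}].result is not an allowed value")
    return problems
```

A record loaded with `json.load` can be passed in directly; an empty list means the checked fields agree with each other.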

Snapshot-Isolation Methodology

All red-team runs use snapshot isolation as the standard methodology. This ensures that each attack is evaluated independently against the same initial state, rather than against a cumulative balance modified by prior attacks.

How it works:

1. Before the first attack, the test runner records the initial lm_eth_before value and takes an Anvil snapshot via the anvil_snapshot RPC method.
2. Each attack executes against this snapshot: run the attack, measure lm_eth_after, compute delta_bps, then revert to the snapshot via the anvil_revert RPC method.
3. The next attack begins from the exact same chain state as the previous one.

Field semantics under snapshot isolation:

Field	Semantics
`lm_eth_before`	LM total ETH at the shared initial snapshot — identical for every attack in the run
`lm_eth_after`	LM total ETH measured after this specific attack, before reverting
`attacks[].delta_bps`	Change relative to the shared `lm_eth_before`, not relative to any prior attack

Key implications:

lm_eth_before and lm_eth_after reflect per-attack state, not cumulative historical balance. Each attack sees the same starting ETH.
Attack results are independent and order-insensitive — reordering attacks does not change any individual delta_bps value.
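The loop described above can be sketched as follows. The RPC transport is injected as a plain callable so the example stays self-contained; `run_isolated`, `rpc`, `attacks` and `measure` are hypothetical names, not the actual test runner. Two assumptions are labeled in the comments: that a revert consumes the snapshot id (so a fresh snapshot is taken after each revert), and that fractional basis points are truncated toward zero.

```python
from fractions import Fraction
from typing import Any, Callable, Dict, List, Sequence, Tuple

Rpc = Callable[[str, list], Any]            # (method, params) -> result
Attack = Tuple[str, Callable[[], None]]     # (strategy name, function that executes it)


def run_isolated(rpc: Rpc, attacks: Sequence[Attack],
                 measure: Callable[[], int]) -> List[Dict[str, Any]]:
    """Run each attack from the same initial state and record its delta_bps."""
    lm_eth_before = measure()               # shared baseline for every attack
    snapshot_id = rpc("anvil_snapshot", [])
    results: List[Dict[str, Any]] = []
    for strategy, execute in attacks:
        execute()
        lm_eth_after = measure()
        # Assumption: truncate fractional basis points toward zero.
        bps = int(Fraction(lm_eth_after - lm_eth_before, lm_eth_before) * 10_000)
        results.append({"strategy": strategy, "lm_eth_after": lm_eth_after,
                        "delta_bps": bps})
        if not rpc("anvil_revert", [snapshot_id]):
            raise RuntimeError(f"revert to snapshot {snapshot_id} failed")
        # Assumption: a revert consumes the snapshot id, so take a fresh one.
        snapshot_id = rpc("anvil_snapshot", [])
    return results
```

Because every measurement is taken against the same baseline and the state is restored after each attack, reordering `attacks` changes only the order of the results, not any individual `delta_bps`.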

Schema: `holdout/YYYY-MM-DD-prNNN.json`

Records a holdout quality gate evaluation for a specific PR.

{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "pass" | "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}

Field	Type	Description
`date`	string (ISO)	Date of evaluation
`pr`	integer	PR number being evaluated
`candidate_commit`	string	Commit SHA under test
`scenarios`	array	One entry per holdout scenario
`scenarios[].name`	string	Scenario identifier
`scenarios[].passed`	boolean	Whether LM ETH held or improved
`scenarios[].lm_eth_delta_bps`	integer	LM ETH change in basis points
`scenarios[].notes`	string	Free-text notes on failure mode
`scenarios_passed`	integer	Count of passing scenarios
`scenarios_total`	integer	Total scenarios run
`gate_passed`	boolean	`true` if all required scenarios passed
`verdict`	string	`"pass"` or `"fail"`
`blocking_scenarios`	array of strings	Scenario names that caused failure
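The summary fields can be derived from the `scenarios` array. This is a sketch under one explicit assumption: every listed scenario is treated as required, since the schema does not say how optional scenarios would be marked. The function name `summarize_holdout` is hypothetical.

```python
from typing import Any, Dict, List


def summarize_holdout(scenarios: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive the gate summary fields from per-scenario results."""
    # Assumption: all scenarios are required, so any failure blocks the gate.
    blocking = [s["name"] for s in scenarios if not s["passed"]]
    gate_passed = not blocking
    return {
        "scenarios_passed": len(scenarios) - len(blocking),
        "scenarios_total": len(scenarios),
        "gate_passed": gate_passed,
        "verdict": "pass" if gate_passed else "fail",
        "blocking_scenarios": blocking,
    }
```

Applied to the two scenarios shown in the example record, this yields one passing scenario out of two and `flash_buy_exploit` as the only blocker.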

Schema: `user-test/YYYY-MM-DD.json`

Records a UX evaluation run across simulated personas.

{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "pass" | "fail"
}

Field	Type	Description
`date`	string (ISO)	Date of evaluation
`personas`	array	One entry per simulated persona
`personas[].name`	string	Persona identifier
`personas[].task`	string	Task the persona attempted
`personas[].completed`	boolean	Whether the task was completed
`personas[].friction_points`	array of strings	UX issues encountered
`personas[].screenshot_refs`	array of strings	Repo-relative paths to screenshots
`personas[].notes`	string	Free-text observations
`personas_completed`	integer	Count of personas who completed their task
`personas_total`	integer	Total personas evaluated
`critical_friction_points`	array of strings	Friction points that blocked task completion
`verdict`	string	`"pass"` if all personas completed, `"fail"` otherwise
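The counts and the verdict follow directly from the `personas` array, as this short sketch shows. `critical_friction_points` is left out on purpose because the schema does not say how critical points are selected from the per-persona lists. The name `summarize_user_test` is hypothetical.

```python
from typing import Any, Dict, List


def summarize_user_test(personas: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive personas_completed, personas_total and verdict from persona entries."""
    completed = sum(1 for p in personas if p["completed"])
    total = len(personas)
    return {
        "personas_completed": completed,
        "personas_total": total,
        # "pass" only if every persona completed its task.
        "verdict": "pass" if completed == total else "fail",
    }
```

Note that an empty `personas` list yields `"pass"` under this sketch; a real check would likely want to reject that case.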

Schema: `resources/YYYY-MM-DD.json`

Records one infrastructure resource snapshot.

{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "ok" | "warn" | "critical"
}

Field	Type	Description
`date`	string (ISO)	Date of the snapshot
`disk.used_bytes`	integer	Bytes used on the primary volume
`disk.total_bytes`	integer	Total bytes on the primary volume
`disk.used_pct`	number	Percentage of disk used
`ram.used_bytes`	integer	Bytes of RAM in use
`ram.total_bytes`	integer	Total bytes of RAM
`ram.used_pct`	number	Percentage of RAM used
`api.anthropic_calls_24h`	integer	Anthropic API calls in the past 24 hours
`api.anthropic_budget_usd_used`	number	USD spent against the Anthropic budget
`api.anthropic_budget_usd_limit`	number	Configured Anthropic budget ceiling in USD
`api.anthropic_budget_pct`	number	Percentage of budget consumed
`ci.woodpecker_queue_depth`	integer	Number of jobs waiting in the Woodpecker CI queue
`ci.woodpecker_running`	integer	Number of Woodpecker jobs currently running
`staleness_threshold_days`	integer	Maximum age in days before this record is considered stale (always 1)
`verdict`	string	`"ok"` (all metrics normal), `"warn"` (≥80% on any dimension), or `"critical"` (≥95% on any dimension)
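The threshold rule for `verdict` can be sketched as below. Which fields count as a "dimension" is an assumption of this example: it uses the three percentage fields (`disk.used_pct`, `ram.used_pct`, `api.anthropic_budget_pct`) and ignores the CI counters. The function name `resource_verdict` is hypothetical.

```python
from typing import Any, Dict


def resource_verdict(record: Dict[str, Any]) -> str:
    """Map the worst percentage in the record to ok / warn / critical."""
    # Assumption: only these three percentage fields count as dimensions.
    worst = max(
        record["disk"]["used_pct"],
        record["ram"]["used_pct"],
        record["api"]["anthropic_budget_pct"],
    )
    if worst >= 95:
        return "critical"
    if worst >= 80:
        return "warn"
    return "ok"
```

With the example values above (disk at 80.0%), this rule yields `"warn"`.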

Schema: `protocol/YYYY-MM-DD.json`

Records one on-chain protocol health snapshot.

{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy" | "degraded" | "offline"
}

Field	Type	Description
`date`	string (ISO)	Date of the snapshot
`block_number`	integer	Block number at time of snapshot
`tvl_eth`	string (wei)	Total value locked across all LM positions in wei
`tvl_eth_formatted`	string	TVL formatted in ETH (2 dp)
`accumulated_fees_eth`	string (wei)	Fees accumulated by the LiquidityManager in wei
`accumulated_fees_eth_formatted`	string	Fees formatted in ETH (3 dp)
`position_count`	integer	Number of active Uniswap V3 positions (expected: 3)
`positions`	array	One entry per active position
`positions[].name`	string	Position label: `"floor"`, `"anchor"`, or `"discovery"`
`positions[].tick_lower`	integer	Lower tick boundary
`positions[].tick_upper`	integer	Upper tick boundary
`positions[].liquidity`	string	Liquidity amount in the position (wei-scale integer)
`rebalance_count_24h`	integer	Number of `recenter()` calls in the past 24 hours
`last_rebalance_block`	integer	Block number of the most recent `recenter()` call
`staleness_threshold_days`	integer	Maximum age in days before this record is considered stale (always 1)
`verdict`	string	`"healthy"` (positions active, TVL > 0), `"degraded"` (position_count < 3 or rebalance stalled), or `"offline"` (TVL = 0 or contract unreachable)
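The `*_formatted` fields can be derived from the wei strings with decimal arithmetic. This is a sketch; the rounding mode (half up) is an assumption, since the example values do not distinguish between rounding modes, and `format_eth` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

WEI_PER_ETH = Decimal(10) ** 18


def format_eth(wei: str, dp: int) -> str:
    """Convert a wei string to an ETH string with dp decimal places."""
    with localcontext() as ctx:
        ctx.prec = 60  # enough digits for large wei values
        quantum = Decimal(1).scaleb(-dp)  # e.g. dp=2 -> Decimal("0.01")
        eth = Decimal(wei) / WEI_PER_ETH
        # Assumption: round half up; the schema does not name a rounding mode.
        return str(eth.quantize(quantum, rounding=ROUND_HALF_UP))
```

Keeping the wei amounts as strings, as this schema does, avoids precision loss in parsers that read JSON numbers as floats.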
