Evidence Directory
Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas (evolution, red-team, holdout, user-test) write structured JSON here.
Purpose
- Planner input: the planner reads these files to decide next actions (e.g. "last red-team showed IL vulnerability → trigger evolution").
- Diffable history: git log evidence/ shows how metrics change over time.
- Permanent record: separate from tmp/, which is ephemeral.
Directory Layout
evidence/
evolution/
YYYY-MM-DD.json # run params, generation stats, best fitness, champion file
red-team/
YYYY-MM-DD.json # per-attack results, floor held/broken, ETH extracted
holdout/
YYYY-MM-DD-prNNN.json # per-scenario pass/fail, gate decision
user-test/
YYYY-MM-DD.json # per-persona reports, screenshot refs, friction points
resources/
YYYY-MM-DD.json # disk, RAM, API call counts, budget burn, CI queue depth
protocol/
YYYY-MM-DD.json # TVL, accumulated fees, position count, rebalance frequency
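The naming scheme in the layout above can be sketched as a small path helper. This is an illustration only: the function name `evidence_path` is hypothetical, and it assumes the `prNNN` suffix is the PR number without zero padding, which the layout does not state.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Evidence kinds that use a plain YYYY-MM-DD.json file name.
DAILY_KINDS = {"evolution", "red-team", "user-test", "resources", "protocol"}


def evidence_path(kind: str, day: date, pr: Optional[int] = None) -> PurePosixPath:
    """Build the repo-relative path of an evidence file (hypothetical helper)."""
    stamp = day.isoformat()  # YYYY-MM-DD
    if kind == "holdout":
        if pr is None:
            raise ValueError("holdout evidence needs a PR number")
        # Assumption: NNN is the PR number as-is, not zero-padded.
        return PurePosixPath("evidence") / kind / f"{stamp}-pr{pr}.json"
    if kind not in DAILY_KINDS:
        raise ValueError(f"unknown evidence kind: {kind}")
    return PurePosixPath("evidence") / kind / f"{stamp}.json"
```

For example, `evidence_path("red-team", date(2026, 3, 20))` yields `evidence/red-team/2026-03-20.json`.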
Delivery Pattern
Every formula follows the same three-step pattern:
- Evidence file → committed to evidence/ on main
- Git artifacts (new code, attack vectors, evolved programs) → PR
- Human summary → issue comment with key metrics + link to evidence file
Fee-Income Calculation Model
This section documents how delta_bps values in red-team and holdout evidence files
are derived, so that recorded values can be independently verified.
Measurement tool
delta_bps is computed from two snapshots of LM total ETH taken by
onchain/script/LmTotalEth.s.sol:
lm_total_eth = lm.balance (free ETH)
+ WETH.balanceOf(lm) (free WETH)
+ Σ positionEthPrincipal(stage) for stage ∈ {FLOOR, ANCHOR, DISCOVERY}
Each position's ETH principal is calculated via LiquidityAmounts.getAmountsForLiquidity
at the pool's current sqrtPriceX96. Only the WETH side of each position is summed;
the KRK side is excluded.
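For readers who want to reproduce the measurement off-chain, the sum above can be sketched in Python. This is not the code of LmTotalEth.s.sol. The function names, the position tuple shape, and the `weth_is_token0` flag are assumptions made for illustration, and the integer math follows the standard Uniswap V3 liquidity-to-amount formulas as commonly documented.

```python
from typing import Iterable, Tuple

Q96 = 1 << 96  # sqrtPriceX96 values are Q64.96 fixed-point numbers


def _amount0(sqrt_a: int, sqrt_b: int, liquidity: int) -> int:
    """token0 amount held by `liquidity` between two sqrt prices."""
    if sqrt_a > sqrt_b:
        sqrt_a, sqrt_b = sqrt_b, sqrt_a
    return (liquidity << 96) * (sqrt_b - sqrt_a) // sqrt_b // sqrt_a


def _amount1(sqrt_a: int, sqrt_b: int, liquidity: int) -> int:
    """token1 amount held by `liquidity` between two sqrt prices."""
    if sqrt_a > sqrt_b:
        sqrt_a, sqrt_b = sqrt_b, sqrt_a
    return liquidity * (sqrt_b - sqrt_a) // Q96


def amounts_for_liquidity(sqrt_price: int, sqrt_a: int, sqrt_b: int,
                          liquidity: int) -> Tuple[int, int]:
    """(token0, token1) principal of a position at the current sqrt price."""
    if sqrt_a > sqrt_b:
        sqrt_a, sqrt_b = sqrt_b, sqrt_a
    if sqrt_price <= sqrt_a:   # price below range: all token0
        return _amount0(sqrt_a, sqrt_b, liquidity), 0
    if sqrt_price < sqrt_b:    # price inside range: both tokens
        return (_amount0(sqrt_price, sqrt_b, liquidity),
                _amount1(sqrt_a, sqrt_price, liquidity))
    return 0, _amount1(sqrt_a, sqrt_b, liquidity)  # above range: all token1


def lm_total_eth(native_eth: int, free_weth: int,
                 positions: Iterable[Tuple[int, int, int]],
                 sqrt_price: int, weth_is_token0: bool) -> int:
    """Free ETH + free WETH + WETH-side principal of each position (wei).

    `positions` holds (sqrt_lower, sqrt_upper, liquidity) tuples for
    FLOOR, ANCHOR and DISCOVERY; the KRK side is ignored.
    """
    total = native_eth + free_weth
    for sqrt_a, sqrt_b, liquidity in positions:
        amount0, amount1 = amounts_for_liquidity(sqrt_price, sqrt_a, sqrt_b, liquidity)
        total += amount0 if weth_is_token0 else amount1
    return total
```

Which pool token is WETH depends on the deployed addresses, so `weth_is_token0` has to be supplied by the caller.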
What is and is not counted
| Counted | Not counted |
|---|---|
| Free native ETH on the LM contract | KRK balance (free or in positions) |
| Free WETH (ERC-20) on the LM contract | Uncollected fees still inside Uni V3 positions |
| ETH-side principal of all 3 positions | KRK fees transferred to feeDestination |
Key consequence: Uncollected fees accrued inside Uniswap V3 positions are invisible
to LmTotalEth until a recenter() call executes pool.burn + pool.collect, which
converts them into free WETH on the LM contract (or transfers them to feeDestination).
A recenter() between the two snapshots materializes these fees into the measurement.
delta_bps formula
delta_bps = (lm_eth_after − lm_eth_before) / lm_eth_before × 10_000
Where lm_eth_before and lm_eth_after are LmTotalEth readings taken before and
after the attack sequence. Each attack is snapshot-isolated (Anvil snapshot → execute →
measure → revert), so per-attack delta_bps values are independent.
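The formula can be checked against recorded values with a few lines of integer arithmetic. The sketch below is an illustration: the function name and the choice to truncate toward zero are assumptions, since the rounding used by the actual pipeline is not stated here.

```python
def delta_bps(lm_eth_before: int, lm_eth_after: int) -> int:
    """Change in LM total ETH, in basis points, from two wei readings.

    Assumption: the result is truncated toward zero.
    """
    if lm_eth_before <= 0:
        raise ValueError("lm_eth_before must be positive")
    numerator = (lm_eth_after - lm_eth_before) * 10_000
    magnitude = abs(numerator) // lm_eth_before
    return magnitude if numerator >= 0 else -magnitude
```

With `lm_eth_before = 10**21` wei and `lm_eth_after = 9985 * 10**17` wei (a loss of 1.5 ETH on 1000 ETH), the function returns -15.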
Components that drive delta_bps
A round-trip trade (buy KRK with ETH, then sell KRK back for ETH) through the LM's
dominant positions produces a positive delta_bps from three sources:
- Pool fee income (1% per leg). The WETH/KRK pool charges a 1% fee (FEE = 10_000 in LiquidityManager.sol). On a simple round trip this contributes ~2% of volume. However, fees accrue as uncollected position fees and only become visible after recenter() materializes them. If no recenter occurs between snapshots, fee income is partially hidden (reflected only indirectly through reduced trade output).
- Concentrated-liquidity slippage. The LM's three-position strategy concentrates most liquidity in narrow tick ranges. Trades that exceed the depth of a position range push through progressively thinner liquidity, causing super-linear slippage. The attacker receives fewer tokens per unit of input on each marginal unit. This slippage transfers value to the LM's positions as increased ETH principal.
- Recenter repositioning gain. When recenter() is called between trade legs:
  - All three positions are burned and fees collected.
  - New positions are minted at the current price.
  - Any accumulated fees (WETH portion) become free WETH and are redeployed as new position liquidity. KRK fees are sent to feeDestination.
  - The repositioned liquidity changes the tick ranges the next trade interacts with.
Why delta_bps is non-linear
A naive estimate of delta_bps ≈ volume × 1% × 2 legs / lm_eth_before × 10_000
underestimates the actual value for large trades because:
- Slippage dominates at high volume. When trade volume approaches or exceeds the ETH depth of the active positions, the price moves through the entire concentrated range and into thin or empty ticks. The slippage loss to the attacker (= gain to the LM) grows super-linearly with volume.
- Multi-recenter compounding. Strategies that call recenter() between sub-trades materialize intermediate fees and reposition liquidity at a new price. Subsequent trades pay fees at the new tick ranges, compounding the total fee capture.
- KRK fee exclusion. KRK fees collected during recenter() are transferred to feeDestination and excluded from LmTotalEth. This means the measurement captures the ETH-side gain but not the KRK-side gain, so delta_bps understates total protocol revenue.
Fee destination behaviour
When feeDestination is address(0) or address(this) (the LM contract itself),
fees are not transferred out — they remain as deployable liquidity on the LM.
In this configuration, materialized WETH fees increase lm_total_eth directly. When
feeDestination is an external address, WETH fees are transferred out and do not
contribute to lm_total_eth. The red-team test environment uses feeDestination = address(this) so that fee income is fully reflected in delta_bps.
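The rule above reduces to a single address comparison. The following sketch models only the bookkeeping described in this section, not the contract itself; both function names are hypothetical, and addresses are assumed to be hex strings compared case-insensitively.

```python
ZERO_ADDRESS = "0x" + "00" * 20


def weth_fees_stay_on_lm(fee_destination: str, lm_address: str) -> bool:
    """True when feeDestination is address(0) or the LM contract itself."""
    dest = fee_destination.lower()
    return dest == ZERO_ADDRESS or dest == lm_address.lower()


def lm_total_eth_after_recenter(lm_total_eth: int, weth_fees: int,
                                fee_destination: str, lm_address: str) -> int:
    """lm_total_eth once a recenter() has collected `weth_fees` (wei).

    `lm_total_eth` is the reading before the recenter, which by definition
    excludes uncollected fees. Collected fees only add to the reading when
    they remain on the LM.
    """
    if weth_fees_stay_on_lm(fee_destination, lm_address):
        return lm_total_eth + weth_fees
    return lm_total_eth
```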
Worked example
Using attacks[1] from evidence/red-team/2026-03-20.json: "Buy → Recenter → Sell (800 ETH round trip)", delta_bps: 1179.
Given:
- lm_eth_before = 999,999,999,999,999,999,998 wei ≈ 1000 ETH
- Trade volume = 800 ETH (buy leg) + equivalent KRK sell leg
- Pool fee rate = 1% per swap
- feeDestination = address(this) (fees stay in LM)
Step-by-step derivation:
- Buy leg (800 ETH → KRK): The 800 ETH buy pushes the price ~4000 ticks into the concentrated positions. The pool charges 1% (≈8 ETH in fees accruing to positions). Because liquidity is concentrated, the price moves far, and the attacker receives significantly fewer KRK than a constant-product AMM would give. After the buy, position ETH principal increases (price moved up = more ETH value in range).
- Recenter: Positions are burned, collecting all accrued fees. New positions are minted at the new (higher) price. The ~8 ETH in WETH fees plus the ETH-side principal become redeployable liquidity.
- Sell leg (KRK → ETH): The attacker sells all acquired KRK back through the newly positioned liquidity. Another 1% fee applies. Because the attacker received fewer KRK than 800 ETH worth (due to buy-leg slippage), the sell leg returns significantly less than 800 ETH. The price drops back but the LM retains the slippage differential.
- Result: lm_eth_after ≈ 1000 + 117.9 ≈ 1117.9 ETH. delta_bps = (1117.9 − 1000) / 1000 × 10_000 = 1179 bps. The ~117.9 ETH gain comes from: 1% fees on both legs (~16 ETH) plus ~102 ETH in concentrated-liquidity slippage loss by the attacker. The slippage component dominates because 800 ETH far exceeds the depth of the anchor/discovery positions, pushing the trade through increasingly thin liquidity.
Cross-check — why naive formula fails:
naive = 800 × 0.01 × 2 / 1000 × 10_000 = 160 bps (actual: 1179 bps)
The naive estimate assumes uniform liquidity, so the only cost to the attacker is the fee rate. The roughly 7× gap (1179 vs. 160 bps) is due to concentrated-liquidity slippage on a trade that exceeds position depth.
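The cross-check can be reproduced in integer arithmetic. In this sketch the function name is hypothetical, and the fee is expressed in hundredths of a basis point so that 10_000 corresponds to the 1% pool fee quoted earlier (FEE = 10_000).

```python
FEE_DENOMINATOR = 1_000_000  # fee units are hundredths of a basis point


def naive_delta_bps(volume_wei: int, fee_units: int, legs: int,
                    lm_eth_before_wei: int) -> int:
    """Fee-only estimate: volume x fee x legs / lm_eth_before, in bps."""
    return (volume_wei * fee_units * legs * 10_000
            // (FEE_DENOMINATOR * lm_eth_before_wei))


ETH = 10**18
estimate = naive_delta_bps(800 * ETH, 10_000, 2, 1000 * ETH)  # 160 bps
recorded = 1179                                               # from the evidence file
ratio = recorded / estimate                                   # about 7.4
```

The gap between `estimate` and `recorded` corresponds to the slippage component of the derivation, which the fee-only formula cannot see.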
Schema: evolution/YYYY-MM-DD.json
Records one optimizer evolution run.
{
"date": "YYYY-MM-DD",
"run_params": {
"generations": 50,
"population_size": 20,
"seed": 42,
"base_optimizer": "OptimizerV3"
},
"generation_stats": [
{
"generation": 1,
"best_fitness": -12.4,
"mean_fitness": -34.1,
"worst_fitness": -91.2
}
],
"best_fitness": -8.7,
"champion_file": "onchain/src/OptimizerV4.sol",
"champion_commit": "abc1234",
"verdict": "improved" | "no_improvement"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the run |
| run_params | object | Input parameters used |
| generation_stats | array | Per-generation fitness summary |
| best_fitness | number | Best fitness score achieved; in the example above, values closer to zero indicate a smaller loss for the LM |
| champion_file | string | Repo-relative path to winning optimizer |
| champion_commit | string | Git commit SHA of the champion (if promoted) |
| verdict | string | "improved" or "no_improvement" |
Schema: red-team/YYYY-MM-DD.json
Records one adversarial red-team run against a candidate optimizer.
{
"date": "YYYY-MM-DD",
"candidate": "OptimizerV3",
"candidate_commit": "abc1234",
"optimizer_profile": "push3-default",
"lm_eth_before": 1000000000000000000000,
"lm_eth_after": 998500000000000000000,
"eth_extracted": 1500000000000000000,
"floor_held": false,
"methodology": "Each attack is snapshot-isolated: Anvil snapshot before, execute, measure, revert.",
"verdict": "floor_broken" | "floor_held",
"attacks": [
{
"strategy": "Flash buy + stake + recenter loop",
"pattern": "wrap → buy → stake → recenter_multi → sell",
"result": "DECREASED" | "HELD" | "INCREASED",
"delta_bps": -150,
"insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
}
]
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the run |
| candidate | string | Optimizer under test |
| candidate_commit | string | Git commit SHA of the optimizer under test |
| optimizer_profile | string | Named profile / push3 variant |
| lm_eth_before | integer (wei) | LM total ETH at start |
| lm_eth_after | integer (wei) | LM total ETH at end |
| eth_extracted | integer (wei) | lm_eth_before - lm_eth_after (0 if floor held) |
| floor_held | boolean | true if no ETH was extracted |
| methodology | string | How the red-team run was conducted (e.g. snapshot-isolation procedure, measurement tool, revert strategy). Free-text; should be detailed enough to reproduce the run independently |
| verdict | string | "floor_held" or "floor_broken" |
| attacks[].strategy | string | Human-readable strategy name |
| attacks[].pattern | string | Abstract op sequence (e.g. wrap → buy → stake) |
| attacks[].result | string | "DECREASED", "HELD", or "INCREASED" |
| attacks[].delta_bps | integer | LM ETH change in basis points |
| attacks[].insight | string | Key finding from this strategy |
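The relationships between the top-level fields in this table can be checked mechanically before a file is committed. The checker below is a sketch, not part of the pipeline: the function name is hypothetical and it only enforces the rules stated in the table.

```python
from typing import List

ALLOWED_RESULTS = {"DECREASED", "HELD", "INCREASED"}


def check_red_team(record: dict) -> List[str]:
    """Return a list of consistency problems found in a red-team record."""
    problems = []
    before, after = record["lm_eth_before"], record["lm_eth_after"]

    # eth_extracted is lm_eth_before - lm_eth_after, or 0 if the floor held.
    if record["eth_extracted"] != max(0, before - after):
        problems.append("eth_extracted does not match lm_eth_before - lm_eth_after")

    # floor_held is true exactly when no ETH was extracted.
    if record["floor_held"] != (record["eth_extracted"] == 0):
        problems.append("floor_held disagrees with eth_extracted")

    expected = "floor_held" if record["floor_held"] else "floor_broken"
    if record["verdict"] != expected:
        problems.append(f"verdict should be {expected!r}")

    for index, attack in enumerate(record["attacks"]):
        if attack["result"] not in ALLOWED_RESULTS:
            problems.append(f"attacks[{index}].result is not a known value")
    return problems
```

An empty list means the record is internally consistent under these rules; it says nothing about whether the measurements themselves are correct.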
Snapshot-Isolation Methodology
All red-team runs use snapshot isolation as the standard methodology. This ensures that each attack is evaluated independently against the same initial state, rather than against a cumulative balance modified by prior attacks.
How it works:
- Before the first attack, the test runner records the initial lm_eth_before value and takes an Anvil snapshot via the anvil_snapshot RPC method.
- Each attack executes against this snapshot: run the attack, measure lm_eth_after, compute delta_bps, then revert to the snapshot via the anvil_revert RPC method.
- The next attack begins from the exact same chain state as the previous one.
Field semantics under snapshot isolation:
| Field | Semantics |
|---|---|
| lm_eth_before | LM total ETH at the shared initial snapshot, identical for every attack in the run |
| lm_eth_after | LM total ETH measured after this specific attack, before reverting |
| attacks[].delta_bps | Change relative to the shared lm_eth_before, not relative to any prior attack |
Key implications:
- lm_eth_before and lm_eth_after reflect per-attack state, not cumulative historical balance. Each attack sees the same starting ETH.
- Attack results are independent and order-insensitive: reordering attacks does not change any individual delta_bps value.
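The loop described in this section can be sketched as follows. This is not the actual test runner. The `rpc`, `measure`, and attack callables are placeholders supplied by the caller, and the sketch takes a fresh snapshot before each attack on the assumption that a snapshot id may not be reusable after a revert. Because every revert restores the same state, the effect matches the single shared snapshot described above.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Rpc = Callable[[str, list], Any]  # (method, params) -> result


def _delta_bps(before: int, after: int) -> int:
    numerator = (after - before) * 10_000
    magnitude = abs(numerator) // before  # assumption: truncate toward zero
    return magnitude if numerator >= 0 else -magnitude


def run_isolated(rpc: Rpc, measure: Callable[[], int],
                 attacks: Sequence[Tuple[str, Callable[[], None]]]) -> List[Dict[str, Any]]:
    """Run each attack from the same chain state and record its delta_bps."""
    lm_eth_before = measure()  # shared by every attack in the run
    results = []
    for name, attack in attacks:
        snapshot_id = rpc("anvil_snapshot", [])
        try:
            attack()
            lm_eth_after = measure()
        finally:
            # Always restore the initial state, even if the attack raised.
            rpc("anvil_revert", [snapshot_id])
        results.append({
            "strategy": name,
            "lm_eth_before": lm_eth_before,
            "lm_eth_after": lm_eth_after,
            "delta_bps": _delta_bps(lm_eth_before, lm_eth_after),
        })
    return results
```

Passing `rpc` in as a callable keeps the loop independent of any particular JSON-RPC client and makes it testable against an in-memory stand-in.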
Schema: holdout/YYYY-MM-DD-prNNN.json
Records a holdout quality gate evaluation for a specific PR.
{
"date": "YYYY-MM-DD",
"pr": 123,
"candidate_commit": "abc1234",
"scenarios": [
{
"name": "bear_market_crash",
"passed": true,
"lm_eth_delta_bps": 12,
"notes": ""
},
{
"name": "flash_buy_exploit",
"passed": false,
"lm_eth_delta_bps": -340,
"notes": "Floor broken on 2000-trade run"
}
],
"scenarios_passed": 4,
"scenarios_total": 5,
"gate_passed": false,
"verdict": "pass" | "fail",
"blocking_scenarios": ["flash_buy_exploit"]
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of evaluation |
| pr | integer | PR number being evaluated |
| candidate_commit | string | Commit SHA under test |
| scenarios | array | One entry per holdout scenario |
| scenarios[].name | string | Scenario identifier |
| scenarios[].passed | boolean | Whether LM ETH held or improved |
| scenarios[].lm_eth_delta_bps | integer | LM ETH change in basis points |
| scenarios[].notes | string | Free-text notes on failure mode |
| scenarios_passed | integer | Count of passing scenarios |
| scenarios_total | integer | Total scenarios run |
| gate_passed | boolean | true if all required scenarios passed |
| verdict | string | "pass" or "fail" |
| blocking_scenarios | array of strings | Scenario names that caused failure |
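The summary fields can be derived from the scenarios array. The sketch below assumes every scenario is required for the gate, which the schema does not state, and the function name is hypothetical.

```python
from typing import Dict, List


def summarize_holdout(scenarios: List[dict]) -> Dict[str, object]:
    """Derive the gate summary fields from per-scenario results.

    Assumption: every listed scenario is required, so any failure blocks.
    """
    blocking = [s["name"] for s in scenarios if not s["passed"]]
    gate_passed = not blocking
    return {
        "scenarios_passed": len(scenarios) - len(blocking),
        "scenarios_total": len(scenarios),
        "gate_passed": gate_passed,
        "verdict": "pass" if gate_passed else "fail",
        "blocking_scenarios": blocking,
    }
```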
Schema: user-test/YYYY-MM-DD.json
Records a UX evaluation run across simulated personas.
{
"date": "YYYY-MM-DD",
"personas": [
{
"name": "crypto_native",
"task": "stake_and_set_tax_rate",
"completed": true,
"friction_points": [],
"screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
"notes": ""
},
{
"name": "defi_newcomer",
"task": "first_buy_and_stake",
"completed": false,
"friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
"screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
"notes": "User abandoned at tax rate step"
}
],
"personas_completed": 1,
"personas_total": 2,
"critical_friction_points": ["Tax rate slider label unclear"],
"verdict": "pass" | "fail"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of evaluation |
| personas | array | One entry per simulated persona |
| personas[].name | string | Persona identifier |
| personas[].task | string | Task the persona attempted |
| personas[].completed | boolean | Whether the task was completed |
| personas[].friction_points | array of strings | UX issues encountered |
| personas[].screenshot_refs | array of strings | Repo-relative paths to screenshots |
| personas[].notes | string | Free-text observations |
| personas_completed | integer | Count of personas who completed their task |
| personas_total | integer | Total personas evaluated |
| critical_friction_points | array of strings | Friction points that blocked task completion |
| verdict | string | "pass" if all personas completed, "fail" otherwise |
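The counts and the verdict follow directly from the personas array, as this sketch shows. The function name is hypothetical. The sketch leaves out critical_friction_points, because deciding which friction points were blocking is a judgment the schema does not define mechanically.

```python
from typing import Dict, List


def summarize_user_test(personas: List[dict]) -> Dict[str, object]:
    """Derive personas_completed, personas_total and verdict."""
    completed = sum(1 for persona in personas if persona["completed"])
    total = len(personas)
    return {
        "personas_completed": completed,
        "personas_total": total,
        # "pass" only if every persona completed its task.
        "verdict": "pass" if completed == total else "fail",
    }
```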
Schema: resources/YYYY-MM-DD.json
Records one infrastructure resource snapshot.
{
"date": "YYYY-MM-DD",
"disk": {
"used_bytes": 85899345920,
"total_bytes": 107374182400,
"used_pct": 80.0
},
"ram": {
"used_bytes": 3221225472,
"total_bytes": 8589934592,
"used_pct": 37.5
},
"api": {
"anthropic_calls_24h": 142,
"anthropic_budget_usd_used": 4.87,
"anthropic_budget_usd_limit": 50.0,
"anthropic_budget_pct": 9.7
},
"ci": {
"woodpecker_queue_depth": 2,
"woodpecker_running": 1
},
"staleness_threshold_days": 1,
"verdict": "ok" | "warn" | "critical"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the snapshot |
| disk.used_bytes | integer | Bytes used on the primary volume |
| disk.total_bytes | integer | Total bytes on the primary volume |
| disk.used_pct | number | Percentage of disk used |
| ram.used_bytes | integer | Bytes of RAM in use |
| ram.total_bytes | integer | Total bytes of RAM |
| ram.used_pct | number | Percentage of RAM used |
| api.anthropic_calls_24h | integer | Anthropic API calls in the past 24 hours |
| api.anthropic_budget_usd_used | number | USD spent against the Anthropic budget |
| api.anthropic_budget_usd_limit | number | Configured Anthropic budget ceiling in USD |
| api.anthropic_budget_pct | number | Percentage of budget consumed |
| ci.woodpecker_queue_depth | integer | Number of jobs waiting in the Woodpecker CI queue |
| ci.woodpecker_running | integer | Number of Woodpecker jobs currently running |
| staleness_threshold_days | integer | Maximum age in days before this record is considered stale (always 1) |
| verdict | string | "ok" (all metrics normal), "warn" (≥80% on any dimension), or "critical" (≥95% on any dimension) |
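The verdict thresholds can be expressed as a small function. Which fields count as a "dimension" is an assumption here: the sketch uses the three percentage fields (disk, RAM, API budget) and ignores CI queue depth, which has no percentage in the schema. The function name is hypothetical.

```python
def resource_verdict(record: dict) -> str:
    """Map a resources record to "ok", "warn" or "critical"."""
    # Assumption: these three percentages are the dimensions that matter.
    percentages = [
        record["disk"]["used_pct"],
        record["ram"]["used_pct"],
        record["api"]["anthropic_budget_pct"],
    ]
    worst = max(percentages)
    if worst >= 95:
        return "critical"
    if worst >= 80:
        return "warn"
    return "ok"
```

Applied to the example record above (disk 80.0%, RAM 37.5%, budget 9.7%), this returns "warn" because disk usage sits exactly on the 80% threshold.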
Schema: protocol/YYYY-MM-DD.json
Records one on-chain protocol health snapshot.
{
"date": "YYYY-MM-DD",
"block_number": 24500000,
"tvl_eth": "1234567890000000000000",
"tvl_eth_formatted": "1234.57",
"accumulated_fees_eth": "12345678900000000",
"accumulated_fees_eth_formatted": "0.012",
"position_count": 3,
"positions": [
{
"name": "floor",
"tick_lower": -887272,
"tick_upper": -200000,
"liquidity": "987654321000000000"
},
{
"name": "anchor",
"tick_lower": -200000,
"tick_upper": 0
},
{
"name": "discovery",
"tick_lower": 0,
"tick_upper": 887272
}
],
"rebalance_count_24h": 4,
"last_rebalance_block": 24499800,
"staleness_threshold_days": 1,
"verdict": "healthy" | "degraded" | "offline"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the snapshot |
| block_number | integer | Block number at time of snapshot |
| tvl_eth | string (wei) | Total value locked across all LM positions in wei |
| tvl_eth_formatted | string | TVL formatted in ETH (2 dp) |
| accumulated_fees_eth | string (wei) | Fees accumulated by the LiquidityManager in wei |
| accumulated_fees_eth_formatted | string | Fees formatted in ETH (3 dp) |
| position_count | integer | Number of active Uniswap V3 positions (expected: 3) |
| positions | array | One entry per active position |
| positions[].name | string | Position label: "floor", "anchor", or "discovery" |
| positions[].tick_lower | integer | Lower tick boundary |
| positions[].tick_upper | integer | Upper tick boundary |
| positions[].liquidity | string | Liquidity amount in the position (wei-scale integer) |
| rebalance_count_24h | integer | Number of recenter() calls in the past 24 hours |
| last_rebalance_block | integer | Block number of the most recent recenter() call |
| staleness_threshold_days | integer | Maximum age in days before this record is considered stale (always 1) |
| verdict | string | "healthy" (positions active, TVL > 0), "degraded" (position_count < 3 or rebalance stalled), or "offline" (TVL = 0 or contract unreachable) |
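The formatted fields and the verdict can be derived from the raw values, as in the sketch below. Both function names are hypothetical. Half-up rounding is an assumption. The schema does not define when a rebalance counts as stalled, so the caller passes that in as a boolean.

```python
from decimal import Decimal, ROUND_HALF_UP

WEI_PER_ETH = Decimal(10) ** 18


def format_eth(wei: str, places: int) -> str:
    """Format a wei amount (decimal string) as ETH with fixed decimals."""
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.01 for places=2
    eth = Decimal(wei) / WEI_PER_ETH
    return str(eth.quantize(quantum, rounding=ROUND_HALF_UP))


def protocol_verdict(tvl_wei: int, position_count: int,
                     rebalance_stalled: bool, reachable: bool = True) -> str:
    """Map snapshot values to "healthy", "degraded" or "offline"."""
    if not reachable or tvl_wei == 0:
        return "offline"
    if position_count < 3 or rebalance_stalled:
        return "degraded"
    return "healthy"
```

With the example values, `format_eth("1234567890000000000000", 2)` gives "1234.57" and `format_eth("12345678900000000", 3)` gives "0.012".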