harb/evidence/README.md
johba 9d11c848e9 fix: correct worked example attack index reference (attacks[1], not attack 2)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 04:04:40 +00:00

# Evidence Directory
Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas
(evolution, red-team, holdout, user-test) write structured JSON here.
## Purpose
- **Planner input** — the planner reads these files to decide next actions
(e.g. "last red-team showed IL vulnerability → trigger evolution").
- **Diffable history** — `git log evidence/` shows how metrics change over time.
- **Permanent record** — separate from `tmp/` which is ephemeral.
## Directory Layout
```
evidence/
  evolution/
    YYYY-MM-DD.json          # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json          # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json    # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json          # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json          # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json          # TVL, accumulated fees, position count, rebalance frequency
```
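
Because file names start with an ISO date, the newest file in a process directory is also the last one in lexicographic order. The sketch below shows how a reader such as the planner might pick it up. The helper name `latest_evidence` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Optional

# Matches names that begin with YYYY-MM-DD, e.g. 2026-03-20.json or 2026-03-20-pr123.json
DATE_GLOB = "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*.json"

def latest_evidence(root: Path, process: str) -> Optional[Path]:
    """Return the newest evidence file under root/process, or None if there is none."""
    directory = root / process
    if not directory.is_dir():
        return None
    # ISO dates sort chronologically when compared as plain strings.
    candidates = sorted(directory.glob(DATE_GLOB), key=lambda p: p.name)
    return candidates[-1] if candidates else None
```

For `holdout/`, several files can share one date; string ordering of the `-prNNN` suffix is not numeric, so a real reader would likely need to parse the PR number as well.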
## Delivery Pattern
Every formula follows the same three-step pattern:
1. **Evidence file** → committed to `evidence/` on main
2. **Git artifacts** (new code, attack vectors, evolved programs) → PR
3. **Human summary** → issue comment with key metrics + link to evidence file
---
## Fee-Income Calculation Model
This section documents how `delta_bps` values in red-team and holdout evidence files
are derived, so that recorded values can be independently verified.
### Measurement tool
`delta_bps` is computed from two snapshots of **LM total ETH** taken by
[`onchain/script/LmTotalEth.s.sol`](../onchain/script/LmTotalEth.s.sol):
```
lm_total_eth = lm.balance                        (free ETH)
             + WETH.balanceOf(lm)                (free WETH)
             + Σ positionEthPrincipal(stage)     for stage ∈ {FLOOR, ANCHOR, DISCOVERY}
```
Each position's ETH principal is calculated via `LiquidityAmounts.getAmountsForLiquidity`
at the pool's current `sqrtPriceX96`. Only the WETH side of each position is summed;
the KRK side is excluded.
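
As an illustration of the sum above, here is a minimal Python sketch of the same arithmetic. It mirrors the integer math of Uniswap V3's `LiquidityAmounts.getAmountsForLiquidity`; the function names, the `positions` tuple layout, and the `weth_is_token0` flag are assumptions made for this example and are not taken from `LmTotalEth.s.sol`.

```python
from typing import List, Tuple

Q96 = 2 ** 96  # fixed-point scale used by sqrtPriceX96

def amount0_for_liquidity(sqrt_a: int, sqrt_b: int, liquidity: int) -> int:
    """token0 amount held by `liquidity` between two sqrt prices."""
    if sqrt_a > sqrt_b:
        sqrt_a, sqrt_b = sqrt_b, sqrt_a
    return (liquidity << 96) * (sqrt_b - sqrt_a) // sqrt_b // sqrt_a

def amount1_for_liquidity(sqrt_a: int, sqrt_b: int, liquidity: int) -> int:
    """token1 amount held by `liquidity` between two sqrt prices."""
    if sqrt_a > sqrt_b:
        sqrt_a, sqrt_b = sqrt_b, sqrt_a
    return liquidity * (sqrt_b - sqrt_a) // Q96

def amounts_for_liquidity(sqrt_price: int, sqrt_a: int, sqrt_b: int,
                          liquidity: int) -> Tuple[int, int]:
    """(amount0, amount1) of a position at the current sqrt price."""
    if sqrt_a > sqrt_b:
        sqrt_a, sqrt_b = sqrt_b, sqrt_a
    if sqrt_price <= sqrt_a:   # price below range: position is all token0
        return amount0_for_liquidity(sqrt_a, sqrt_b, liquidity), 0
    if sqrt_price < sqrt_b:    # price inside range: both tokens present
        return (amount0_for_liquidity(sqrt_price, sqrt_b, liquidity),
                amount1_for_liquidity(sqrt_a, sqrt_price, liquidity))
    return 0, amount1_for_liquidity(sqrt_a, sqrt_b, liquidity)  # above range

def lm_total_eth(native_eth: int, weth_balance: int, sqrt_price_x96: int,
                 positions: List[Tuple[int, int, int]],
                 weth_is_token0: bool) -> int:
    """Free ETH + free WETH + WETH side of each (sqrt_lower, sqrt_upper, liquidity)."""
    total = native_eth + weth_balance
    for sqrt_lower, sqrt_upper, liquidity in positions:
        amount0, amount1 = amounts_for_liquidity(
            sqrt_price_x96, sqrt_lower, sqrt_upper, liquidity)
        total += amount0 if weth_is_token0 else amount1  # KRK side is ignored
    return total
```

Which pool token is WETH depends on address ordering at deployment, which is why the sketch takes it as a parameter rather than assuming one side.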
### What is and is not counted
| Counted | Not counted |
|---------|-------------|
| Free native ETH on the LM contract | KRK balance (free or in positions) |
| Free WETH (ERC-20) on the LM contract | Uncollected fees still inside Uni V3 positions |
| ETH-side principal of all 3 positions | KRK fees transferred to `feeDestination` |

**Key consequence:** Uncollected fees accrued inside Uniswap V3 positions are invisible
to `LmTotalEth` until a `recenter()` call executes `pool.burn` + `pool.collect`, which
converts them into free WETH on the LM contract (or transfers them to `feeDestination`).
A `recenter()` between the two snapshots materializes these fees into the measurement.
### `delta_bps` formula
```
delta_bps = (lm_eth_after - lm_eth_before) / lm_eth_before × 10_000
```
Where `lm_eth_before` and `lm_eth_after` are `LmTotalEth` readings taken before and
after the attack sequence. Each attack is snapshot-isolated (Anvil snapshot → execute →
measure → revert), so per-attack `delta_bps` values are independent.
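
To re-derive a recorded value, the formula can be applied directly to the wei integers from an evidence file. The sketch below is one way to do that in Python, whose integers do not lose precision at wei scale. Rounding to the nearest integer is an assumption; this document does not state how the pipeline rounds.

```python
def delta_bps(lm_eth_before: int, lm_eth_after: int) -> int:
    """Change in LM total ETH, in basis points, from two wei readings."""
    if lm_eth_before <= 0:
        raise ValueError("lm_eth_before must be positive")
    # Multiply before dividing so the ratio keeps as much precision as possible.
    # Rounding to the nearest integer is an assumption of this sketch.
    return round((lm_eth_after - lm_eth_before) * 10_000 / lm_eth_before)
```

For example, `delta_bps(1000 * 10**18, 998_500_000_000_000_000_000)` gives `-15`, a loss of 1.5 ETH on a 1000 ETH base.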
### Components that drive `delta_bps`
A round-trip trade (buy KRK with ETH, then sell KRK back for ETH) through the LM's
dominant positions produces a positive `delta_bps` from three sources:
1. **Pool fee income (1% per leg).** The WETH/KRK pool charges a 1% fee (`FEE = 10_000`
in `LiquidityManager.sol`). On a simple round trip this contributes ~2% of volume.
However, fees accrue as uncollected position fees and only become visible after
`recenter()` materializes them. If no recenter occurs between snapshots, fee income
is partially hidden (reflected only indirectly through reduced trade output).
2. **Concentrated-liquidity slippage.** The LM's three-position strategy concentrates
most liquidity in narrow tick ranges. Trades that exceed the depth of a position
range push through progressively thinner liquidity, causing super-linear slippage.
The attacker receives fewer tokens per unit of input on each marginal unit. This
slippage transfers value to the LM's positions as increased ETH principal.
3. **Recenter repositioning gain.** When `recenter()` is called between trade legs:
   - All three positions are burned and fees collected.
   - New positions are minted at the current price.
   - Any accumulated fees (WETH portion) become free WETH and are redeployed as new
     position liquidity. KRK fees are sent to `feeDestination`.
   - The repositioned liquidity changes the tick ranges the next trade interacts with.
### Why `delta_bps` is non-linear
A naive estimate of `delta_bps ≈ volume × 1% × 2 legs / lm_eth_before × 10_000`
underestimates the actual value for large trades because:
- **Slippage dominates at high volume.** When trade volume approaches or exceeds the
ETH depth of the active positions, the price moves through the entire concentrated
range and into thin or empty ticks. The slippage loss to the attacker (= gain to the
LM) grows super-linearly with volume.
- **Multi-recenter compounding.** Strategies that call `recenter()` between sub-trades
materialize intermediate fees and reposition liquidity at a new price. Subsequent
trades pay fees at the new tick ranges, compounding the total fee capture.
- **KRK fee exclusion.** KRK fees collected during `recenter()` are transferred to
`feeDestination` and excluded from `LmTotalEth`. This means the measurement captures
the ETH-side gain but not the KRK-side gain — `delta_bps` understates total protocol
revenue.
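
The fee-only estimate is easy to compute and compare against a recorded value. The sketch below does this with the figures from the worked example in this document (800 ETH volume, roughly 1000 ETH base, recorded `delta_bps` of 1179). The function name and defaults are illustrative only.

```python
def naive_delta_bps(volume_eth: float, lm_eth_before: float,
                    fee_rate: float = 0.01, legs: int = 2) -> float:
    """Fee-only estimate: ignores slippage and recenter effects entirely."""
    return volume_eth * fee_rate * legs / lm_eth_before * 10_000

estimate = naive_delta_bps(800, 1000)   # about 160 bps
recorded = 1179                         # value taken from the worked example
shortfall_bps = recorded - estimate     # portion the fee-only model cannot explain
```

A large `shortfall_bps` relative to `estimate` is a sign that slippage or recenter effects, not fees, drive the result.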
### Fee destination behaviour
When `feeDestination` is `address(0)` or `address(this)` (the LM contract itself),
fees are **not** transferred out — they remain as deployable liquidity on the LM.
In this configuration, materialized WETH fees increase `lm_total_eth` directly. When
`feeDestination` is an external address, WETH fees are transferred out and do **not**
contribute to `lm_total_eth`. The red-team test environment uses `feeDestination =
address(this)` so that fee income is fully reflected in `delta_bps`.
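
The effect of `feeDestination` on the measurement can be expressed as a small toy model. This is a sketch of the rule described above, not the contract logic: the function names are hypothetical, and only the WETH portion of fees is modelled.

```python
ZERO_ADDRESS = "0x" + "00" * 20

def weth_fees_stay_in_lm(fee_destination: str, lm_address: str) -> bool:
    """True when fees are not transferred out (address(0) or the LM itself)."""
    destination = fee_destination.lower()
    return destination == ZERO_ADDRESS or destination == lm_address.lower()

def lm_total_eth_after_recenter(lm_total_eth_before_recenter: int,
                                uncollected_weth_fees: int,
                                fee_destination: str,
                                lm_address: str) -> int:
    """Toy model: recenter() materializes WETH fees only if they stay on the LM."""
    if weth_fees_stay_in_lm(fee_destination, lm_address):
        return lm_total_eth_before_recenter + uncollected_weth_fees
    return lm_total_eth_before_recenter  # fees left the contract
```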
### Worked example
Using `attacks[1]` from `evidence/red-team/2026-03-20.json`:
> **"Buy → Recenter → Sell (800 ETH round trip)"** (`delta_bps: 1179`)

**Given:**
- `lm_eth_before` = 999,999,999,999,999,999,998 wei ≈ 1000 ETH
- Trade volume = 800 ETH (buy leg) + equivalent KRK sell leg
- Pool fee rate = 1% per swap
- `feeDestination = address(this)` (fees stay in LM)

**Step-by-step derivation:**
1. **Buy leg (800 ETH → KRK):** The 800 ETH buy pushes the price ~4000 ticks into
the concentrated positions. The pool charges 1% (≈8 ETH in fees accruing to
positions). Because liquidity is concentrated, the price moves far — the attacker
receives significantly fewer KRK than a constant-product AMM would give.
After the buy, position ETH principal increases (price moved up = more ETH value
in range).
2. **Recenter:** Positions are burned, collecting all accrued fees. New positions are
minted at the new (higher) price. The ~8 ETH in WETH fees plus the ETH-side
principal become redeployable liquidity.
3. **Sell leg (KRK → ETH):** The attacker sells all acquired KRK back through the
newly positioned liquidity. Another 1% fee applies. Because the attacker received
fewer KRK than 800 ETH worth (due to buy-leg slippage), the sell leg returns
significantly less than 800 ETH. The price drops back but the LM retains the
slippage differential.
4. **Result:** `lm_eth_after ≈ 1000 + 117.9 ≈ 1117.9 ETH`.
```
delta_bps = (1117.9 - 1000) / 1000 × 10_000 = 1179 bps
```
The ~117.9 ETH gain comes from: 1% fees on both legs (~16 ETH) **plus** ~102 ETH
in concentrated-liquidity slippage loss by the attacker. The slippage component
dominates because 800 ETH far exceeds the depth of the anchor/discovery positions,
pushing the trade through increasingly thin liquidity.

**Cross-check (why the naive formula fails):**
```
naive = 800 × 0.01 × 2 / 1000 × 10_000 = 160 bps (actual: 1179 bps)
```
The naive estimate assumes uniform liquidity (constant slippage = fee rate only).
The roughly 7× gap (1179 / 160 ≈ 7.4) comes from concentrated-liquidity slippage on a
trade that exceeds position depth.

---
## Schema: `evolution/YYYY-MM-DD.json`
Records one optimizer evolution run.
```json
{
"date": "YYYY-MM-DD",
"run_params": {
"generations": 50,
"population_size": 20,
"seed": 42,
"base_optimizer": "OptimizerV3"
},
"generation_stats": [
{
"generation": 1,
"best_fitness": -12.4,
"mean_fitness": -34.1,
"worst_fitness": -91.2
}
],
"best_fitness": -8.7,
"champion_file": "onchain/src/OptimizerV4.sol",
"champion_commit": "abc1234",
"verdict": "improved" | "no_improvement"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `run_params` | object | Input parameters used |
| `generation_stats` | array | Per-generation fitness summary |
| `best_fitness` | number | Best fitness score achieved across all generations. In the example above the best value is the one closest to zero (smaller loss for the LM) |
| `champion_file` | string | Repo-relative path to winning optimizer |
| `champion_commit` | string | Git commit SHA of the champion (if promoted) |
| `verdict` | string | `"improved"` or `"no_improvement"` |
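
A minimal structural check for an evolution record might look like the sketch below. The function name is hypothetical, and treating `champion_commit` as optional is an assumption based on the "(if promoted)" note in the table.

```python
# champion_commit is left out on purpose: the table marks it "(if promoted)".
REQUIRED_KEYS = {"date", "run_params", "generation_stats",
                 "best_fitness", "champion_file", "verdict"}
VERDICTS = {"improved", "no_improvement"}
STAT_KEYS = ("generation", "best_fitness", "mean_fitness", "worst_fitness")

def check_evolution_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if record.get("verdict") not in VERDICTS:
        problems.append(f"unexpected verdict: {record.get('verdict')!r}")
    for i, stats in enumerate(record.get("generation_stats", [])):
        for key in STAT_KEYS:
            if not isinstance(stats.get(key), (int, float)):
                problems.append(f"generation_stats[{i}].{key} is not a number")
    return problems
```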
---
## Schema: `red-team/YYYY-MM-DD.json`
Records one adversarial red-team run against a candidate optimizer.
```json
{
"date": "YYYY-MM-DD",
"candidate": "OptimizerV3",
"candidate_commit": "abc1234",
"optimizer_profile": "push3-default",
"lm_eth_before": 1000000000000000000000,
"lm_eth_after": 998500000000000000000,
"eth_extracted": 1500000000000000000,
"floor_held": false,
"verdict": "floor_broken" | "floor_held",
"attacks": [
{
"strategy": "Flash buy + stake + recenter loop",
"pattern": "wrap → buy → stake → recenter_multi → sell",
"result": "DECREASED" | "HELD" | "INCREASED",
"delta_bps": -150,
"insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
}
]
}
```
| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `candidate` | string | Optimizer under test |
| `candidate_commit` | string | Git commit SHA of the optimizer under test |
| `optimizer_profile` | string | Named profile / push3 variant |
| `lm_eth_before` | integer (wei) | LM total ETH at start |
| `lm_eth_after` | integer (wei) | LM total ETH at end |
| `eth_extracted` | integer (wei) | `lm_eth_before - lm_eth_after` (0 if floor held) |
| `floor_held` | boolean | `true` if no ETH was extracted |
| `verdict` | string | `"floor_held"` or `"floor_broken"` |
| `attacks[].strategy` | string | Human-readable strategy name |
| `attacks[].pattern` | string | Abstract op sequence (e.g. `wrap → buy → stake`) |
| `attacks[].result` | string | `"DECREASED"`, `"HELD"`, or `"INCREASED"` |
| `attacks[].delta_bps` | integer | LM ETH change in basis points |
| `attacks[].insight` | string | Key finding from this strategy |
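
The top-level fields are redundant with each other by design, so a record can be checked for internal consistency. The sketch below applies only the relationships stated in the table; the function name is hypothetical.

```python
def check_red_team_record(record: dict) -> list:
    """Return a list of inconsistencies between the top-level red-team fields."""
    problems = []
    before, after = record["lm_eth_before"], record["lm_eth_after"]

    # eth_extracted is lm_eth_before - lm_eth_after, or 0 if the floor held.
    expected_extracted = max(before - after, 0)
    if record["eth_extracted"] != expected_extracted:
        problems.append(f"eth_extracted should be {expected_extracted}")

    # floor_held is true only when no ETH was extracted.
    if record["floor_held"] != (expected_extracted == 0):
        problems.append("floor_held does not match eth_extracted")

    expected_verdict = "floor_held" if record["floor_held"] else "floor_broken"
    if record["verdict"] != expected_verdict:
        problems.append(f"verdict should be {expected_verdict!r}")
    return problems
```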
---
## Schema: `holdout/YYYY-MM-DD-prNNN.json`
Records a holdout quality gate evaluation for a specific PR.
```json
{
"date": "YYYY-MM-DD",
"pr": 123,
"candidate_commit": "abc1234",
"scenarios": [
{
"name": "bear_market_crash",
"passed": true,
"lm_eth_delta_bps": 12,
"notes": ""
},
{
"name": "flash_buy_exploit",
"passed": false,
"lm_eth_delta_bps": -340,
"notes": "Floor broken on 2000-trade run"
}
],
"scenarios_passed": 4,
"scenarios_total": 5,
"gate_passed": false,
"verdict": "pass" | "fail",
"blocking_scenarios": ["flash_buy_exploit"]
}
```
| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `pr` | integer | PR number being evaluated |
| `candidate_commit` | string | Commit SHA under test |
| `scenarios` | array | One entry per holdout scenario |
| `scenarios[].name` | string | Scenario identifier |
| `scenarios[].passed` | boolean | Whether LM ETH held or improved |
| `scenarios[].lm_eth_delta_bps` | integer | LM ETH change in basis points |
| `scenarios[].notes` | string | Free-text notes on failure mode |
| `scenarios_passed` | integer | Count of passing scenarios |
| `scenarios_total` | integer | Total scenarios run |
| `gate_passed` | boolean | `true` if all required scenarios passed |
| `verdict` | string | `"pass"` or `"fail"` |
| `blocking_scenarios` | array of strings | Scenario names that caused failure |
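
The summary fields can be cross-checked against the `scenarios` array. The sketch below assumes that `scenarios` lists every scenario that was run and that every blocking scenario is a failed one; which scenarios count as "required" for the gate is not specified here, so `gate_passed` is only compared with `verdict`.

```python
def check_holdout_record(record: dict) -> list:
    """Return a list of inconsistencies between summary fields and scenarios."""
    problems = []
    scenarios = record["scenarios"]
    passed = sum(1 for s in scenarios if s["passed"])
    failed_names = {s["name"] for s in scenarios if not s["passed"]}

    if record["scenarios_total"] != len(scenarios):
        problems.append("scenarios_total does not match the scenarios array")
    if record["scenarios_passed"] != passed:
        problems.append("scenarios_passed does not match the scenarios array")

    # Assumption: a scenario can only block the gate if it failed.
    unknown = set(record["blocking_scenarios"]) - failed_names
    if unknown:
        problems.append(f"blocking scenarios that did not fail: {sorted(unknown)}")

    expected_verdict = "pass" if record["gate_passed"] else "fail"
    if record["verdict"] != expected_verdict:
        problems.append(f"verdict should be {expected_verdict!r}")
    return problems
```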
---
## Schema: `user-test/YYYY-MM-DD.json`
Records a UX evaluation run across simulated personas.
```json
{
"date": "YYYY-MM-DD",
"personas": [
{
"name": "crypto_native",
"task": "stake_and_set_tax_rate",
"completed": true,
"friction_points": [],
"screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
"notes": ""
},
{
"name": "defi_newcomer",
"task": "first_buy_and_stake",
"completed": false,
"friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
"screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
"notes": "User abandoned at tax rate step"
}
],
"personas_completed": 1,
"personas_total": 2,
"critical_friction_points": ["Tax rate slider label unclear"],
"verdict": "pass" | "fail"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `personas` | array | One entry per simulated persona |
| `personas[].name` | string | Persona identifier |
| `personas[].task` | string | Task the persona attempted |
| `personas[].completed` | boolean | Whether the task was completed |
| `personas[].friction_points` | array of strings | UX issues encountered |
| `personas[].screenshot_refs` | array of strings | Repo-relative paths to screenshots |
| `personas[].notes` | string | Free-text observations |
| `personas_completed` | integer | Count of personas who completed their task |
| `personas_total` | integer | Total personas evaluated |
| `critical_friction_points` | array of strings | Friction points that blocked task completion |
| `verdict` | string | `"pass"` if all personas completed, `"fail"` otherwise |
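
The summary fields follow mechanically from the `personas` array, as the sketch below shows. The function name is hypothetical, and an empty persona list yields `"pass"` here only because no persona failed, which may not be the intended behaviour.

```python
def summarize_user_test(personas: list) -> dict:
    """Derive the count fields and verdict from the personas array."""
    completed = sum(1 for persona in personas if persona["completed"])
    total = len(personas)
    return {
        "personas_completed": completed,
        "personas_total": total,
        # "pass" only if every persona completed its task.
        "verdict": "pass" if completed == total else "fail",
    }
```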
---
## Schema: `resources/YYYY-MM-DD.json`
Records one infrastructure resource snapshot.
```json
{
"date": "YYYY-MM-DD",
"disk": {
"used_bytes": 85899345920,
"total_bytes": 107374182400,
"used_pct": 80.0
},
"ram": {
"used_bytes": 3221225472,
"total_bytes": 8589934592,
"used_pct": 37.5
},
"api": {
"anthropic_calls_24h": 142,
"anthropic_budget_usd_used": 4.87,
"anthropic_budget_usd_limit": 50.0,
"anthropic_budget_pct": 9.7
},
"ci": {
"woodpecker_queue_depth": 2,
"woodpecker_running": 1
},
"staleness_threshold_days": 1,
"verdict": "ok" | "warn" | "critical"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `disk.used_bytes` | integer | Bytes used on the primary volume |
| `disk.total_bytes` | integer | Total bytes on the primary volume |
| `disk.used_pct` | number | Percentage of disk used |
| `ram.used_bytes` | integer | Bytes of RAM in use |
| `ram.total_bytes` | integer | Total bytes of RAM |
| `ram.used_pct` | number | Percentage of RAM used |
| `api.anthropic_calls_24h` | integer | Anthropic API calls in the past 24 hours |
| `api.anthropic_budget_usd_used` | number | USD spent against the Anthropic budget |
| `api.anthropic_budget_usd_limit` | number | Configured Anthropic budget ceiling in USD |
| `api.anthropic_budget_pct` | number | Percentage of budget consumed |
| `ci.woodpecker_queue_depth` | integer | Number of jobs waiting in the Woodpecker CI queue |
| `ci.woodpecker_running` | integer | Number of Woodpecker jobs currently running |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"ok"` (all metrics normal), `"warn"` (≥80% on any dimension), or `"critical"` (≥95% on any dimension) |
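
The verdict thresholds can be sketched as a small function. Which percentages count as a "dimension" is an assumption here: the sketch uses disk, RAM, and Anthropic budget usage, and ignores the CI queue fields because they are not percentages.

```python
def resource_verdict(record: dict) -> str:
    """Map the worst usage percentage to "ok", "warn", or "critical"."""
    # Assumption: these three percentages are the dimensions that matter.
    percentages = [
        record["disk"]["used_pct"],
        record["ram"]["used_pct"],
        record["api"]["anthropic_budget_pct"],
    ]
    worst = max(percentages)
    if worst >= 95:
        return "critical"
    if worst >= 80:
        return "warn"
    return "ok"
```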
---
## Schema: `protocol/YYYY-MM-DD.json`
Records one on-chain protocol health snapshot.
```json
{
"date": "YYYY-MM-DD",
"block_number": 24500000,
"tvl_eth": "1234567890000000000000",
"tvl_eth_formatted": "1234.57",
"accumulated_fees_eth": "12345678900000000",
"accumulated_fees_eth_formatted": "0.012",
"position_count": 3,
"positions": [
{
"name": "floor",
"tick_lower": -887272,
"tick_upper": -200000,
"liquidity": "987654321000000000"
},
{
"name": "anchor",
"tick_lower": -200000,
"tick_upper": 0
},
{
"name": "discovery",
"tick_lower": 0,
"tick_upper": 887272
}
],
"rebalance_count_24h": 4,
"last_rebalance_block": 24499800,
"staleness_threshold_days": 1,
"verdict": "healthy" | "degraded" | "offline"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `block_number` | integer | Block number at time of snapshot |
| `tvl_eth` | string (wei) | Total value locked across all LM positions in wei |
| `tvl_eth_formatted` | string | TVL formatted in ETH (2 dp) |
| `accumulated_fees_eth` | string (wei) | Fees accumulated by the LiquidityManager in wei |
| `accumulated_fees_eth_formatted` | string | Fees formatted in ETH (3 dp) |
| `position_count` | integer | Number of active Uniswap V3 positions (expected: 3) |
| `positions` | array | One entry per active position |
| `positions[].name` | string | Position label: `"floor"`, `"anchor"`, or `"discovery"` |
| `positions[].tick_lower` | integer | Lower tick boundary |
| `positions[].tick_upper` | integer | Upper tick boundary |
| `positions[].liquidity` | string | Liquidity amount in the position (wei-scale integer) |
| `rebalance_count_24h` | integer | Number of `recenter()` calls in the past 24 hours |
| `last_rebalance_block` | integer | Block number of the most recent `recenter()` call |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"healthy"` (positions active, TVL > 0), `"degraded"` (position_count < 3 or rebalance stalled), or `"offline"` (TVL = 0 or contract unreachable) |
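
The `*_formatted` fields can be reproduced from the wei strings with decimal arithmetic, which avoids float rounding at this scale. The sketch below is one way to do it; the function name is hypothetical and half-up rounding is an assumption, since the rounding mode is not specified here.

```python
from decimal import Decimal, ROUND_HALF_UP

WEI_PER_ETH = Decimal(10) ** 18

def format_eth(wei: str, decimal_places: int) -> str:
    """Convert a wei string to an ETH string with a fixed number of decimals."""
    eth = Decimal(wei) / WEI_PER_ETH
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. 0.01 for 2 dp
    # ROUND_HALF_UP is an assumption of this sketch.
    return format(eth.quantize(quantum, rounding=ROUND_HALF_UP), "f")
```

With the example values, `format_eth("1234567890000000000000", 2)` gives `"1234.57"` and `format_eth("12345678900000000", 3)` gives `"0.012"`.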