# Evidence Directory

Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas
(evolution, red-team, holdout, user-test) write structured JSON here.

## Purpose

- **Planner input**: the planner reads these files to decide next actions
  (e.g. "last red-team showed IL vulnerability → trigger evolution").
- **Diffable history**: `git log evidence/` shows how metrics change over time.
- **Permanent record**: separate from `tmp/`, which is ephemeral.

## Directory Layout

```
evidence/
  evolution/
    YYYY-MM-DD.json          # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json          # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json    # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json          # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json          # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json          # TVL, accumulated fees, position count, rebalance frequency
```

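
The naming convention above can be sketched as a small path builder. This is an illustrative helper, not part of the pipeline: `evidence_path` and `PROCESSES` are hypothetical names, and the sketch assumes the PR number in holdout filenames is written without zero-padding.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Process names taken from the directory layout above.
PROCESSES = {"evolution", "red-team", "holdout", "user-test", "resources", "protocol"}


def evidence_path(process: str, day: date, pr: Optional[int] = None) -> PurePosixPath:
    """Return the repo-relative path of one evidence file (hypothetical helper)."""
    if process not in PROCESSES:
        raise ValueError(f"unknown process: {process}")
    stem = day.isoformat()  # YYYY-MM-DD
    if process == "holdout":
        if pr is None:
            raise ValueError("holdout evidence is keyed by PR number")
        stem = f"{stem}-pr{pr}"  # assumption: PR number is not zero-padded
    return PurePosixPath("evidence") / process / f"{stem}.json"


print(evidence_path("red-team", date(2026, 3, 20)))  # evidence/red-team/2026-03-20.json
```
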
## Delivery Pattern

Every formula follows the same three-step pattern:

1. **Evidence file** → committed to `evidence/` on main
2. **Git artifacts** (new code, attack vectors, evolved programs) → PR
3. **Human summary** → issue comment with key metrics + link to evidence file

---

## Fee-Income Calculation Model

This section documents how `delta_bps` values in red-team and holdout evidence files
are derived, so that recorded values can be independently verified.

### Measurement tool

`delta_bps` is computed from two snapshots of **LM total ETH** taken by
[`onchain/script/LmTotalEth.s.sol`](../onchain/script/LmTotalEth.s.sol):

```
lm_total_eth = lm.balance                          (free ETH)
             + WETH.balanceOf(lm)                  (free WETH)
             + Σ positionEthPrincipal(stage)       for stage ∈ {FLOOR, ANCHOR, DISCOVERY}
```

Each position's ETH principal is calculated via `LiquidityAmounts.getAmountsForLiquidity`
at the pool's current `sqrtPriceX96`. Only the WETH side of each position is summed;
the KRK side is excluded.

### What is and is not counted

| Counted | Not counted |
|---------|-------------|
| Free native ETH on the LM contract | KRK balance (free or in positions) |
| Free WETH (ERC-20) on the LM contract | Uncollected fees still inside Uni V3 positions |
| ETH-side principal of all 3 positions | KRK fees transferred to `feeDestination` |

**Key consequence:** Uncollected fees accrued inside Uniswap V3 positions are invisible
to `LmTotalEth` until a `recenter()` call executes `pool.burn` + `pool.collect`, which
converts them into free WETH on the LM contract (or transfers them to `feeDestination`).
A `recenter()` between the two snapshots materializes these fees into the measurement.

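
To make the accounting rule concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not a port of `LmTotalEth.s.sol`: `LmState`, `lm_total_eth`, and `recenter` are hypothetical names, only the fee collection part of a recenter is modelled, collected fees are assumed to stay on the LM, and the numbers are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LmState:
    """Toy model of the balances that matter for LmTotalEth (all values in wei)."""
    free_eth: int
    free_weth: int
    position_eth_principal: tuple  # (floor, anchor, discovery)
    uncollected_weth_fees: int = 0  # accrued inside the Uni V3 positions


def lm_total_eth(state: LmState) -> int:
    # Uncollected fees are deliberately left out, mirroring the table above.
    return state.free_eth + state.free_weth + sum(state.position_eth_principal)


def recenter(state: LmState) -> LmState:
    # Simplification: only fee collection is modelled, not the repositioning,
    # and fees are assumed to stay on the LM rather than go to feeDestination.
    return replace(
        state,
        free_weth=state.free_weth + state.uncollected_weth_fees,
        uncollected_weth_fees=0,
    )


ETH = 10**18
# Illustrative numbers only, not taken from any evidence file.
state = LmState(free_eth=0, free_weth=0,
                position_eth_principal=(600 * ETH, 300 * ETH, 100 * ETH),
                uncollected_weth_fees=8 * ETH)
print(lm_total_eth(state) // ETH)            # 1000
print(lm_total_eth(recenter(state)) // ETH)  # 1008
```

The 8 ETH of accrued fees only shows up in the second reading, after the modelled `recenter()` has turned it into free WETH.
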
### `delta_bps` formula

```
delta_bps = (lm_eth_after − lm_eth_before) / lm_eth_before × 10_000
```

Where `lm_eth_before` and `lm_eth_after` are `LmTotalEth` readings taken before and
after the attack sequence. Each attack is snapshot-isolated (Anvil snapshot → execute →
measure → revert), so per-attack `delta_bps` values are independent.

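
A minimal Python sketch of the formula, useful for re-deriving a recorded value from two wei readings. The function name is hypothetical, and rounding to the nearest integer is an assumption: the rounding rule used by the pipeline is not stated here.

```python
from fractions import Fraction


def delta_bps(lm_eth_before: int, lm_eth_after: int) -> int:
    """Change in LM total ETH in basis points, rounded to the nearest integer."""
    if lm_eth_before <= 0:
        raise ValueError("lm_eth_before must be positive")
    # Exact rational arithmetic: wei values exceed float precision.
    change = Fraction(lm_eth_after - lm_eth_before, lm_eth_before)
    return round(change * 10_000)  # assumption: nearest-integer rounding


ETH = 10**18
print(delta_bps(1000 * ETH, 9985 * ETH // 10))  # -15 (1000 ETH -> 998.5 ETH)
```
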
### Components that drive `delta_bps`

A round-trip trade (buy KRK with ETH, then sell KRK back for ETH) through the LM's
dominant positions produces a positive `delta_bps` from three sources:

1. **Pool fee income (1% per leg).** The WETH/KRK pool charges a 1% fee (`FEE = 10_000`
   in `LiquidityManager.sol`). On a simple round trip this contributes ~2% of volume.
   However, fees accrue as uncollected position fees and only become visible after
   `recenter()` materializes them. If no recenter occurs between snapshots, fee income
   is partially hidden (reflected only indirectly through reduced trade output).

2. **Concentrated-liquidity slippage.** The LM's three-position strategy concentrates
   most liquidity in narrow tick ranges. Trades that exceed the depth of a position
   range push through progressively thinner liquidity, causing super-linear slippage.
   The attacker receives fewer tokens per unit of input on each marginal unit. This
   slippage transfers value to the LM's positions as increased ETH principal.

3. **Recenter repositioning gain.** When `recenter()` is called between trade legs:
   - All three positions are burned and fees collected.
   - New positions are minted at the current price.
   - Any accumulated fees (WETH portion) become free WETH and are redeployed as new
     position liquidity. KRK fees are sent to `feeDestination`.
   - The repositioned liquidity changes the tick ranges the next trade interacts with.

### Why `delta_bps` is non-linear

A naive estimate of `delta_bps ≈ volume × 1% × 2 legs / lm_eth_before × 10_000`
underestimates the actual value for large trades because:

- **Slippage dominates at high volume.** When trade volume approaches or exceeds the
  ETH depth of the active positions, the price moves through the entire concentrated
  range and into thin or empty ticks. The slippage loss to the attacker (= gain to the
  LM) grows super-linearly with volume.
- **Multi-recenter compounding.** Strategies that call `recenter()` between sub-trades
  materialize intermediate fees and reposition liquidity at a new price. Subsequent
  trades pay fees at the new tick ranges, compounding the total fee capture.
- **KRK fee exclusion.** KRK fees collected during `recenter()` are transferred to
  `feeDestination` and excluded from `LmTotalEth`. This means the measurement captures
  the ETH-side gain but not the KRK-side gain, so `delta_bps` understates total protocol
  revenue.

### Fee destination behaviour

When `feeDestination` is `address(0)` or `address(this)` (the LM contract itself),
fees are **not** transferred out; they remain as deployable liquidity on the LM.
In this configuration, materialized WETH fees increase `lm_total_eth` directly. When
`feeDestination` is an external address, WETH fees are transferred out and do **not**
contribute to `lm_total_eth`. The red-team test environment uses `feeDestination =
address(this)` so that fee income is fully reflected in `delta_bps`.

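
The rule above reduces to a single branch. The sketch below restates it in Python for readers who want to model fee income off-chain; `retained_weth_fees` and its parameters are hypothetical names, addresses are plain hex strings, and the contract's actual control flow may differ.

```python
ZERO_ADDRESS = "0x" + "00" * 20


def retained_weth_fees(fee_destination: str, lm_address: str, collected_weth_fees: int) -> int:
    """WETH fees (wei) that stay on the LM and therefore count toward lm_total_eth."""
    stays_on_lm = fee_destination.lower() in (ZERO_ADDRESS, lm_address.lower())
    return collected_weth_fees if stays_on_lm else 0


LM = "0x" + "11" * 20        # placeholder address for the LM contract
EXTERNAL = "0x" + "22" * 20  # placeholder address for an external fee recipient
print(retained_weth_fees(LM, LM, 8 * 10**18))        # 8000000000000000000
print(retained_weth_fees(EXTERNAL, LM, 8 * 10**18))  # 0
```
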
### Worked example

Using attack 2 from `evidence/red-team/2026-03-20.json`:

> **"Buy → Recenter → Sell (800 ETH round trip)"**: `delta_bps: 1179`

**Given:**
- `lm_eth_before` = 999,999,999,999,999,999,998 wei ≈ 1000 ETH
- Trade volume = 800 ETH (buy leg) + equivalent KRK sell leg
- Pool fee rate = 1% per swap
- `feeDestination = address(this)` (fees stay in LM)

**Step-by-step derivation:**

1. **Buy leg (800 ETH → KRK):** The 800 ETH buy pushes the price ~4000 ticks into
   the concentrated positions. The pool charges 1% (≈8 ETH in fees accruing to
   positions). Because liquidity is concentrated, the price moves far, so the attacker
   receives significantly fewer KRK than a constant-product AMM would give.
   After the buy, position ETH principal increases (price moved up = more ETH value
   in range).

2. **Recenter:** Positions are burned, collecting all accrued fees. New positions are
   minted at the new (higher) price. The ~8 ETH in WETH fees plus the ETH-side
   principal become redeployable liquidity.

3. **Sell leg (KRK → ETH):** The attacker sells all acquired KRK back through the
   newly positioned liquidity. Another 1% fee applies. Because the attacker received
   fewer KRK than 800 ETH worth (due to buy-leg slippage), the sell leg returns
   significantly less than 800 ETH. The price drops back but the LM retains the
   slippage differential.

4. **Result:** `lm_eth_after ≈ 1000 + 117.9 ≈ 1117.9 ETH`.
   ```
   delta_bps = (1117.9 − 1000) / 1000 × 10_000 = 1179 bps
   ```
   The ~117.9 ETH gain comes from: 1% fees on both legs (~16 ETH) **plus** ~102 ETH
   in concentrated-liquidity slippage loss by the attacker. The slippage component
   dominates because 800 ETH far exceeds the depth of the anchor/discovery positions,
   pushing the trade through increasingly thin liquidity.

**Cross-check: why the naive formula fails**
```
naive = 800 × 0.01 × 2 / 1000 × 10_000 = 160 bps    (actual: 1179 bps)
```
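
The same cross-check as a short Python sketch with exact fractions. `naive_delta_bps` and `FEE_DENOMINATOR` are hypothetical names; the denominator of 1,000,000 reflects Uniswap V3's convention of expressing pool fees in hundredths of a basis point, which is why `FEE = 10_000` means 1%.

```python
from fractions import Fraction

FEE = 10_000                 # pool fee in hundredths of a bip, as in LiquidityManager.sol
FEE_DENOMINATOR = 1_000_000  # 10_000 / 1_000_000 = 1%


def naive_delta_bps(volume_eth: int, lm_eth_before: int, legs: int = 2) -> Fraction:
    """Fee-only estimate: ignores slippage and recenter effects."""
    fee_income = Fraction(volume_eth * FEE * legs, FEE_DENOMINATOR)
    return fee_income / lm_eth_before * 10_000


naive = naive_delta_bps(800, 1000)
recorded = 1179  # delta_bps of the attack in the worked example
print(float(naive))                       # 160.0
print(round(float(recorded / naive), 2))  # 7.37
```
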
The naive estimate counts fee income only: it implicitly assumes liquidity deep and
uniform enough that the trade pays no meaningful price impact. The roughly 7× gap
(1179 / 160 ≈ 7.4) is entirely due to concentrated-liquidity slippage on a trade that
exceeds position depth.

---

## Schema: `evolution/YYYY-MM-DD.json`

Records one optimizer evolution run.

```json
{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved" | "no_improvement"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `run_params` | object | Input parameters used |
| `generation_stats` | array | Per-generation fitness summary |
| `best_fitness` | number | Best fitness score achieved (a smaller loss for the LM is better; in the example, −8.7 beats −12.4) |
| `champion_file` | string | Repo-relative path to winning optimizer |
| `champion_commit` | string | Git commit SHA of the champion (if promoted) |
| `verdict` | string | `"improved"` or `"no_improvement"` |

---

## Schema: `red-team/YYYY-MM-DD.json`

Records one adversarial red-team run against a candidate optimizer.

```json
{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "verdict": "floor_broken" | "floor_held",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED" | "HELD" | "INCREASED",
      "delta_bps": -150,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `candidate` | string | Optimizer under test |
| `candidate_commit` | string | Git commit SHA of the optimizer under test |
| `optimizer_profile` | string | Named profile / push3 variant |
| `lm_eth_before` | integer (wei) | LM total ETH at start |
| `lm_eth_after` | integer (wei) | LM total ETH at end |
| `eth_extracted` | integer (wei) | `lm_eth_before - lm_eth_after` (0 if floor held) |
| `floor_held` | boolean | `true` if no ETH was extracted |
| `verdict` | string | `"floor_held"` or `"floor_broken"` |
| `attacks[].strategy` | string | Human-readable strategy name |
| `attacks[].pattern` | string | Abstract op sequence (e.g. `wrap → buy → stake`) |
| `attacks[].result` | string | `"DECREASED"`, `"HELD"`, or `"INCREASED"` |
| `attacks[].delta_bps` | integer | LM ETH change in basis points |
| `attacks[].insight` | string | Key finding from this strategy |

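
The summary fields are redundant by design, so a record can be checked for internal consistency. The sketch below is one way to do that in Python. `check_red_team_record` is a hypothetical helper, and it assumes `eth_extracted` is clamped to 0 when the LM gained ETH. Python's `json` module parses the wei integers exactly; parsers that read JSON numbers as doubles would lose precision at this magnitude.

```python
import json


def check_red_team_record(record: dict) -> list:
    """Return a list of inconsistencies between the summary fields (empty if none)."""
    problems = []
    before, after = record["lm_eth_before"], record["lm_eth_after"]
    expected_extracted = max(before - after, 0)  # assumption: 0 if the floor held
    if record["eth_extracted"] != expected_extracted:
        problems.append("eth_extracted != lm_eth_before - lm_eth_after")
    if record["floor_held"] != (expected_extracted == 0):
        problems.append("floor_held disagrees with eth_extracted")
    expected_verdict = "floor_held" if record["floor_held"] else "floor_broken"
    if record["verdict"] != expected_verdict:
        problems.append("verdict disagrees with floor_held")
    return problems


# Summary fields from the example above, with a concrete verdict filled in.
sample = json.loads("""
{"lm_eth_before": 1000000000000000000000,
 "lm_eth_after": 998500000000000000000,
 "eth_extracted": 1500000000000000000,
 "floor_held": false,
 "verdict": "floor_broken"}
""")
print(check_red_team_record(sample))  # []
```
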

---

## Schema: `holdout/YYYY-MM-DD-prNNN.json`

Records a holdout quality gate evaluation for a specific PR.

```json
{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "pass" | "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `pr` | integer | PR number being evaluated |
| `candidate_commit` | string | Commit SHA under test |
| `scenarios` | array | One entry per holdout scenario |
| `scenarios[].name` | string | Scenario identifier |
| `scenarios[].passed` | boolean | Whether LM ETH held or improved |
| `scenarios[].lm_eth_delta_bps` | integer | LM ETH change in basis points |
| `scenarios[].notes` | string | Free-text notes on failure mode |
| `scenarios_passed` | integer | Count of passing scenarios |
| `scenarios_total` | integer | Total scenarios run |
| `gate_passed` | boolean | `true` if all required scenarios passed |
| `verdict` | string | `"pass"` or `"fail"` |
| `blocking_scenarios` | array of strings | Scenario names that caused failure |

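
The gate fields can be derived from the per-scenario results. The following Python sketch shows one way to do so; `summarize_holdout` is a hypothetical helper, and it assumes every scenario is required, whereas the real gate may distinguish required from optional scenarios.

```python
def summarize_holdout(scenarios: list) -> dict:
    """Derive the gate fields from per-scenario results.

    Assumption: every scenario is required, so any failure blocks the gate.
    """
    blocking = [s["name"] for s in scenarios if not s["passed"]]
    gate_passed = not blocking
    return {
        "scenarios_passed": len(scenarios) - len(blocking),
        "scenarios_total": len(scenarios),
        "gate_passed": gate_passed,
        "verdict": "pass" if gate_passed else "fail",
        "blocking_scenarios": blocking,
    }


scenarios = [
    {"name": "bear_market_crash", "passed": True, "lm_eth_delta_bps": 12, "notes": ""},
    {"name": "flash_buy_exploit", "passed": False, "lm_eth_delta_bps": -340,
     "notes": "Floor broken on 2000-trade run"},
]
print(summarize_holdout(scenarios)["blocking_scenarios"])  # ['flash_buy_exploit']
```
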

---

## Schema: `user-test/YYYY-MM-DD.json`

Records a UX evaluation run across simulated personas.

```json
{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "pass" | "fail"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `personas` | array | One entry per simulated persona |
| `personas[].name` | string | Persona identifier |
| `personas[].task` | string | Task the persona attempted |
| `personas[].completed` | boolean | Whether the task was completed |
| `personas[].friction_points` | array of strings | UX issues encountered |
| `personas[].screenshot_refs` | array of strings | Repo-relative paths to screenshots |
| `personas[].notes` | string | Free-text observations |
| `personas_completed` | integer | Count of personas who completed their task |
| `personas_total` | integer | Total personas evaluated |
| `critical_friction_points` | array of strings | Friction points that blocked task completion |
| `verdict` | string | `"pass"` if all personas completed, `"fail"` otherwise |

---

## Schema: `resources/YYYY-MM-DD.json`

Records one infrastructure resource snapshot.

```json
{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "ok" | "warn" | "critical"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `disk.used_bytes` | integer | Bytes used on the primary volume |
| `disk.total_bytes` | integer | Total bytes on the primary volume |
| `disk.used_pct` | number | Percentage of disk used |
| `ram.used_bytes` | integer | Bytes of RAM in use |
| `ram.total_bytes` | integer | Total bytes of RAM |
| `ram.used_pct` | number | Percentage of RAM used |
| `api.anthropic_calls_24h` | integer | Anthropic API calls in the past 24 hours |
| `api.anthropic_budget_usd_used` | number | USD spent against the Anthropic budget |
| `api.anthropic_budget_usd_limit` | number | Configured Anthropic budget ceiling in USD |
| `api.anthropic_budget_pct` | number | Percentage of budget consumed |
| `ci.woodpecker_queue_depth` | integer | Number of jobs waiting in the Woodpecker CI queue |
| `ci.woodpecker_running` | integer | Number of Woodpecker jobs currently running |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"ok"` (all metrics normal), `"warn"` (≥80% on any dimension), or `"critical"` (≥95% on any dimension) |

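
The verdict thresholds can be sketched as follows. This is an illustration, not the pipeline's code: `resource_verdict` and `used_pct` are hypothetical names, and the sketch assumes the "dimensions" are the three percentage fields in the schema (disk, RAM, API budget), which the table does not spell out.

```python
def used_pct(used: int, total: int) -> float:
    """Percentage used, rounded to one decimal place as in the example record."""
    return round(used / total * 100, 1)


def resource_verdict(snapshot: dict, warn_pct: float = 80.0, critical_pct: float = 95.0) -> str:
    # Assumption: the "dimensions" are the three percentage fields in the schema.
    worst = max(
        snapshot["disk"]["used_pct"],
        snapshot["ram"]["used_pct"],
        snapshot["api"]["anthropic_budget_pct"],
    )
    if worst >= critical_pct:
        return "critical"
    if worst >= warn_pct:
        return "warn"
    return "ok"


snapshot = {
    "disk": {"used_pct": used_pct(85899345920, 107374182400)},
    "ram": {"used_pct": used_pct(3221225472, 8589934592)},
    "api": {"anthropic_budget_pct": 9.7},
}
print(resource_verdict(snapshot))  # warn (disk is at exactly 80.0%)
```
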

---

## Schema: `protocol/YYYY-MM-DD.json`

Records one on-chain protocol health snapshot.

```json
{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy" | "degraded" | "offline"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `block_number` | integer | Block number at time of snapshot |
| `tvl_eth` | string (wei) | Total value locked across all LM positions in wei |
| `tvl_eth_formatted` | string | TVL formatted in ETH (2 dp) |
| `accumulated_fees_eth` | string (wei) | Fees accumulated by the LiquidityManager in wei |
| `accumulated_fees_eth_formatted` | string | Fees formatted in ETH (3 dp) |
| `position_count` | integer | Number of active Uniswap V3 positions (expected: 3) |
| `positions` | array | One entry per active position |
| `positions[].name` | string | Position label: `"floor"`, `"anchor"`, or `"discovery"` |
| `positions[].tick_lower` | integer | Lower tick boundary |
| `positions[].tick_upper` | integer | Upper tick boundary |
| `positions[].liquidity` | string | Liquidity amount in the position (wei-scale integer) |
| `rebalance_count_24h` | integer | Number of `recenter()` calls in the past 24 hours |
| `last_rebalance_block` | integer | Block number of the most recent `recenter()` call |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"healthy"` (positions active, TVL > 0), `"degraded"` (position_count < 3 or rebalance stalled), or `"offline"` (TVL = 0 or contract unreachable) |
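
The `*_formatted` fields can be reproduced from the wei strings with decimal arithmetic. A minimal Python sketch follows; `format_eth` is a hypothetical helper, and half-up rounding is an assumption (the example values are consistent with rounding rather than truncation, but the exact mode is not documented here).

```python
from decimal import Decimal, ROUND_HALF_UP

WEI_PER_ETH = Decimal(10) ** 18


def format_eth(wei: str, places: int) -> str:
    """Format a wei amount (decimal string) as ETH with a fixed number of decimals."""
    quantum = Decimal(10) ** -places  # e.g. places=2 -> Decimal("0.01")
    eth = Decimal(wei) / WEI_PER_ETH
    return str(eth.quantize(quantum, rounding=ROUND_HALF_UP))  # assumption: half-up


print(format_eth("1234567890000000000000", 2))  # 1234.57
print(format_eth("12345678900000000", 3))       # 0.012
```
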