# Evidence Directory

Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas (evolution, red-team, holdout, user-test) write structured JSON here.

## Purpose

- **Planner input**: the planner reads these files to decide next actions (e.g. "last red-team showed IL vulnerability → trigger evolution").
- **Diffable history**: `git log evidence/` shows how metrics change over time.
- **Permanent record**: separate from `tmp/`, which is ephemeral.

## Directory Layout

```
evidence/
  evolution/
    YYYY-MM-DD.json          # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json          # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json    # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json          # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json          # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json          # TVL, accumulated fees, position count, rebalance frequency
```

## Delivery Pattern

Every formula follows the same three-step pattern:

1. **Evidence file** → committed to `evidence/` on main
2. **Git artifacts** (new code, attack vectors, evolved programs) → PR
3. **Human summary** → issue comment with key metrics + link to evidence file

---

## Schema: `evolution/YYYY-MM-DD.json`

Records one optimizer evolution run.
```json
{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `run_params` | object | Input parameters used |
| `generation_stats` | array | Per-generation fitness summary |
| `best_fitness` | number | Best fitness score achieved (higher is better, i.e. a lower loss for the LM; in the example, -8.7 beats -12.4) |
| `champion_file` | string | Repo-relative path to winning optimizer |
| `champion_commit` | string | Git commit SHA of the champion (if promoted) |
| `verdict` | string | `"improved"` or `"no_improvement"` |

---

## Schema: `red-team/YYYY-MM-DD.json`

Records one adversarial red-team run against a candidate optimizer.
```json
{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "verdict": "floor_broken",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED",
      "delta_bps": -150,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `candidate` | string | Optimizer under test |
| `candidate_commit` | string | Git commit SHA of the optimizer under test |
| `optimizer_profile` | string | Named profile / push3 variant |
| `lm_eth_before` | integer (wei) | LM total ETH at start |
| `lm_eth_after` | integer (wei) | LM total ETH at end |
| `eth_extracted` | integer (wei) | `lm_eth_before - lm_eth_after` (0 if floor held) |
| `floor_held` | boolean | `true` if no ETH was extracted |
| `verdict` | string | `"floor_held"` or `"floor_broken"` |
| `attacks[].strategy` | string | Human-readable strategy name |
| `attacks[].pattern` | string | Abstract op sequence (e.g. `wrap → buy → stake`) |
| `attacks[].result` | string | `"DECREASED"`, `"HELD"`, or `"INCREASED"` |
| `attacks[].delta_bps` | integer | LM ETH change in basis points |
| `attacks[].insight` | string | Key finding from this strategy |

---

## Schema: `holdout/YYYY-MM-DD-prNNN.json`

Records a holdout quality gate evaluation for a specific PR.
```json
{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `pr` | integer | PR number being evaluated |
| `candidate_commit` | string | Commit SHA under test |
| `scenarios` | array | One entry per holdout scenario |
| `scenarios[].name` | string | Scenario identifier |
| `scenarios[].passed` | boolean | Whether LM ETH held or improved |
| `scenarios[].lm_eth_delta_bps` | integer | LM ETH change in basis points |
| `scenarios[].notes` | string | Free-text notes on failure mode |
| `scenarios_passed` | integer | Count of passing scenarios |
| `scenarios_total` | integer | Total scenarios run |
| `gate_passed` | boolean | `true` if all required scenarios passed |
| `verdict` | string | `"pass"` or `"fail"` |
| `blocking_scenarios` | array of strings | Scenario names that caused failure |

---

## Schema: `user-test/YYYY-MM-DD.json`

Records a UX evaluation run across simulated personas.
```json
{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "fail"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `personas` | array | One entry per simulated persona |
| `personas[].name` | string | Persona identifier |
| `personas[].task` | string | Task the persona attempted |
| `personas[].completed` | boolean | Whether the task was completed |
| `personas[].friction_points` | array of strings | UX issues encountered |
| `personas[].screenshot_refs` | array of strings | Repo-relative paths to screenshots |
| `personas[].notes` | string | Free-text observations |
| `personas_completed` | integer | Count of personas who completed their task |
| `personas_total` | integer | Total personas evaluated |
| `critical_friction_points` | array of strings | Friction points that blocked task completion |
| `verdict` | string | `"pass"` if all personas completed, `"fail"` otherwise |

---

## Schema: `resources/YYYY-MM-DD.json`

Records one infrastructure resource snapshot.
```json
{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "warn"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `disk.used_bytes` | integer | Bytes used on the primary volume |
| `disk.total_bytes` | integer | Total bytes on the primary volume |
| `disk.used_pct` | number | Percentage of disk used |
| `ram.used_bytes` | integer | Bytes of RAM in use |
| `ram.total_bytes` | integer | Total bytes of RAM |
| `ram.used_pct` | number | Percentage of RAM used |
| `api.anthropic_calls_24h` | integer | Anthropic API calls in the past 24 hours |
| `api.anthropic_budget_usd_used` | number | USD spent against the Anthropic budget |
| `api.anthropic_budget_usd_limit` | number | Configured Anthropic budget ceiling in USD |
| `api.anthropic_budget_pct` | number | Percentage of budget consumed |
| `ci.woodpecker_queue_depth` | integer | Number of jobs waiting in the Woodpecker CI queue |
| `ci.woodpecker_running` | integer | Number of Woodpecker jobs currently running |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"ok"` (all metrics normal), `"warn"` (≥80% on any dimension), or `"critical"` (≥95% on any dimension) |

---

## Schema: `protocol/YYYY-MM-DD.json`

Records one on-chain protocol health snapshot.
```json
{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `block_number` | integer | Block number at time of snapshot |
| `tvl_eth` | string (wei) | Total value locked across all LM positions in wei |
| `tvl_eth_formatted` | string | TVL formatted in ETH (2 dp) |
| `accumulated_fees_eth` | string (wei) | Fees accumulated by the LiquidityManager in wei |
| `accumulated_fees_eth_formatted` | string | Fees formatted in ETH (3 dp) |
| `position_count` | integer | Number of active Uniswap V3 positions (expected: 3) |
| `positions` | array | One entry per active position |
| `positions[].name` | string | Position label: `"floor"`, `"anchor"`, or `"discovery"` |
| `positions[].tick_lower` | integer | Lower tick boundary |
| `positions[].tick_upper` | integer | Upper tick boundary |
| `positions[].liquidity` | string | Liquidity amount in the position (wei-scale integer) |
| `rebalance_count_24h` | integer | Number of `recenter()` calls in the past 24 hours |
| `last_rebalance_block` | integer | Block number of the most recent `recenter()` call |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"healthy"` (positions active, TVL > 0), `"degraded"` (position_count < 3 or rebalance stalled), or `"offline"` (TVL = 0 or contract unreachable) |
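
---

## Example: reading evidence files

The conventions documented above (dated filenames, `staleness_threshold_days`, and the `resources` verdict thresholds) can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code. The helper names `latest_evidence`, `is_stale`, and `resources_verdict` are hypothetical. The staleness boundary (a record is stale once its age exceeds the threshold) is an assumption, as is the choice of disk, RAM, and API budget percentages as the "dimensions" behind the verdict.

```python
import json
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def latest_evidence(subdir: Path) -> Optional[dict]:
    """Return the parsed JSON of the newest file in one evidence subdirectory.

    Relies on the YYYY-MM-DD filename prefix, which sorts chronologically
    when compared as plain text.
    """
    files = sorted(subdir.glob("*.json"))
    if not files:
        return None
    return json.loads(files[-1].read_text())


def is_stale(record: dict, today: date) -> bool:
    """Check a record's age against its own staleness_threshold_days.

    Assumption: records without a threshold field are never reported stale,
    and a record is stale only once its age exceeds the threshold.
    """
    threshold = record.get("staleness_threshold_days")
    if threshold is None:
        return False
    age = today - date.fromisoformat(record["date"])
    return age > timedelta(days=threshold)


def resources_verdict(record: dict) -> str:
    """Derive ok / warn / critical from the documented 80% and 95% thresholds.

    Assumption: the dimensions are disk, RAM, and API budget percentages.
    """
    worst = max(
        record["disk"]["used_pct"],
        record["ram"]["used_pct"],
        record["api"]["anthropic_budget_pct"],
    )
    if worst >= 95:
        return "critical"
    if worst >= 80:
        return "warn"
    return "ok"
```

Sorting by filename is enough for the single-file-per-day directories. In `holdout/`, several PR files can share one date and would then be ordered by the text of the `prNNN` suffix, so a real planner might parse the PR number instead.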