- Add evidence/resources/ and evidence/protocol/ directories with .gitkeep
- Add schemas for resources/ and protocol/ to evidence/README.md
- Create formulas/run-resources.toml (sense formula: disk/RAM/API/CI metrics, daily cron 06:00 UTC, verdict: ok/warn/critical)
- Create formulas/run-protocol.toml (sense formula: TVL/fees/positions/rebalance frequency via LmTotalEth.s.sol + cast, daily cron 07:00 UTC, verdict: healthy/degraded/offline)
- Update STATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Evidence Directory

Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas
(evolution, red-team, holdout, user-test, resources, protocol) write structured JSON here.

## Purpose

- **Planner input**: the planner reads these files to decide next actions
  (e.g. "last red-team showed IL vulnerability → trigger evolution").
- **Diffable history**: `git log evidence/` shows how metrics change over time.
- **Permanent record**: separate from `tmp/`, which is ephemeral.

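
As a rough illustration of the planner-input use case, the sketch below loads the newest record for a process and maps a red-team verdict to a next action. The function names and the action strings (`run_red_team`, `trigger_evolution`, `no_action`) are hypothetical and not part of the pipeline; only the directory name and the `verdict` values come from this README.

```python
import json
from pathlib import Path
from typing import Optional


def latest_evidence(evidence_dir: Path, process: str) -> Optional[dict]:
    """Return the newest JSON record for a process, or None if there is none."""
    # File names start with YYYY-MM-DD, so lexicographic order is date order.
    files = sorted((evidence_dir / process).glob("*.json"))
    if not files:
        return None
    return json.loads(files[-1].read_text())


def next_action(evidence_dir: Path) -> str:
    """Hypothetical planner rule: a broken floor triggers a new evolution run."""
    record = latest_evidence(evidence_dir, "red-team")
    if record is None:
        return "run_red_team"
    if record["verdict"] == "floor_broken":
        return "trigger_evolution"
    return "no_action"
```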
## Directory Layout

```
evidence/
  evolution/
    YYYY-MM-DD.json          # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json          # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json    # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json          # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json          # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json          # TVL, accumulated fees, position count, rebalance frequency
```

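
The naming scheme above can be sketched as a small path builder. This is only an illustration: `evidence_path` is a hypothetical helper, and it assumes the `NNN` in `prNNN` is the PR number without zero padding, which this README does not state.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Subdirectories listed in the layout above.
PROCESSES = {"evolution", "red-team", "holdout", "user-test", "resources", "protocol"}


def evidence_path(process: str, run_date: date, pr: Optional[int] = None) -> PurePosixPath:
    """Build the repo-relative evidence path for one run."""
    if process not in PROCESSES:
        raise ValueError(f"unknown process: {process}")
    stem = run_date.isoformat()  # YYYY-MM-DD
    if process == "holdout":
        # Holdout records are per PR: YYYY-MM-DD-prNNN.json
        if pr is None:
            raise ValueError("holdout evidence requires a PR number")
        stem = f"{stem}-pr{pr}"  # assumption: no zero padding
    return PurePosixPath("evidence") / process / f"{stem}.json"


print(evidence_path("red-team", date(2025, 1, 2)))         # evidence/red-team/2025-01-02.json
print(evidence_path("holdout", date(2025, 1, 2), pr=123))  # evidence/holdout/2025-01-02-pr123.json
```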
## Delivery Pattern

Every formula follows the same three-step pattern:

1. **Evidence file** → committed to `evidence/` on main
2. **Git artifacts** (new code, attack vectors, evolved programs) → PR
3. **Human summary** → issue comment with key metrics + link to evidence file

---

## Schema: `evolution/YYYY-MM-DD.json`

Records one optimizer evolution run.

```json
{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved" | "no_improvement"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `run_params` | object | Input parameters used |
| `generation_stats` | array | Per-generation fitness summary |
| `best_fitness` | number | Best fitness score achieved across all generations (less negative = lower loss for the LM) |
| `champion_file` | string | Repo-relative path to winning optimizer |
| `champion_commit` | string | Git commit SHA of the champion (if promoted) |
| `verdict` | string | `"improved"` or `"no_improvement"` |

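
A minimal sketch of how a consumer might check an evolution record against the field table above. `validate_evolution` is a hypothetical helper, and treating `champion_commit` as optional is an assumption drawn from the "(if promoted)" note.

```python
from numbers import Real

# Expected top-level fields and their Python types, taken from the table above.
EVOLUTION_FIELDS = {
    "date": str,
    "run_params": dict,
    "generation_stats": list,
    "best_fitness": Real,
    "champion_file": str,
    "champion_commit": str,
    "verdict": str,
}
OPTIONAL_FIELDS = {"champion_commit"}  # assumption: absent when not promoted
VERDICTS = {"improved", "no_improvement"}


def validate_evolution(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for field, expected in EVOLUTION_FIELDS.items():
        if field not in record:
            if field not in OPTIONAL_FIELDS:
                problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    if "verdict" in record and record["verdict"] not in VERDICTS:
        problems.append("verdict must be 'improved' or 'no_improvement'")
    return problems
```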
---

## Schema: `red-team/YYYY-MM-DD.json`

Records one adversarial red-team run against a candidate optimizer.

```json
{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "verdict": "floor_broken" | "floor_held",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED" | "HELD" | "INCREASED",
      "delta_bps": -150,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the run |
| `candidate` | string | Optimizer under test |
| `candidate_commit` | string | Git commit SHA of the optimizer under test |
| `optimizer_profile` | string | Named profile / push3 variant |
| `lm_eth_before` | integer (wei) | LM total ETH at start |
| `lm_eth_after` | integer (wei) | LM total ETH at end |
| `eth_extracted` | integer (wei) | `lm_eth_before - lm_eth_after` (0 if floor held) |
| `floor_held` | boolean | `true` if no ETH was extracted |
| `verdict` | string | `"floor_held"` or `"floor_broken"` |
| `attacks[].strategy` | string | Human-readable strategy name |
| `attacks[].pattern` | string | Abstract op sequence (e.g. `wrap → buy → stake`) |
| `attacks[].result` | string | `"DECREASED"`, `"HELD"`, or `"INCREASED"` |
| `attacks[].delta_bps` | integer | LM ETH change in basis points |
| `attacks[].insight` | string | Key finding from this strategy |

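
The derived fields in the table above follow from the two LM ETH totals. The sketch below shows one way to compute them; `summarize_red_team` is a hypothetical helper, and mapping `floor_held` directly onto `verdict` is an assumption. Python integers are arbitrary precision, so wei values of this size stay exact, which is worth checking in any consumer that parses JSON numbers as 64-bit floats.

```python
def summarize_red_team(lm_eth_before: int, lm_eth_after: int) -> dict:
    """Derive eth_extracted, floor_held and verdict from LM ETH totals in wei."""
    # eth_extracted is lm_eth_before - lm_eth_after, clamped to 0 if the floor held.
    eth_extracted = max(lm_eth_before - lm_eth_after, 0)
    floor_held = eth_extracted == 0
    return {
        "lm_eth_before": lm_eth_before,
        "lm_eth_after": lm_eth_after,
        "eth_extracted": eth_extracted,
        "floor_held": floor_held,
        "verdict": "floor_held" if floor_held else "floor_broken",
    }


# Values from the example record above.
summary = summarize_red_team(1000000000000000000000, 998500000000000000000)
print(summary["eth_extracted"], summary["verdict"])  # 1500000000000000000 floor_broken
```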
---

## Schema: `holdout/YYYY-MM-DD-prNNN.json`

Records a holdout quality gate evaluation for a specific PR.

```json
{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "pass" | "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `pr` | integer | PR number being evaluated |
| `candidate_commit` | string | Commit SHA under test |
| `scenarios` | array | One entry per holdout scenario |
| `scenarios[].name` | string | Scenario identifier |
| `scenarios[].passed` | boolean | Whether LM ETH held or improved |
| `scenarios[].lm_eth_delta_bps` | integer | LM ETH change in basis points |
| `scenarios[].notes` | string | Free-text notes on failure mode |
| `scenarios_passed` | integer | Count of passing scenarios |
| `scenarios_total` | integer | Total scenarios run |
| `gate_passed` | boolean | `true` if all required scenarios passed |
| `verdict` | string | `"pass"` or `"fail"` |
| `blocking_scenarios` | array of strings | Scenario names that caused failure |

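
The summary fields can be derived from the `scenarios` array. The following is a sketch only: `evaluate_gate` is a hypothetical helper, and it assumes every scenario is required, since this README does not say how optional scenarios are marked.

```python
def evaluate_gate(scenarios: list) -> dict:
    """Summarize per-scenario results into the gate-level fields."""
    # Assumption: every scenario is required, so any failure blocks the gate.
    blocking = [s["name"] for s in scenarios if not s["passed"]]
    gate_passed = not blocking
    return {
        "scenarios_passed": len(scenarios) - len(blocking),
        "scenarios_total": len(scenarios),
        "gate_passed": gate_passed,
        "verdict": "pass" if gate_passed else "fail",
        "blocking_scenarios": blocking,
    }


# The two scenarios shown in the example record above.
result = evaluate_gate([
    {"name": "bear_market_crash", "passed": True, "lm_eth_delta_bps": 12, "notes": ""},
    {"name": "flash_buy_exploit", "passed": False, "lm_eth_delta_bps": -340,
     "notes": "Floor broken on 2000-trade run"},
])
print(result["verdict"], result["blocking_scenarios"])  # fail ['flash_buy_exploit']
```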
---

## Schema: `user-test/YYYY-MM-DD.json`

Records a UX evaluation run across simulated personas.

```json
{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "pass" | "fail"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of evaluation |
| `personas` | array | One entry per simulated persona |
| `personas[].name` | string | Persona identifier |
| `personas[].task` | string | Task the persona attempted |
| `personas[].completed` | boolean | Whether the task was completed |
| `personas[].friction_points` | array of strings | UX issues encountered |
| `personas[].screenshot_refs` | array of strings | Repo-relative paths to screenshots |
| `personas[].notes` | string | Free-text observations |
| `personas_completed` | integer | Count of personas who completed their task |
| `personas_total` | integer | Total personas evaluated |
| `critical_friction_points` | array of strings | Friction points that blocked task completion |
| `verdict` | string | `"pass"` if all personas completed, `"fail"` otherwise |

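
The counts and the verdict rule in the table ("pass" if all personas completed) can be sketched as below. `summarize_user_test` is a hypothetical helper. It leaves `critical_friction_points` out on purpose, because deciding which friction points actually blocked a task looks like a judgement call that the per-persona data alone does not settle.

```python
def summarize_user_test(personas: list) -> dict:
    """Derive completion counts and the pass/fail verdict from persona results."""
    completed = sum(1 for p in personas if p["completed"])
    return {
        "personas_completed": completed,
        "personas_total": len(personas),
        # "pass" only if every persona completed its task.
        "verdict": "pass" if completed == len(personas) else "fail",
    }


# The two personas from the example record above (fields trimmed).
summary = summarize_user_test([
    {"name": "crypto_native", "completed": True},
    {"name": "defi_newcomer", "completed": False},
])
print(summary)  # {'personas_completed': 1, 'personas_total': 2, 'verdict': 'fail'}
```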
---

## Schema: `resources/YYYY-MM-DD.json`

Records one infrastructure resource snapshot.

```json
{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "ok" | "warn" | "critical"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `disk.used_bytes` | integer | Bytes used on the primary volume |
| `disk.total_bytes` | integer | Total bytes on the primary volume |
| `disk.used_pct` | number | Percentage of disk used |
| `ram.used_bytes` | integer | Bytes of RAM in use |
| `ram.total_bytes` | integer | Total bytes of RAM |
| `ram.used_pct` | number | Percentage of RAM used |
| `api.anthropic_calls_24h` | integer | Anthropic API calls in the past 24 hours |
| `api.anthropic_budget_usd_used` | number | USD spent against the Anthropic budget |
| `api.anthropic_budget_usd_limit` | number | Configured Anthropic budget ceiling in USD |
| `api.anthropic_budget_pct` | number | Percentage of budget consumed |
| `ci.woodpecker_queue_depth` | integer | Number of jobs waiting in the Woodpecker CI queue |
| `ci.woodpecker_running` | integer | Number of Woodpecker jobs currently running |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"ok"` (all metrics normal), `"warn"` (≥80% on any dimension), or `"critical"` (≥95% on any dimension) |

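
The verdict thresholds above can be sketched as follows. Which fields count as a "dimension" is an assumption here: the sketch uses the three percentage fields (`disk.used_pct`, `ram.used_pct`, `api.anthropic_budget_pct`) and ignores the CI counters, which have no percentage. `used_pct` and `resource_verdict` are hypothetical helper names.

```python
WARN_PCT = 80.0      # ">= 80% on any dimension" gives "warn"
CRITICAL_PCT = 95.0  # ">= 95% on any dimension" gives "critical"


def used_pct(used: float, total: float) -> float:
    """Percentage used, rounded to one decimal place as in the example record."""
    return round(100 * used / total, 1)


def resource_verdict(record: dict) -> str:
    """Map the worst percentage dimension to ok / warn / critical."""
    worst = max(
        record["disk"]["used_pct"],
        record["ram"]["used_pct"],
        record["api"]["anthropic_budget_pct"],
    )
    if worst >= CRITICAL_PCT:
        return "critical"
    if worst >= WARN_PCT:
        return "warn"
    return "ok"


# Values from the example record above: disk sits exactly at the warn threshold.
record = {
    "disk": {"used_pct": used_pct(85899345920, 107374182400)},
    "ram": {"used_pct": used_pct(3221225472, 8589934592)},
    "api": {"anthropic_budget_pct": 9.7},
}
print(resource_verdict(record))  # warn
```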
---

## Schema: `protocol/YYYY-MM-DD.json`

Records one on-chain protocol health snapshot.

```json
{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy" | "degraded" | "offline"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `date` | string (ISO) | Date of the snapshot |
| `block_number` | integer | Block number at time of snapshot |
| `tvl_eth` | string (wei) | Total value locked across all LM positions in wei |
| `tvl_eth_formatted` | string | TVL formatted in ETH (2 dp) |
| `accumulated_fees_eth` | string (wei) | Fees accumulated by the LiquidityManager in wei |
| `accumulated_fees_eth_formatted` | string | Fees formatted in ETH (3 dp) |
| `position_count` | integer | Number of active Uniswap V3 positions (expected: 3) |
| `positions` | array | One entry per active position |
| `positions[].name` | string | Position label: `"floor"`, `"anchor"`, or `"discovery"` |
| `positions[].tick_lower` | integer | Lower tick boundary |
| `positions[].tick_upper` | integer | Upper tick boundary |
| `positions[].liquidity` | string | Liquidity amount in the position (wei-scale integer) |
| `rebalance_count_24h` | integer | Number of `recenter()` calls in the past 24 hours |
| `last_rebalance_block` | integer | Block number of the most recent `recenter()` call |
| `staleness_threshold_days` | integer | Maximum age in days before this record is considered stale (always 1) |
| `verdict` | string | `"healthy"` (positions active, TVL > 0), `"degraded"` (position_count < 3 or rebalance stalled), or `"offline"` (TVL = 0 or contract unreachable) |

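
A sketch of how the formatted fields and the verdict might be derived from the raw values. Several details are assumptions not stated in this README: the rounding mode (half-up), reading "rebalance stalled" as zero `recenter()` calls in 24 hours, and the `reachable` flag standing in for "contract unreachable". `format_eth` and `protocol_verdict` are hypothetical helper names.

```python
from decimal import Decimal, ROUND_HALF_UP

WEI_PER_ETH = Decimal(10) ** 18


def format_eth(wei: str, places: int) -> str:
    """Convert a wei string to an ETH string with a fixed number of decimals."""
    eth = Decimal(wei) / WEI_PER_ETH
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal("0.01") for places=2
    return str(eth.quantize(quantum, rounding=ROUND_HALF_UP))


def protocol_verdict(tvl_wei: str, position_count: int,
                     rebalance_count_24h: int, reachable: bool = True) -> str:
    """Apply the healthy / degraded / offline rules from the table above."""
    if not reachable or int(tvl_wei) == 0:
        return "offline"
    # Assumption: "rebalance stalled" means no recenter() call in the past 24 hours.
    if position_count < 3 or rebalance_count_24h == 0:
        return "degraded"
    return "healthy"


# Values from the example record above.
print(format_eth("1234567890000000000000", 2))              # 1234.57
print(format_eth("12345678900000000", 3))                   # 0.012
print(protocol_verdict("1234567890000000000000", 3, 4))     # healthy
```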