- Add evidence/resources/ and evidence/protocol/ directories with .gitkeep
- Add schemas for resources/ and protocol/ to evidence/README.md
- Create formulas/run-resources.toml (sense formula: disk/RAM/API/CI metrics, daily cron 06:00 UTC, verdict: ok/warn/critical)
- Create formulas/run-protocol.toml (sense formula: TVL/fees/positions/rebalance frequency via LmTotalEth.s.sol + cast, daily cron 07:00 UTC, verdict: healthy/degraded/offline)
- Update STATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Evidence Directory
Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas (evolution, red-team, holdout, user-test, resources, protocol) write structured JSON here.
Purpose
- Planner input: the planner reads these files to decide next actions (e.g. "last red-team showed IL vulnerability → trigger evolution").
- Diffable history: git log evidence/ shows how metrics change over time.
- Permanent record: separate from tmp/, which is ephemeral.
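The planner-input idea can be illustrated with a minimal Python sketch. The function name and the action labels (`trigger_evolution`, `no_action`) are hypothetical and not part of the pipeline; only the `verdict` values come from the red-team schema in this README.

```python
import json


def next_action(red_team: dict) -> str:
    """Map a red-team evidence record to a planner action.

    The action names returned here are hypothetical placeholders.
    """
    if red_team["verdict"] == "floor_broken":
        return "trigger_evolution"
    return "no_action"


# Example: a record as it might be parsed from evidence/red-team/YYYY-MM-DD.json
record = json.loads('{"date": "2024-01-01", "verdict": "floor_broken"}')
action = next_action(record)
```

A real planner would likely weigh several evidence categories together; this sketch only shows the shape of a single rule.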
Directory Layout
evidence/
  evolution/
    YYYY-MM-DD.json         # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json         # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json   # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json         # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json         # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json         # TVL, accumulated fees, position count, rebalance frequency
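Because every file name starts with an ISO date, the most recent record in a category can be found with a plain sort. The following is a minimal sketch, not the pipeline's own code; `latest_evidence` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def latest_evidence(root: Path, category: str) -> Optional[Path]:
    """Return the newest JSON file in <root>/<category>/, or None if there is none.

    File names begin with YYYY-MM-DD, so lexicographic order is also
    chronological order.
    """
    files = sorted((root / category).glob("*.json"))
    return files[-1] if files else None
```

For holdout/, two files from the same day are ordered by the text of the `prNNN` suffix, which is not necessarily numeric order, so a caller that cares about same-day ordering would need an extra tie-break.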
Delivery Pattern
Every formula follows the same three-step pattern:
- Evidence file → committed to evidence/ on main
- Git artifacts (new code, attack vectors, evolved programs) → PR
- Human summary → issue comment with key metrics + link to evidence file
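The first step of the pattern, writing the evidence file, might look like the sketch below. `write_evidence` is a hypothetical helper; sorting keys and using a fixed indent are assumptions made here to keep `git` diffs between days small, not documented behaviour of the formulas. Committing the file and opening the PR are left out.

```python
import json
from pathlib import Path


def write_evidence(root: Path, category: str, record: dict) -> Path:
    """Write one record to <root>/<category>/<date>.json and return the path.

    Covers the plain YYYY-MM-DD.json naming only; holdout files carry an
    extra -prNNN suffix that this sketch does not handle.
    """
    path = root / category / f"{record['date']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stable key order and indentation keep successive files easy to diff.
    path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n")
    return path
```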
Schema: evolution/YYYY-MM-DD.json
Records one optimizer evolution run.
{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved" | "no_improvement"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the run |
| run_params | object | Input parameters used |
| generation_stats | array | Per-generation fitness summary |
| best_fitness | number | Best fitness score achieved (higher is better, i.e. a smaller loss for the LM) |
| champion_file | string | Repo-relative path to winning optimizer |
| champion_commit | string | Git commit SHA of the champion (if promoted) |
| verdict | string | "improved" or "no_improvement" |
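A record can be checked against the table above before it is committed. This is a minimal sketch with hypothetical names (`EVOLUTION_FIELDS`, `evolution_errors`); it assumes `champion_commit` is optional because the table marks it "if promoted", and it checks top-level fields only.

```python
# Required top-level fields and their expected Python types after json.load().
EVOLUTION_FIELDS = {
    "date": str,
    "run_params": dict,
    "generation_stats": list,
    "best_fitness": (int, float),
    "champion_file": str,
    "verdict": str,
}
EVOLUTION_VERDICTS = {"improved", "no_improvement"}


def evolution_errors(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, expected in EVOLUTION_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"wrong type for field: {name}")
    if "verdict" in record and record["verdict"] not in EVOLUTION_VERDICTS:
        errors.append("verdict must be 'improved' or 'no_improvement'")
    return errors
```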
Schema: red-team/YYYY-MM-DD.json
Records one adversarial red-team run against a candidate optimizer.
{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "verdict": "floor_broken" | "floor_held",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED" | "HELD" | "INCREASED",
      "delta_bps": -150,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the run |
| candidate | string | Optimizer under test |
| candidate_commit | string | Git commit SHA of the optimizer under test |
| optimizer_profile | string | Named profile / push3 variant |
| lm_eth_before | integer (wei) | LM total ETH at start |
| lm_eth_after | integer (wei) | LM total ETH at end |
| eth_extracted | integer (wei) | lm_eth_before - lm_eth_after (0 if floor held) |
| floor_held | boolean | true if no ETH was extracted |
| verdict | string | "floor_held" or "floor_broken" |
| attacks[].strategy | string | Human-readable strategy name |
| attacks[].pattern | string | Abstract op sequence (e.g. wrap → buy → stake) |
| attacks[].result | string | "DECREASED", "HELD", or "INCREASED" |
| attacks[].delta_bps | integer | LM ETH change in basis points |
| attacks[].insight | string | Key finding from this strategy |
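The three summary fields follow directly from the two balances. The sketch below derives them as the table describes; `summarize_red_team` is a hypothetical name. Python integers are arbitrary precision, so wei amounts of this size are handled exactly.

```python
def summarize_red_team(lm_eth_before: int, lm_eth_after: int) -> dict:
    """Derive eth_extracted, floor_held and verdict from wei balances."""
    # eth_extracted is 0 when the floor held (LM ETH unchanged or higher).
    extracted = max(lm_eth_before - lm_eth_after, 0)
    floor_held = extracted == 0
    return {
        "eth_extracted": extracted,
        "floor_held": floor_held,
        "verdict": "floor_held" if floor_held else "floor_broken",
    }
```

With the example values above (1000 ETH before, 998.5 ETH after) this yields `eth_extracted` of 1500000000000000000 wei and a `floor_broken` verdict.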
Schema: holdout/YYYY-MM-DD-prNNN.json
Records a holdout quality gate evaluation for a specific PR.
{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "pass" | "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of evaluation |
| pr | integer | PR number being evaluated |
| candidate_commit | string | Commit SHA under test |
| scenarios | array | One entry per holdout scenario |
| scenarios[].name | string | Scenario identifier |
| scenarios[].passed | boolean | Whether LM ETH held or improved |
| scenarios[].lm_eth_delta_bps | integer | LM ETH change in basis points |
| scenarios[].notes | string | Free-text notes on failure mode |
| scenarios_passed | integer | Count of passing scenarios |
| scenarios_total | integer | Total scenarios run |
| gate_passed | boolean | true if all required scenarios passed |
| verdict | string | "pass" or "fail" |
| blocking_scenarios | array of strings | Scenario names that caused failure |
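The aggregate fields can be computed from the scenarios array. This is a minimal sketch under one loud assumption: the schema has no "required" flag, so every scenario is treated as required here. `evaluate_gate` is a hypothetical name.

```python
def evaluate_gate(scenarios: list) -> dict:
    """Compute the gate summary fields from a list of scenario records.

    Assumption: every scenario is required, so any failure blocks the gate.
    """
    blocking = [s["name"] for s in scenarios if not s["passed"]]
    gate_passed = not blocking
    return {
        "scenarios_passed": len(scenarios) - len(blocking),
        "scenarios_total": len(scenarios),
        "gate_passed": gate_passed,
        "verdict": "pass" if gate_passed else "fail",
        "blocking_scenarios": blocking,
    }
```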
Schema: user-test/YYYY-MM-DD.json
Records a UX evaluation run across simulated personas.
{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "pass" | "fail"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of evaluation |
| personas | array | One entry per simulated persona |
| personas[].name | string | Persona identifier |
| personas[].task | string | Task the persona attempted |
| personas[].completed | boolean | Whether the task was completed |
| personas[].friction_points | array of strings | UX issues encountered |
| personas[].screenshot_refs | array of strings | Repo-relative paths to screenshots |
| personas[].notes | string | Free-text observations |
| personas_completed | integer | Count of personas who completed their task |
| personas_total | integer | Total personas evaluated |
| critical_friction_points | array of strings | Friction points that blocked task completion |
| verdict | string | "pass" if all personas completed, "fail" otherwise |
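The counts and the verdict follow from the `completed` flags, as sketched below with a hypothetical `summarize_user_test` helper. `critical_friction_points` is deliberately not derived: the schema does not say how a blocking friction point is told apart from a minor one, so that field is assumed to be filled in by whoever produces the report.

```python
def summarize_user_test(personas: list) -> dict:
    """Compute personas_completed, personas_total and verdict."""
    completed = sum(1 for p in personas if p["completed"])
    total = len(personas)
    return {
        "personas_completed": completed,
        "personas_total": total,
        # "pass" only when every persona finished its task.
        "verdict": "pass" if completed == total else "fail",
    }
```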
Schema: resources/YYYY-MM-DD.json
Records one infrastructure resource snapshot.
{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "ok" | "warn" | "critical"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the snapshot |
| disk.used_bytes | integer | Bytes used on the primary volume |
| disk.total_bytes | integer | Total bytes on the primary volume |
| disk.used_pct | number | Percentage of disk used |
| ram.used_bytes | integer | Bytes of RAM in use |
| ram.total_bytes | integer | Total bytes of RAM |
| ram.used_pct | number | Percentage of RAM used |
| api.anthropic_calls_24h | integer | Anthropic API calls in the past 24 hours |
| api.anthropic_budget_usd_used | number | USD spent against the Anthropic budget |
| api.anthropic_budget_usd_limit | number | Configured Anthropic budget ceiling in USD |
| api.anthropic_budget_pct | number | Percentage of budget consumed |
| ci.woodpecker_queue_depth | integer | Number of jobs waiting in the Woodpecker CI queue |
| ci.woodpecker_running | integer | Number of Woodpecker jobs currently running |
| staleness_threshold_days | integer | Maximum age in days before this record is considered stale (always 1) |
| verdict | string | "ok" (all metrics normal), "warn" (≥80% on any dimension), or "critical" (≥95% on any dimension) |
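The percentage fields and the verdict thresholds can be sketched as follows. The function names are hypothetical, and which dimensions feed the verdict is an assumption: this sketch takes whatever percentages the caller passes in (for example disk, RAM and API budget), since the CI fields are counts rather than percentages.

```python
def used_pct(used: float, total: float) -> float:
    """Percentage used, rounded to one decimal place as in the example."""
    return round(100.0 * used / total, 1)


def resource_verdict(percentages: list) -> str:
    """Apply the documented thresholds to the worst dimension."""
    worst = max(percentages)
    if worst >= 95.0:
        return "critical"
    if worst >= 80.0:
        return "warn"
    return "ok"
```

With the example snapshot, disk is exactly at 80.0%, so `resource_verdict([80.0, 37.5, 9.7])` returns `"warn"`.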
Schema: protocol/YYYY-MM-DD.json
Records one on-chain protocol health snapshot.
{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy" | "degraded" | "offline"
}
| Field | Type | Description |
|---|---|---|
| date | string (ISO) | Date of the snapshot |
| block_number | integer | Block number at time of snapshot |
| tvl_eth | string (wei) | Total value locked across all LM positions in wei |
| tvl_eth_formatted | string | TVL formatted in ETH (2 dp) |
| accumulated_fees_eth | string (wei) | Fees accumulated by the LiquidityManager in wei |
| accumulated_fees_eth_formatted | string | Fees formatted in ETH (3 dp) |
| position_count | integer | Number of active Uniswap V3 positions (expected: 3) |
| positions | array | One entry per active position |
| positions[].name | string | Position label: "floor", "anchor", or "discovery" |
| positions[].tick_lower | integer | Lower tick boundary |
| positions[].tick_upper | integer | Upper tick boundary |
| positions[].liquidity | string | Liquidity amount in the position (wei-scale integer) |
| rebalance_count_24h | integer | Number of recenter() calls in the past 24 hours |
| last_rebalance_block | integer | Block number of the most recent recenter() call |
| staleness_threshold_days | integer | Maximum age in days before this record is considered stale (always 1) |
| verdict | string | "healthy" (positions active, TVL > 0), "degraded" (position_count < 3 or rebalance stalled), or "offline" (TVL = 0 or contract unreachable) |
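The formatted fields and the verdict can be sketched as below. Both function names are hypothetical. Two assumptions are made: rounding is half-up (the README does not specify a mode), and "rebalance stalled" is passed in as a boolean because the README does not define how a stall is detected. An unreachable contract is not modelled; a caller would map that case to "offline" before any values exist.

```python
from decimal import Decimal, ROUND_HALF_UP

WEI_PER_ETH = Decimal(10) ** 18


def format_eth(wei: str, places: int) -> str:
    """Convert a wei string to an ETH string with a fixed number of decimals."""
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal("0.01")
    eth = Decimal(wei) / WEI_PER_ETH
    return str(eth.quantize(quantum, rounding=ROUND_HALF_UP))


def protocol_verdict(tvl_wei: str, position_count: int, rebalance_stalled: bool) -> str:
    """Apply the verdict rules from the table, most severe first."""
    if int(tvl_wei) == 0:
        return "offline"
    if position_count < 3 or rebalance_stalled:
        return "degraded"
    return "healthy"
```

Keeping wei amounts as strings and converting through `Decimal` avoids the precision loss that floating-point parsing would introduce for values of this size.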