johba/harb

johba de014e9b13 fix: feat: implement evidence/resources and evidence/protocol logging (#1059)
- Add evidence/resources/ and evidence/protocol/ directories with .gitkeep
- Add schemas for resources/ and protocol/ to evidence/README.md
- Create formulas/run-resources.toml (sense formula: disk/RAM/API/CI metrics, daily cron 06:00 UTC, verdict: ok/warn/critical)
- Create formulas/run-protocol.toml (sense formula: TVL/fees/positions/rebalance frequency via LmTotalEth.s.sol + cast, daily cron 07:00 UTC, verdict: healthy/degraded/offline)
- Update STATE.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 19:39:23 +00:00

Evidence Directory

Machine-readable process results for the KRAIKEN optimizer pipeline. All formulas (evolution, red-team, holdout, user-test, resources, protocol) write structured JSON here.

Purpose

Planner input — the planner reads these files to decide next actions (e.g. "last red-team showed IL vulnerability → trigger evolution").
Diffable history — git log evidence/ shows how metrics change over time.
Permanent record — separate from tmp/ which is ephemeral.
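
As a minimal sketch of the planner-input idea, the Python below loads the newest file for one process and maps a red-team verdict to a next action. The helper names `latest_evidence` and `next_action`, the action strings, and the rule itself are illustrative assumptions, not the planner's actual logic.

```python
import json
from pathlib import Path
from typing import Optional


def latest_evidence(evidence_root: Path, process: str) -> Optional[dict]:
    """Return the parsed JSON of the newest file in <evidence_root>/<process>/, or None."""
    directory = evidence_root / process
    if not directory.is_dir():
        return None
    # File names start with YYYY-MM-DD, so lexicographic order is chronological.
    files = sorted(directory.glob("*.json"))
    if not files:
        return None
    return json.loads(files[-1].read_text())


def next_action(red_team: Optional[dict]) -> str:
    """Hypothetical planner rule: a broken floor triggers a new evolution run."""
    if red_team is None:
        return "run_red_team"
    if red_team.get("verdict") == "floor_broken":
        return "trigger_evolution"
    return "no_action"
```

A caller could then write `next_action(latest_evidence(Path("evidence"), "red-team"))` from the repository root.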

Directory Layout

evidence/
  evolution/
    YYYY-MM-DD.json       # run params, generation stats, best fitness, champion file
  red-team/
    YYYY-MM-DD.json       # per-attack results, floor held/broken, ETH extracted
  holdout/
    YYYY-MM-DD-prNNN.json # per-scenario pass/fail, gate decision
  user-test/
    YYYY-MM-DD.json       # per-persona reports, screenshot refs, friction points
  resources/
    YYYY-MM-DD.json       # disk, RAM, API call counts, budget burn, CI queue depth
  protocol/
    YYYY-MM-DD.json       # TVL, accumulated fees, position count, rebalance frequency
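
The naming scheme above can be sketched as a small path builder. The function name `evidence_path` is hypothetical, and the sketch assumes the `NNN` in `prNNN` is the plain PR number without zero padding, which the layout does not specify.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# The six process directories listed in the layout.
PROCESSES = {"evolution", "red-team", "holdout", "user-test", "resources", "protocol"}


def evidence_path(process: str, run_date: date, pr: Optional[int] = None) -> Path:
    """Build the repo-relative evidence file path for one run."""
    if process not in PROCESSES:
        raise ValueError(f"unknown process: {process}")
    stem = run_date.isoformat()  # YYYY-MM-DD
    if process == "holdout":
        # Holdout files are per PR; assumption: no zero padding of the number.
        if pr is None:
            raise ValueError("holdout evidence files need a PR number")
        stem = f"{stem}-pr{pr}"
    return Path("evidence") / process / f"{stem}.json"
```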

Delivery Pattern

Every formula follows the same three-step pattern:

1. Evidence file → committed to evidence/ on main
2. Git artifacts (new code, attack vectors, evolved programs) → PR
3. Human summary → issue comment with key metrics + link to evidence file

Schema: `evolution/YYYY-MM-DD.json`

Records one optimizer evolution run.

{
  "date": "YYYY-MM-DD",
  "run_params": {
    "generations": 50,
    "population_size": 20,
    "seed": 42,
    "base_optimizer": "OptimizerV3"
  },
  "generation_stats": [
    {
      "generation": 1,
      "best_fitness": -12.4,
      "mean_fitness": -34.1,
      "worst_fitness": -91.2
    }
  ],
  "best_fitness": -8.7,
  "champion_file": "onchain/src/OptimizerV4.sol",
  "champion_commit": "abc1234",
  "verdict": "improved" | "no_improvement"
}

Field	Type	Description
`date`	string (ISO)	Date of the run
`run_params`	object	Input parameters used
`generation_stats`	array	Per-generation fitness summary
`best_fitness`	number	Best fitness score achieved; higher is better (a less negative value means a smaller loss for the LM)
`champion_file`	string	Repo-relative path to winning optimizer
`champion_commit`	string	Git commit SHA of the champion (if promoted)
`verdict`	string	`"improved"` or `"no_improvement"`
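
A small sketch of how the top-level `best_fitness` might be derived from `generation_stats`. It assumes a higher (less negative) fitness is better, as the example values suggest. The function name and the extra `best_generation` key are illustrative and not part of the schema.

```python
from typing import Any, Dict, List


def summarize_generations(stats: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the generation with the highest best_fitness (assumed: higher is better)."""
    if not stats:
        raise ValueError("generation_stats is empty")
    best = max(stats, key=lambda g: g["best_fitness"])
    return {
        "best_fitness": best["best_fitness"],
        "best_generation": best["generation"],  # hypothetical field, for illustration
    }
```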

Schema: `red-team/YYYY-MM-DD.json`

Records one adversarial red-team run against a candidate optimizer.

{
  "date": "YYYY-MM-DD",
  "candidate": "OptimizerV3",
  "candidate_commit": "abc1234",
  "optimizer_profile": "push3-default",
  "lm_eth_before": 1000000000000000000000,
  "lm_eth_after": 998500000000000000000,
  "eth_extracted": 1500000000000000000,
  "floor_held": false,
  "verdict": "floor_broken" | "floor_held",
  "attacks": [
    {
      "strategy": "Flash buy + stake + recenter loop",
      "pattern": "wrap → buy → stake → recenter_multi → sell",
      "result": "DECREASED" | "HELD" | "INCREASED",
      "delta_bps": -150,
      "insight": "Rapid recenters pack ETH into floor while ratcheting it toward current price"
    }
  ]
}

Field	Type	Description
`date`	string (ISO)	Date of the run
`candidate`	string	Optimizer under test
`candidate_commit`	string	Git commit SHA of the optimizer under test
`optimizer_profile`	string	Named profile / push3 variant
`lm_eth_before`	integer (wei)	LM total ETH at start
`lm_eth_after`	integer (wei)	LM total ETH at end
`eth_extracted`	integer (wei)	`lm_eth_before - lm_eth_after` (0 if floor held)
`floor_held`	boolean	`true` if no ETH was extracted
`verdict`	string	`"floor_held"` or `"floor_broken"`
`attacks[].strategy`	string	Human-readable strategy name
`attacks[].pattern`	string	Abstract op sequence (e.g. `wrap → buy → stake`)
`attacks[].result`	string	`"DECREASED"`, `"HELD"`, or `"INCREASED"`
`attacks[].delta_bps`	integer	LM ETH change in basis points
`attacks[].insight`	string	Key finding from this strategy
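
The derived fields can be computed from the two LM balances. The sketch below is one plausible reading of the table, not the formula's actual code. The function names are hypothetical, and `delta_bps` assumes the change is measured relative to the starting balance. Python's `json` module parses wei-sized integers exactly, whereas parsers that store numbers as 64-bit floats may lose precision on values this large.

```python
import json


def delta_bps(before_wei: int, after_wei: int) -> int:
    """Change from before to after in basis points, rounded to the nearest integer."""
    return round((after_wei - before_wei) * 10_000 / before_wei)


def red_team_totals(lm_eth_before: int, lm_eth_after: int) -> dict:
    """Derive eth_extracted, floor_held and verdict from the two balances."""
    extracted = max(lm_eth_before - lm_eth_after, 0)  # 0 if the floor held
    floor_held = extracted == 0
    return {
        "eth_extracted": extracted,
        "floor_held": floor_held,
        "verdict": "floor_held" if floor_held else "floor_broken",
    }


raw = '{"lm_eth_before": 1000000000000000000000, "lm_eth_after": 998500000000000000000}'
run = json.loads(raw)  # Python ints are arbitrary precision, so no rounding occurs
totals = red_team_totals(run["lm_eth_before"], run["lm_eth_after"])
```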

Schema: `holdout/YYYY-MM-DD-prNNN.json`

Records a holdout quality gate evaluation for a specific PR.

{
  "date": "YYYY-MM-DD",
  "pr": 123,
  "candidate_commit": "abc1234",
  "scenarios": [
    {
      "name": "bear_market_crash",
      "passed": true,
      "lm_eth_delta_bps": 12,
      "notes": ""
    },
    {
      "name": "flash_buy_exploit",
      "passed": false,
      "lm_eth_delta_bps": -340,
      "notes": "Floor broken on 2000-trade run"
    }
  ],
  "scenarios_passed": 4,
  "scenarios_total": 5,
  "gate_passed": false,
  "verdict": "pass" | "fail",
  "blocking_scenarios": ["flash_buy_exploit"]
}

Field	Type	Description
`date`	string (ISO)	Date of evaluation
`pr`	integer	PR number being evaluated
`candidate_commit`	string	Commit SHA under test
`scenarios`	array	One entry per holdout scenario
`scenarios[].name`	string	Scenario identifier
`scenarios[].passed`	boolean	Whether LM ETH held or improved
`scenarios[].lm_eth_delta_bps`	integer	LM ETH change in basis points
`scenarios[].notes`	string	Free-text notes on failure mode
`scenarios_passed`	integer	Count of passing scenarios
`scenarios_total`	integer	Total scenarios run
`gate_passed`	boolean	`true` if all required scenarios passed
`verdict`	string	`"pass"` or `"fail"`
`blocking_scenarios`	array of strings	Scenario names that caused failure
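
The aggregate fields follow from the `scenarios` array. This sketch assumes every listed scenario is required. The table only says "all required scenarios", so a real gate may exempt some. The function name `gate_summary` is hypothetical.

```python
from typing import Any, Dict, List


def gate_summary(scenarios: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive the gate-level fields, treating every scenario as required."""
    blocking = [s["name"] for s in scenarios if not s["passed"]]
    gate_passed = len(blocking) == 0
    return {
        "scenarios_passed": len(scenarios) - len(blocking),
        "scenarios_total": len(scenarios),
        "gate_passed": gate_passed,
        "verdict": "pass" if gate_passed else "fail",
        "blocking_scenarios": blocking,
    }
```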

Schema: `user-test/YYYY-MM-DD.json`

Records a UX evaluation run across simulated personas.

{
  "date": "YYYY-MM-DD",
  "personas": [
    {
      "name": "crypto_native",
      "task": "stake_and_set_tax_rate",
      "completed": true,
      "friction_points": [],
      "screenshot_refs": ["tmp/screenshots/crypto_native_stake.png"],
      "notes": ""
    },
    {
      "name": "defi_newcomer",
      "task": "first_buy_and_stake",
      "completed": false,
      "friction_points": ["Tax rate slider label unclear", "No confirmation of stake tx"],
      "screenshot_refs": ["tmp/screenshots/defi_newcomer_confused.png"],
      "notes": "User abandoned at tax rate step"
    }
  ],
  "personas_completed": 1,
  "personas_total": 2,
  "critical_friction_points": ["Tax rate slider label unclear"],
  "verdict": "pass" | "fail"
}

Field	Type	Description
`date`	string (ISO)	Date of evaluation
`personas`	array	One entry per simulated persona
`personas[].name`	string	Persona identifier
`personas[].task`	string	Task the persona attempted
`personas[].completed`	boolean	Whether the task was completed
`personas[].friction_points`	array of strings	UX issues encountered
`personas[].screenshot_refs`	array of strings	Repo-relative paths to screenshots
`personas[].notes`	string	Free-text observations
`personas_completed`	integer	Count of personas who completed their task
`personas_total`	integer	Total personas evaluated
`critical_friction_points`	array of strings	Friction points that blocked task completion
`verdict`	string	`"pass"` if all personas completed, `"fail"` otherwise
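
The counts and verdict follow directly from the `personas` array, as this short sketch shows. The function name is hypothetical. `critical_friction_points` is left out because the schema does not say how blocking friction points are selected.

```python
from typing import Any, Dict, List


def user_test_summary(personas: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count completed personas; the verdict is "pass" only if all completed."""
    completed = sum(1 for p in personas if p["completed"])
    return {
        "personas_completed": completed,
        "personas_total": len(personas),
        "verdict": "pass" if completed == len(personas) else "fail",
    }
```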

Schema: `resources/YYYY-MM-DD.json`

Records one infrastructure resource snapshot.

{
  "date": "YYYY-MM-DD",
  "disk": {
    "used_bytes": 85899345920,
    "total_bytes": 107374182400,
    "used_pct": 80.0
  },
  "ram": {
    "used_bytes": 3221225472,
    "total_bytes": 8589934592,
    "used_pct": 37.5
  },
  "api": {
    "anthropic_calls_24h": 142,
    "anthropic_budget_usd_used": 4.87,
    "anthropic_budget_usd_limit": 50.0,
    "anthropic_budget_pct": 9.7
  },
  "ci": {
    "woodpecker_queue_depth": 2,
    "woodpecker_running": 1
  },
  "staleness_threshold_days": 1,
  "verdict": "ok" | "warn" | "critical"
}

Field	Type	Description
`date`	string (ISO)	Date of the snapshot
`disk.used_bytes`	integer	Bytes used on the primary volume
`disk.total_bytes`	integer	Total bytes on the primary volume
`disk.used_pct`	number	Percentage of disk used
`ram.used_bytes`	integer	Bytes of RAM in use
`ram.total_bytes`	integer	Total bytes of RAM
`ram.used_pct`	number	Percentage of RAM used
`api.anthropic_calls_24h`	integer	Anthropic API calls in the past 24 hours
`api.anthropic_budget_usd_used`	number	USD spent against the Anthropic budget
`api.anthropic_budget_usd_limit`	number	Configured Anthropic budget ceiling in USD
`api.anthropic_budget_pct`	number	Percentage of budget consumed
`ci.woodpecker_queue_depth`	integer	Number of jobs waiting in the Woodpecker CI queue
`ci.woodpecker_running`	integer	Number of Woodpecker jobs currently running
`staleness_threshold_days`	integer	Maximum age in days before this record is considered stale (always 1)
`verdict`	string	`"ok"` (all metrics normal), `"warn"` (≥80% on any dimension), or `"critical"` (≥95% on any dimension)
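
The percentage fields and the verdict thresholds can be sketched as follows. Which metrics count as a "dimension" is an assumption here (disk, RAM, and budget percentages). Rounding to one decimal place is inferred from the example values, and the function names are hypothetical.

```python
from typing import Iterable


def used_pct(used: float, total: float) -> float:
    """Percentage used, rounded to one decimal place (as in the example values)."""
    return round(used / total * 100, 1)


def resource_verdict(percentages: Iterable[float],
                     warn: float = 80.0, critical: float = 95.0) -> str:
    """Map the worst percentage to ok / warn / critical."""
    worst = max(percentages)
    if worst >= critical:
        return "critical"
    if worst >= warn:
        return "warn"
    return "ok"


disk = used_pct(85899345920, 107374182400)
ram = used_pct(3221225472, 8589934592)
budget = used_pct(4.87, 50.0)
verdict = resource_verdict([disk, ram, budget])
```

With the example snapshot, disk usage sits exactly on the 80% threshold, so the verdict under these assumptions is `"warn"`.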

Schema: `protocol/YYYY-MM-DD.json`

Records one on-chain protocol health snapshot.

{
  "date": "YYYY-MM-DD",
  "block_number": 24500000,
  "tvl_eth": "1234567890000000000000",
  "tvl_eth_formatted": "1234.57",
  "accumulated_fees_eth": "12345678900000000",
  "accumulated_fees_eth_formatted": "0.012",
  "position_count": 3,
  "positions": [
    {
      "name": "floor",
      "tick_lower": -887272,
      "tick_upper": -200000,
      "liquidity": "987654321000000000"
    },
    {
      "name": "anchor",
      "tick_lower": -200000,
      "tick_upper": 0
    },
    {
      "name": "discovery",
      "tick_lower": 0,
      "tick_upper": 887272
    }
  ],
  "rebalance_count_24h": 4,
  "last_rebalance_block": 24499800,
  "staleness_threshold_days": 1,
  "verdict": "healthy" | "degraded" | "offline"
}

Field	Type	Description
`date`	string (ISO)	Date of the snapshot
`block_number`	integer	Block number at time of snapshot
`tvl_eth`	string (wei)	Total value locked across all LM positions in wei
`tvl_eth_formatted`	string	TVL formatted in ETH (2 dp)
`accumulated_fees_eth`	string (wei)	Fees accumulated by the LiquidityManager in wei
`accumulated_fees_eth_formatted`	string	Fees formatted in ETH (3 dp)
`position_count`	integer	Number of active Uniswap V3 positions (expected: 3)
`positions`	array	One entry per active position
`positions[].name`	string	Position label: `"floor"`, `"anchor"`, or `"discovery"`
`positions[].tick_lower`	integer	Lower tick boundary
`positions[].tick_upper`	integer	Upper tick boundary
`positions[].liquidity`	string	Liquidity amount in the position (wei-scale integer)
`rebalance_count_24h`	integer	Number of `recenter()` calls in the past 24 hours
`last_rebalance_block`	integer	Block number of the most recent `recenter()` call
`staleness_threshold_days`	integer	Maximum age in days before this record is considered stale (always 1)
`verdict`	string	`"healthy"` (positions active, TVL > 0), `"degraded"` (position_count < 3 or rebalance stalled), or `"offline"` (TVL = 0 or contract unreachable)
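
Two details of this schema lend themselves to a short sketch: converting the wei strings into the `*_formatted` fields, and applying `staleness_threshold_days`. The function names are hypothetical. The rounding mode (half up) is an assumption, and so is reading "stale" as strictly older than the threshold.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

WEI_PER_ETH = Decimal(10) ** 18


def format_eth(wei: str, places: int) -> str:
    """Convert a wei amount (decimal string) to ETH with a fixed number of decimals."""
    eth = Decimal(wei) / WEI_PER_ETH
    quantum = Decimal(1).scaleb(-places)  # e.g. places=2 gives a 0.01 step
    return str(eth.quantize(quantum, rounding=ROUND_HALF_UP))


def is_stale(record: dict, today: date) -> bool:
    """True if the record is older than its own staleness threshold."""
    age_days = (today - date.fromisoformat(record["date"])).days
    return age_days > record["staleness_threshold_days"]


tvl = format_eth("1234567890000000000000", 2)
fees = format_eth("12345678900000000", 3)
```

Keeping wei amounts as strings and converting with `Decimal` avoids the rounding that binary floating point would introduce at this scale.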
