
Agent Brief: Formulas

Formulas are TOML files that declare automated pipeline jobs for the harb evaluator. Each formula describes what to run, when, and what it produces — the orchestrator reads the TOML and dispatches execution to the scripts referenced in [execution].

Sense vs Act

Every formula has a type field. Getting this wrong breaks orchestrator scheduling and evidence routing.

| Type | Meaning | Side-effects | Examples |
| --- | --- | --- | --- |
| `sense` | Read-only observation. Produces metrics / evidence only. | No PRs, no code changes, no contract deployments. | run-holdout, run-protocol, run-resources, run-user-test |
| `act` | Produces git artifacts: PRs, new files committed to main, contract upgrades. | Opens PRs, commits evidence + champion files, promotes attack vectors. | run-evolution, run-red-team |

Rule of thumb: if the formula's deliver step calls git push or opens a PR, it is act. If it only commits an evidence JSON to main, it is sense.
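In TOML terms, the distinction is a single field in `[formula]`. The sketch below uses hypothetical formula ids (not real formulas in this repo) purely to illustrate the declaration:

```toml
# Hypothetical sense formula: reads state, writes evidence, nothing else.
[formula]
id   = "run-example-snapshot"
type = "sense"

# Hypothetical act formula: its deliver step opens a PR, so it must be "act".
# [formula]
# id   = "run-example-promoter"
# type = "act"
```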

Current Formulas

| ID | Type | Script | Cron | Purpose |
| --- | --- | --- | --- | --- |
| run-evolution | act | tools/push3-evolution/evolve.sh | — | Evolve Push3 optimizer candidates, admit champions to seed pool via PR |
| run-holdout | sense | scripts/harb-evaluator/evaluate.sh | — | Deploy PR branch, run blind holdout scenarios, report pass/fail |
| run-protocol | sense | scripts/harb-evaluator/run-protocol.sh | 0 7 * * * | On-chain health snapshot (TVL, fees, positions, rebalances) |
| run-red-team | act | scripts/harb-evaluator/red-team.sh | — | Adversarial agent attacks the optimizer; promotes novel attack vectors via PR |
| run-resources | sense | scripts/harb-evaluator/run-resources.sh | 0 6 * * * | Infrastructure snapshot (disk, RAM, API budget, CI queue) |
| run-user-test | sense | scripts/run-usertest.sh | — | Persona-based Playwright UX evaluation |

Cron Conventions

  • Schedules use standard 5-field cron syntax in the schedule key of the [cron] table.
  • Stagger by at least 1 hour to avoid resource contention (run-resources at 06:00, run-protocol at 07:00).
  • Only sense formulas should be cron-scheduled. An act formula on a timer risks unattended PRs.
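Putting those conventions together, a scheduled sense formula's cron block might look like the following sketch (the 08:00 slot is an illustrative choice, not an existing schedule):

```toml
# Five fields: minute hour day-of-month month day-of-week.
# 08:00 keeps a full hour of clearance after run-protocol at 07:00.
[cron]
schedule = "0 8 * * *"
```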

Step ID Naming

Steps are declared as entries in the [[steps]] array of tables. Each step must have an id field.

Conventions:

  • Use lowercase kebab-case: stack-up, run-scenarios, collect-tvl.
  • Prefix collection steps with collect- followed by the metric dimension: collect-disk, collect-ram, collect-fees.
  • Every formula must include a collect step (assembles the evidence JSON) and a deliver step (commits + posts comment).
  • Infrastructure lifecycle steps: stack-up / stack-down (or boot-stack / teardown).
  • Use descriptive verbs: run-attack-suite, evaluate-seeds, export-vectors.
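A step list following these conventions might look like the sketch below (ids and descriptions are illustrative, assuming a fee-collecting formula):

```toml
[[steps]]
id          = "stack-up"
description = "Boot the local stack before any scenarios run."

[[steps]]
id          = "collect-fees"
description = "Assemble fee metrics into the evidence JSON."

[[steps]]
id          = "deliver"
description = "Commit the evidence file and post the summary comment."
```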

TOML Structure

A formula file follows this skeleton:

# formulas/run-{name}.toml
#
# One-line description of what this formula does.
#
# Type: sense | act
# Cron: (schedule if applicable, or "—")

[formula]
id          = "run-{name}"
name        = "Human-Readable Name"
description = "What it does in one sentence."
type        = "sense"       # or "act"

# [cron]                    # optional — only for scheduled formulas
# schedule = "0 6 * * *"

[inputs.example_input]
type        = "string"      # string | integer | number
required    = true
description = "What this input controls."

[execution]
script     = "path/to/script.sh"
invocation = "ENV_VAR={example_input} bash path/to/script.sh"

[[steps]]
id          = "do-something"
description = """
What this step does, in enough detail for a new contributor to understand.
"""

[[steps]]
id          = "collect"
description = "Assemble metrics into evidence/{category}/{date}.json."
output      = "evidence/{category}/{date}.json"

[[steps]]
id          = "deliver"
description = "Commit evidence file and post summary comment to issue."

[products.evidence_file]
path     = "evidence/{category}/{date}.json"
delivery = "commit to main"
schema   = "evidence/README.md"

[resources]
profile     = "light"       # or "heavy"
concurrency = "safe to run in parallel"  # or "exclusive"

How to Add a New Formula

  1. Pick a name. File goes in formulas/run-{name}.toml. The [formula] id must match: run-{name}.

  2. Decide sense vs act. If your formula only reads state and writes evidence → sense. If it creates PRs, commits code, or modifies contracts → act.

  3. Write the TOML. Follow the skeleton above. Key sections:

    • [formula] — id, name, description, type.
    • [inputs.*] — every tuneable parameter the script accepts.
    • [execution] — script path and full invocation with {input} interpolation.
    • [[steps]] — ordered list of logical steps. Always end with collect and deliver.
    • [products.*] — what the formula produces (evidence file, PR, issue comment).
    • [resources] — profile (light / heavy), concurrency constraints.
  4. Write or wire the backing script. The [execution] script must exist and be executable. Most scripts live in scripts/harb-evaluator/ or tools/. Exit codes: 0 = success, 1 = gate failed, 2 = infra error.

  5. Define the evidence schema. If your formula writes evidence/{category}/{date}.json, add the schema to evidence/README.md.

  6. Update this file. Add your formula to the "Current Formulas" table above.

  7. Test locally. Run the backing script with the required inputs and verify the evidence file is well-formed JSON.
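The {input} interpolation mentioned in step 3 maps keys declared under [inputs.*] into the invocation string, where they typically become environment variables for the script. A hedged sketch, with a hypothetical scenario_count input:

```toml
[inputs.scenario_count]
type        = "integer"
required    = true
description = "How many holdout scenarios to run."

[execution]
script     = "scripts/harb-evaluator/evaluate.sh"
invocation = "SCENARIO_COUNT={scenario_count} bash scripts/harb-evaluator/evaluate.sh"
```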

Resource Profiles

| Profile | Meaning | Can run in parallel? |
| --- | --- | --- |
| light | Shell commands only (df, curl, cast). No Docker, no Anvil. | Yes — safe to run alongside anything. |
| heavy | Needs Anvil on port 8545, Docker containers, or long-running agents. | No — exclusive. Heavy formulas share port bindings and cannot overlap. |
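As a sketch, a heavy formula would declare the exclusive profile like this (comments restate the constraints above; not copied from a real formula):

```toml
[resources]
profile     = "heavy"       # needs Anvil on port 8545 / Docker containers
concurrency = "exclusive"   # heavy formulas share port bindings and cannot overlap
```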

Evaluator Integration

Formula execution is dispatched by the orchestrator to scripts in scripts/harb-evaluator/. See scripts/harb-evaluator/AGENTS.md for details on the evaluator runtime: stack lifecycle, scenario execution, evidence collection, and the adversarial agent harness.