
Agent Brief: Formulas

Formulas are TOML files that declare automated pipeline jobs for the harb evaluator. Each formula describes what to run, when, and what it produces — the orchestrator reads the TOML and dispatches execution to the scripts referenced in [execution].

Sense vs Act

Every formula has a type field. Getting this wrong breaks orchestrator scheduling and evidence routing.

| Type | Meaning | Side-effects | Examples |
| --- | --- | --- | --- |
| `sense` | Read-only observation. Produces metrics / evidence only. | No PRs, no code changes, no contract deployments. | run-holdout, run-protocol, run-resources, run-user-test |
| `act` | Produces git artifacts: PRs, new files committed to main, contract upgrades. | Opens PRs, commits evidence + champion files, promotes attack vectors. | run-evolution, run-red-team |

Rule of thumb: if the formula's deliver step calls git push or opens a PR, it is act. If it only commits an evidence JSON to main, it is sense.
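In TOML terms, the distinction is a single field in `[formula]`. The sketch below uses hypothetical formula ids (not real formulas in this repo) purely to illustrate the declaration:

```toml
# Hypothetical sense formula: reads state, writes evidence, nothing else.
[formula]
id   = "run-example-snapshot"
type = "sense"

# Hypothetical act formula: its deliver step opens a PR, so it must be "act".
# [formula]
# id   = "run-example-promoter"
# type = "act"
```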

Current Formulas

| ID | Type | Script | Cron | Purpose |
| --- | --- | --- | --- | --- |
| run-evolution | act | tools/push3-evolution/evolve.sh | — | Evolve Push3 optimizer candidates, admit champions to seed pool via PR |
| run-holdout | sense | scripts/harb-evaluator/evaluate.sh | — | Deploy PR branch, run blind holdout scenarios, report pass/fail |
| run-protocol | sense | scripts/harb-evaluator/run-protocol.sh | 0 7 * * * | On-chain health snapshot (TVL, fees, positions, rebalances) |
| run-red-team | act | scripts/harb-evaluator/red-team.sh | — | Adversarial agent attacks the optimizer; promotes novel attack vectors via PR |
| run-resources | sense | scripts/harb-evaluator/run-resources.sh | 0 6 * * * | Infrastructure snapshot (disk, RAM, API budget, CI queue) |
| run-user-test | sense | scripts/run-usertest.sh | — | Persona-based Playwright UX evaluation |

Cron Conventions

  • Schedules use standard 5-field cron syntax in the schedule key of the [cron] table.
  • Stagger by at least 1 hour to avoid resource contention (run-resources at 06:00, run-protocol at 07:00).
  • Only sense formulas should be cron-scheduled. An act formula on a timer risks unattended PRs.
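Putting those conventions together, a scheduled sense formula's cron block might look like the following sketch (the 08:00 slot is an illustrative choice, not an existing schedule):

```toml
# Five fields: minute hour day-of-month month day-of-week.
# 08:00 keeps a full hour of clearance after run-protocol at 07:00.
[cron]
schedule = "0 8 * * *"
```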

Step ID Naming

Steps are declared as entries in the [[steps]] array of tables. Each step must have an id field.

Conventions:

  • Use lowercase kebab-case: stack-up, run-scenarios, collect-tvl.
  • Prefix collection steps with collect- followed by the metric dimension: collect-disk, collect-ram, collect-fees.
  • Every formula must include a collect step (assembles the evidence JSON) and a deliver step (commits + posts comment).
  • Infrastructure lifecycle steps: stack-up / stack-down (or boot-stack / teardown).
  • Use descriptive verbs: run-attack-suite, evaluate-seeds, export-vectors.
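A step list following these conventions might look like the sketch below (ids and descriptions are illustrative, assuming a fee-collecting formula):

```toml
[[steps]]
id          = "stack-up"
description = "Boot the local stack before any scenarios run."

[[steps]]
id          = "collect-fees"
description = "Assemble fee metrics into the evidence JSON."

[[steps]]
id          = "deliver"
description = "Commit the evidence file and post the summary comment."
```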

TOML Structure

A formula file follows this skeleton:

# formulas/run-{name}.toml
#
# One-line description of what this formula does.
#
# Type: sense | act
# Cron: (schedule if applicable, or "—")

[formula]
id          = "run-{name}"
name        = "Human-Readable Name"
description = "What it does in one sentence."
type        = "sense"       # or "act"

# [cron]                    # optional — only for scheduled formulas
# schedule = "0 6 * * *"

[inputs.example_input]
type        = "string"      # string | integer | number
required    = true
description = "What this input controls."

[execution]
script     = "path/to/script.sh"
invocation = "ENV_VAR={example_input} bash path/to/script.sh"

[[steps]]
id          = "do-something"
description = """
What this step does, in enough detail for a new contributor to understand.
"""

[[steps]]
id          = "collect"
description = "Assemble metrics into evidence/{category}/{date}.json."
output      = "evidence/{category}/{date}.json"

[[steps]]
id          = "deliver"
description = "Commit evidence file and post summary comment to issue."

[products.evidence_file]
path     = "evidence/{category}/{date}.json"
delivery = "commit to main"
schema   = "evidence/README.md"

[resources]
profile     = "light"       # or "heavy"
concurrency = "safe to run in parallel"  # or "exclusive"

How to Add a New Formula

  1. Pick a name. File goes in formulas/run-{name}.toml. The [formula] id must match: run-{name}.

  2. Decide sense vs act. If your formula only reads state and writes evidence → sense. If it creates PRs, commits code, or modifies contracts → act.

  3. Write the TOML. Follow the skeleton above. Key sections:

    • [formula] — id, name, description, type.
    • [inputs.*] — every tuneable parameter the script accepts.
    • [execution] — script path and full invocation with {input} interpolation.
    • [[steps]] — ordered list of logical steps. Always end with collect and deliver.
    • [products.*] — what the formula produces (evidence file, PR, issue comment).
    • [resources] — profile (light / heavy), concurrency constraints.
  4. Write or wire the backing script. The [execution] script must exist and be executable. Most scripts live in scripts/harb-evaluator/ or tools/. Exit codes: 0 = success, 1 = gate failed, 2 = infra error.

  5. Define the evidence schema. If your formula writes evidence/{category}/{date}.json, add the schema to evidence/README.md.

  6. Update this file. Add your formula to the "Current Formulas" table above.

  7. Test locally. Run the backing script with the required inputs and verify the evidence file is well-formed JSON.
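The {input} interpolation mentioned in step 3 maps keys declared under [inputs.*] into the invocation string, where they typically become environment variables for the script. A hedged sketch, with a hypothetical scenario_count input:

```toml
[inputs.scenario_count]
type        = "integer"
required    = true
description = "How many holdout scenarios to run."

[execution]
script     = "scripts/harb-evaluator/evaluate.sh"
invocation = "SCENARIO_COUNT={scenario_count} bash scripts/harb-evaluator/evaluate.sh"
```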

Resource Profiles

| Profile | Meaning | Can run in parallel? |
| --- | --- | --- |
| light | Shell commands only (df, curl, cast). No Docker, no Anvil. | Yes — safe to run alongside anything. |
| heavy | Needs Anvil on port 8545, Docker containers, or long-running agents. | No — exclusive. Heavy formulas share port bindings and cannot overlap. |
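As a sketch, a heavy formula would declare the exclusive profile like this (comments restate the constraints above; not copied from a real formula):

```toml
[resources]
profile     = "heavy"       # needs Anvil on port 8545 / Docker containers
concurrency = "exclusive"   # heavy formulas share port bindings and cannot overlap
```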

Evaluator Integration

Formula execution is dispatched by the orchestrator to scripts in scripts/harb-evaluator/. See scripts/harb-evaluator/AGENTS.md for details on the evaluator runtime: stack lifecycle, scenario execution, evidence collection, and the adversarial agent harness.