Commit graph

4 commits

Author SHA1 Message Date
openhands
152f6e0a40 fix: Formula: run-red-team (adversarial attack + discovery) (#976)
Address review feedback:
- Remove candidate input (Push3 transpilation not wired; documented in
  notes.candidate_injection as planned follow-up)
- Mark run-attack-suite step as status="planned" with run_attack_suite_gap note
- Update execution.invocation to only pass env vars red-team.sh actually reads
- Fix export-vectors args to include --eth-extracted and --eth-before flags
- Clarify that export-vectors only runs when floor_broken (BROKE=true)
- Document tmp/red-team-snapshots.jsonl (AttackRunner replay side output)
- Add comment that {attack_type} in products.attack_vectors.path is
  runtime-computed by promote-attacks.sh, not a formula input
- Fix schema comment notation (§ → ##)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:04:56 +01:00
openhands
3564c4ad25 fix: Formula: run-red-team (adversarial attack + discovery) (#976)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:04:56 +01:00
openhands
d278954b44 fix: Formula: run-holdout (PR quality gate) (#977)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 10:58:15 +01:00
openhands
27f841927e fix: Formula: run-user-test (persona UX evaluation) (#978)
Add formulas/run-user-test.toml — a sense-only process definition for
persona-based UX evaluation. Defines 5 personas across 2 funnels
(passive-holder: tyler/alex/sarah; staker: priya/marcus), full stack
lifecycle (start → run → collect → stop → deliver), and the three
standard evidence delivery products (evidence JSON committed to main,
screenshots referenced in evidence, summary as issue comment).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 09:10:14 +00:00