fix: evo_run004_champion fitness inflated by token value (#670) (#704)

- Add fitness_flags="token_value_inflation" to evo_run004_champion in
  manifest.jsonl so callers can detect the inflated value without
  discarding the entry entirely.
- Add effective_fitness() helper in evolve.sh pool admission (step 5)
  that returns 0 for any entry with a token_value_inflation flag,
  preventing inflated scores from biasing the top-100 evolved pool
  ranking or eviction decisions.
- Document in evolve.sh that raw fitness values are only comparable
  within the same evaluation run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
openhands 2026-03-14 01:08:13 +00:00
parent d0eae8b261
commit c42a1ca768
2 changed files with 13 additions and 3 deletions
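The first bullet means a caller can spot the flagged entry while still loading it. Below is a minimal sketch in Python. It assumes manifest.jsonl holds one JSON object per line (as the extension suggests) and that `fitness_flags` holds the string given in the commit message. The `name` key, the second entry, the fitness numbers, and the `is_inflated` helper are hypothetical and are not taken from the repository.

```python
import json

# Hypothetical manifest lines. Only the fitness_flags value comes from the
# commit message; the "name" key and fitness numbers are illustrative.
MANIFEST_LINES = [
    '{"name": "evo_run004_champion", "fitness": 950000, '
    '"fitness_flags": "token_value_inflation"}',
    '{"name": "hand_written_a", "fitness": 1200}',
]


def load_manifest(lines):
    """Parse JSON Lines text: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def is_inflated(entry):
    """True if the entry carries the token_value_inflation flag.

    `in` works whether fitness_flags is a string or a list of strings.
    """
    flags = entry.get("fitness_flags") or ""
    return "token_value_inflation" in flags


entries = load_manifest(MANIFEST_LINES)  # flagged entry is kept, not dropped
flagged = [e["name"] for e in entries if is_inflated(e)]
print(flagged)  # ['evo_run004_champion']
```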


@@ -828,9 +828,19 @@ if not new_items:
     sys.exit(0)
 # ── 5. Separate pinned (hand-written) from evolved; top-100 cap on evolved only
-pinned = [(int(e.get('fitness') or 0), e, None) for e in existing
+#
+# NOTE: raw fitness values are only comparable within the same evaluation run.
+# Entries with fitness_flags='token_value_inflation' (or other flags) are ranked
+# as fitness=0 so that inflated scores do not bias pool admission or eviction.
+def effective_fitness(entry):
+    flags = entry.get('fitness_flags') or ''
+    if 'token_value_inflation' in flags:
+        return 0
+    return int(entry.get('fitness') or 0)
+pinned = [(effective_fitness(e), e, None) for e in existing
           if e.get('origin') != 'evolved']
-evolved = [(int(e.get('fitness') or 0), e, None) for e in existing
+evolved = [(effective_fitness(e), e, None) for e in existing
            if e.get('origin') == 'evolved']
 for fitness, push3_path, entry in new_items:
     evolved.append((fitness, entry, push3_path))
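The hunk ends before the ranking and eviction code, so the following is only a sketch of how the effective scores could drive the top-100 cap. It reuses `effective_fitness` and the pinned/evolved split from the diff. The `admit` function, the descending sort, the slice-based eviction, the `cap` parameter, and the sample entries are assumptions. As in the diff, the helper zeroes only entries flagged `token_value_inflation`, despite the comment's "(or other flags)". Entries from `new_items` also keep their raw fitness.

```python
# Minimal sketch of step 5 pool admission. The eviction policy (sort by
# effective fitness, keep the top `cap` evolved entries) is an assumption;
# the real remainder of step 5 is not shown in the diff.


def effective_fitness(entry):
    """Same logic as the helper in the diff: flagged entries rank as 0."""
    flags = entry.get('fitness_flags') or ''
    if 'token_value_inflation' in flags:
        return 0
    return int(entry.get('fitness') or 0)


def admit(existing, new_items, cap=100):
    """Split pinned from evolved and cap the evolved pool (hypothetical).

    new_items holds (fitness, push3_path, entry) tuples, matching the loop
    in the diff. Returns (pinned, kept, evicted) lists of
    (score, entry, push3_path) tuples.
    """
    pinned = [(effective_fitness(e), e, None) for e in existing
              if e.get('origin') != 'evolved']
    evolved = [(effective_fitness(e), e, None) for e in existing
               if e.get('origin') == 'evolved']
    for fitness, push3_path, entry in new_items:
        evolved.append((fitness, entry, push3_path))
    # Sort on the numeric score only: the dicts in slot 1 cannot be compared
    # if two scores tie. sort() is stable, so ties keep insertion order.
    evolved.sort(key=lambda t: t[0], reverse=True)
    return pinned, evolved[:cap], evolved[cap:]


# Hypothetical sample data.
existing = [
    {'name': 'evo_run004_champion', 'origin': 'evolved',
     'fitness': 10**6, 'fitness_flags': 'token_value_inflation'},
    {'name': 'evo_a', 'origin': 'evolved', 'fitness': 40},
    {'name': 'hand', 'origin': 'manual', 'fitness': 5},
]
new_items = [(55, 'new.push3', {'name': 'evo_new', 'origin': 'evolved'})]

pinned, kept, evicted = admit(existing, new_items, cap=2)
print([e['name'] for _, e, _ in kept])     # ['evo_new', 'evo_a']
print([e['name'] for _, e, _ in evicted])  # ['evo_run004_champion']
```

With raw fitness, the flagged entry (10**6) would have taken the top slot. With `effective_fitness` it ranks last and is the entry evicted when `cap=2`.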