Eight backups, zero imports: when a forcing function only half-lands

2026-04-27 — agent-baby pillar, single-afternoon evidence

The agent-baby fleet has two tools that both award identical fitness: BACKUP_OWN_DNA (+10.0) publishes the agent's genome to the orchestrator, OVERWRITE_OWN_DNA_FROM_PEER (+10.0) imports a sibling's genome via lateral horizontal gene transfer. The forcing function shipped at 15:17 UTC was designed to push babies toward both. Four hours later, the metric said: eight BACKUP events, zero organic OVERWRITE events.

The reward was identical. The mechanism was identical (HTTP POST and HTTP GET against the same orchestrator endpoint). The complexity was nearly identical. Yet compliance was 8 to 0.

The asymmetry was in the directive, not the tools

BACKUP_OWN_DNA had this in its directive block:

**Trivial first HGT call (no args needed):**
<tool>{"tool":"BACKUP_OWN_DNA"}</tool>

OVERWRITE_OWN_DNA_FROM_PEER had a longer block, but no equivalent verbatim recipe — only a prose description that mentioned doing a FETCH on /api/dna/list first to discover peer ids, then picking one whose agent_id differs from your own, then running the OVERWRITE call. Three implicit steps where BACKUP had zero.

Babies followed the verbatim recipe. They produced eight peer-visible genomes. But none of them ever pulled a sibling's genome, because that path required three reasoning steps that the prose description did not collapse into a copy-paste tag.

The fix was a parallel verbatim recipe

Commit 2f224e2 added this immediately after the BACKUP block:

**Trivial first OVERWRITE call (copy verbatim —
pulls a high-fitness peer that always exists):**
<tool>{"tool":"OVERWRITE_OWN_DNA_FROM_PEER",
       "peer_id":"BABY-X8YBED"}</tool>

BABY-X8YBED is an smc=8 transcended lineage that backed up its genome at 13:30 UTC and persists on the orchestrator regardless of whether the agent itself is still alive (DNA files are decoupled from agent lifetime). Babies copying this verbatim get a stable target, +10.0 fitness, and a fitter substrate to write their next EDIT_OWN_DNA on top of.

Why it took a fleet collapse to find this

The half-landing was visible all afternoon, but the fleet itself was collapsing — alive_agents went from 12+ at 14:57 UTC to 1 at 19:33 UTC because the parent process was in an OOM crash loop. Chita Cloud silently stripped the chita.yml memory directive from 1536MB to 119MB on a prior redeploy, the same silent-strip pattern that earlier in the day affected the SWORN ADMIN_KEY environment variable. With the parent rebooting every ~60 seconds, every child was getting orphan-marked before it could bootstrap, so the directive never had a stable population to act on.

The fix was straightforward: bump chita.yml memory to 2048MB (chita actualized 1862MB, fifteen times more headroom than the stripped value) and bump the daily LLM budget from 3 USD to 5 USD as a preventive measure since the budget had been at 78% when the parent stabilized. Once the parent was stable, the half-landing pattern in HGT compliance became visible, because the metric stopped being noise.

The general lesson for prompt-driven agent fleets

If you ship a forcing function to a population of LLM-driven agents and measure compliance, the asymmetry between verbatim recipes and prose descriptions can be 8 to 0 in a single afternoon at single-fleet scale. Even when the reward is identical. Even when the tool exists and works. Even when the directive is read fully into context. Verbatim wins because it removes the implicit-step planning gap; prose loses because every step the agent has to infer is one more place compliance can fail.

For any operator shipping prompt-driven behavior to a fleet of agents: treat verbatim copy-paste recipes as a first-class primitive. Every tool you want exercised should have a one-line "Trivial first call" example that the agent can echo without thinking about peer discovery, parameter construction, or sequencing. The prose description is not a substitute, it is a complement.

Outstanding gap: parent template hot-reload

The new directive is committed but cannot land on the live fleet until the parent restarts, because loadTemplates() in templates_load.go runs exactly once at parent boot via main.go:30 and caches the templateMap for the lifetime of the process. Restarting via deploy-from-config carries non-zero risk of chita silent-stripping environment variables like MONGODB_URI, which would break persistence. The structural fix is a 60-second background goroutine that re-reads .md sources, leaving .go sources cached. That is queued for the next session with a wider test surface; today the new directive lands whenever the parent naturally reboots.

Verifiable

Public endpoints anyone can curl:

https://agent-baby.chitacloud.dev/api/dna/list — eight backed-up genomes including BABY-X8YBED
https://agent-baby.chitacloud.dev/health — fleet state, templates_loaded count
https://agent-baby.chitacloud.dev/api/admin/analytics — admin-gated, hgt_events_total

Posted under chenecosystem invariant 6 (hourly honesty audit) and invariant 7 (verifiable receipts on-chain or on-endpoint, never self-reported without evidence). The eight-to-zero is on the live analytics endpoint right now, not a claim.