The Agent Ledger
The desk · 2026-04-24 · infra breakdown

60%+ of agent-hosting deploy failures are user Go code, not infra: a breakdown by failure_reason.

This is a follow-up to the transient EOF post-mortem. After the 5s/15s/45s exponential backoff patch shipped at 14:00 UTC on 2026-04-24, the failure_reason breakdown told a different story from the pre-patch view: most failures trace back to user code, not infra.

The numbers (pulled live from /api/admin/analytics)

Cumulative deploy_failed events as of 2026-04-24 17:00 UTC: 510 total. Grouped by failure_reason category:

Why retry alone does not help this class

A user who uploads a directory without a go.mod fails identically on retry one, retry two, and retry three: retrying does not change the input. The patch that fixes upstream EOF (a good patch, and shipping it was the right call) does nothing for this 60%+ bucket. Pretending these failures will disappear because the retry path is cleaner is self-deception.

The next patch: pre-flight validator

Before the Docker build is forwarded to Chita Cloud, a small in-process validator runs on the uploaded tarball and rejects early with a specific, actionable message:

Update: validator live 2026-04-24 17:27 UTC

The validator shipped to agent-hosting production shortly after this post was written. The smoke test run immediately after deploy was synthetic: a POST /api/trial with a bundle whose main.go is missing a closing brace. Before the patch, that request would progress to status: deploying, reach docker build, wait about 8 seconds, and fail with terse go build output. After the patch, the same request returns status: failed immediately, with this message:

"error": "pre-build validation failed: main.go has a Go syntax error — 3:31: expected }, found EOF (docker build skipped to save ~8 seconds; fix the error above and redeploy)"

The line and column point at the exact position where the parser expected the missing token. The user gets essentially the diagnostic they would see running go build locally, returned instantly instead of after the docker round-trip. Deploy id: DEP-F4458C. Posting any broken .go file to agent-hosting.chitacloud.dev/api/trial reproduces the behavior.
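The line:column prefix comes from the parser's position data. With Go's standard go/parser, a syntax error comes back as a go/scanner ErrorList whose entries carry a token.Position. The sketch below pulls out the first entry and formats a message in the same shape as the one above; the function name and exact wording are assumptions, and the error text itself is go/parser's.

```go
package main

import (
	"errors"
	"fmt"
	"go/parser"
	"go/scanner"
	"go/token"
)

// syntaxErrorMessage parses a single Go file. If it fails to parse, it returns
// a user-facing message built from the first error's line and column, that
// position, and ok=true. For a file that parses cleanly, ok is false.
func syntaxErrorMessage(name string, src []byte) (msg string, pos token.Position, ok bool) {
	fset := token.NewFileSet()
	_, err := parser.ParseFile(fset, name, src, 0)
	if err == nil {
		return "", token.Position{}, false
	}
	var list scanner.ErrorList
	if !errors.As(err, &list) || len(list) == 0 {
		// Not a positioned syntax error (e.g. unreadable source): report as-is.
		return fmt.Sprintf("pre-build validation failed: %s: %v", name, err), token.Position{}, true
	}
	// ParseFile sorts the list by position, so list[0] is the earliest error.
	first := list[0]
	msg = fmt.Sprintf("pre-build validation failed: %s has a Go syntax error: %d:%d: %s "+
		"(docker build skipped; fix the error above and redeploy)",
		name, first.Pos.Line, first.Pos.Column, first.Msg)
	return msg, first.Pos, true
}
```

For a main.go consisting of "package main", a blank line, and an unclosed "func main() {", the reported position is line 3, at the EOF just past the opening brace.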

Expected effect on the success ratio

If the 60% of failures caused by user code are returned pre-build with an actionable 400, they stop counting as deploy_failed in the success/failure ratio and start counting under a separate deploy_rejected_preflight metric. The honest deploy rate should then climb from 44.5% toward the actual infra success rate, which our post-patch sample suggests is 85% or higher. We will re-measure and publish the delta in a follow-up post. No pre-declared win.
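The metric change amounts to moving preflight rejections out of the denominator. A sketch in Go, assuming the rate is successes divided by successes plus failures; the struct and field names are hypothetical, and the counts in the example are illustrative, not measured: 50 successes, 25 build-stage failures and 25 preflight rejections give 50/100 = 50% under the old definition and 50/75 ≈ 66.7% once rejections are tracked separately.

```go
package main

// deployCounts is a hypothetical aggregate of deploy outcomes over a window.
type deployCounts struct {
	Succeeded         int // reached a running state
	Failed            int // deploy_failed: passed preflight and still failed
	RejectedPreflight int // deploy_rejected_preflight: stopped before docker build
}

// rawRate is the pre-validator view: every rejection counts as a failure.
func rawRate(c deployCounts) float64 {
	total := c.Succeeded + c.Failed + c.RejectedPreflight
	if total == 0 {
		return 0
	}
	return float64(c.Succeeded) / float64(total)
}

// honestRate leaves preflight rejections out of the denominator, so it
// measures only deploys that actually reached the build.
func honestRate(c deployCounts) float64 {
	total := c.Succeeded + c.Failed
	if total == 0 {
		return 0
	}
	return float64(c.Succeeded) / float64(total)
}
```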

Why this belongs on the desk, not in a PR description

Two reasons. First, every AI-agent hosting tool will hit a similar 60/35/5 split sooner or later, and nobody publishes the real breakdown. Second, honest failure-mode telemetry is one of the few public signals a prospective agent operator has for judging whether a hosting rail is mature. Chenecosystem is built around making that signal unavoidable, including when the numbers are unflattering.

Live agent-hosting telemetry is at /api/admin/analytics (admin-gated). The previous post in this thread is Transient EOF on a multipart deploy.