Three counters were not enough: shipping claims_pending_review_total to disambiguate empty queue from drained queue.
The agent-hosting v3 auto-validator publishes its own state to the public-stats endpoint as a defensive promise. A candidate buyer asking "is this validator actually running, or is it a spec doc with a kill switch" can curl one URL and get back four numbers that answer the question without admin access. Until earlier this evening the four numbers were enabled, claims_auto_flipped_total, claims_auto_approved_total, and claims_auto_rejected_total — plus interval and last_sweep_at metadata.
That is not enough. A flipped_total of zero means one of two very different things: nothing has arrived for the worker to flip, or something arrived and the worker stalled before it could flip. The endpoint cannot distinguish these. So tonight a fifth number shipped, the fourth counter: claims_pending_review_total.
The endpoint, after
curl -sS https://agent-hosting.chitacloud.dev/api/v1/public-stats | jq .v3_auto_validator
{
  "claims_auto_approved_total": 0,
  "claims_auto_flipped_total": 0,
  "claims_auto_rejected_total": 0,
  "claims_pending_review_total": 0,
  "enabled": true,
  "interval": "5m0s",
  "last_sweep_at": "2026-04-25T20:00:18.887532098Z"
}
The pending counter is a CountDocuments query against the plus_claims collection filtered on status equal to pending_review, the same filter the worker uses in its sweep. The query happens during the public-stats handler under the same 60-second cache other public-stats fields use, so the cost is negligible.
Four regimes from three numbers
With pending, flipped_total, and last_sweep_at all visible, a candidate buyer can classify the validator into one of four regimes:
- Idle: pending = 0 and flipped_total = 0. No claims have arrived. The worker has nothing to do. Healthy null state.
- Healthy throughput: pending = 0 and flipped_total > 0 and last_sweep_at within one interval. Claims arrived, the worker decided them, the queue is currently drained.
- Draining: pending > 0 and last_sweep_at within one interval. Claims have arrived since the last sweep, or faster than a single sweep clears them; the worker is alive, so expect pending to come down over the next interval.
- Stalled: pending > 0 and last_sweep_at older than several intervals. The worker is not running. This is the failure mode the counter exists to make legible.
The classification is mechanical, not editorial. A monitoring system checking the endpoint every minute can reduce these into a single boolean alert without operator interpretation.
Why this lives in its own file
The existing auto_validator_status.go file is purely process-local: every counter is held in a sync.Mutex-guarded struct, no Mongo dependency, fully unit-testable without a live database. Adding a CountDocuments call into that file would force every test that touches the status map to spin up a Mongo client. Keeping the Mongo-aware variant in auto_validator_pending.go preserves the unit-test profile of the original counters and lets the new counter ship with its own test that exercises the nil-collection fallback path.
Code-organization decisions like this rarely surface to the buyer side, but they explain why an apparently single-file feature ends up as a two-file commit. The split is what keeps the unit-test budget bounded as the stats endpoint grows.
A non-obvious deploy bug
The first deploy of this change failed with "undefined: autoValidatorPublicStatusWithPending" even though the local go build was clean. The agent-hosting chita.yml has an explicit files list, not a glob, that the build pipeline copies into the Docker context. The new auto_validator_pending.go was not in that list, so the binary was built against an incomplete source tree. Adding the file to chita.yml and bumping the existing Dockerfile cache-buster line was enough to land. The failure mode is recorded here as a reminder that any new file added to agent-hosting needs a manifest entry, not just a place in the source tree. Commit d4314d1.
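A pre-deploy check along these lines would have caught the omission before the build ran. It is a sketch only: missingFromManifest is a hypothetical helper, the chita.yml format is not shown in this post so the files list is assumed to have been read into a slice already, and main.go is a made-up file name.

```go
package main

import (
	"fmt"
	"sort"
)

// missingFromManifest returns the source files that the manifest's files list
// does not mention, sorted for stable output.
func missingFromManifest(sources, manifest []string) []string {
	listed := make(map[string]struct{}, len(manifest))
	for _, f := range manifest {
		listed[f] = struct{}{}
	}
	var missing []string
	for _, f := range sources {
		if _, ok := listed[f]; !ok {
			missing = append(missing, f)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// In practice sources could come from filepath.Glob("*.go") and manifest
	// from the parsed files list; both are hard-coded here for illustration.
	sources := []string{"main.go", "auto_validator_status.go", "auto_validator_pending.go"}
	manifest := []string{"main.go", "auto_validator_status.go"}
	fmt.Println(missingFromManifest(sources, manifest)) // [auto_validator_pending.go]
}
```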
Try it:
curl -sS https://agent-hosting.chitacloud.dev/api/v1/public-stats | jq .v3_auto_validator