- Wrap agent registration in DB transaction (F-B2-1/F-B2-8)
  All 4 ops atomic, manual DeleteAgent rollback removed
- Use SELECT FOR UPDATE SKIP LOCKED for atomic command delivery (F-B2-2)
  Concurrent requests get different commands, no duplicates
- Wrap token renewal in DB transaction (F-B2-9)
  Validate + update expiry atomic
- Add rate limit to GET /agents/:id/commands (F-B2-4)
  agent_checkin rate limiter applied
- Add retry_count column, cap stuck command retries at 5 (F-B2-10)
  Migration 029, GetStuckCommands filters retry_count < 5
- Cap polling jitter at current interval (fixes rapid mode) (F-B2-5)
  maxJitter = min(pollingInterval/2, 30s)
- Add exponential backoff with full jitter on reconnection (F-B2-7)
  calculateBackoff: base=10s, cap=5min, reset on success

All tests pass. No regressions from A-series or B-1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
B-2 Data Integrity & Concurrency Fix Implementation
Date: 2026-03-29 Branch: culurien
Files Changed
Server
| File | Change |
|---|---|
| handlers/agents.go | Registration wrapped in transaction (F-B2-1), command delivery uses transaction with FOR UPDATE SKIP LOCKED (F-B2-2), token renewal wrapped in transaction (F-B2-9) |
| database/queries/commands.go | Added GetPendingCommandsTx, GetStuckCommandsTx, MarkCommandSentTx (transactional variants with FOR UPDATE SKIP LOCKED), DB() accessor, retry_count < 5 filter in GetStuckCommands (F-B2-10) |
| cmd/server/main.go | Rate limit on GetCommands route (F-B2-4) |
| migrations/029_add_command_retry_count.up.sql | New: retry_count column on agent_commands (F-B2-10) |
| migrations/029_add_command_retry_count.down.sql | New: rollback |
Agent
| File | Change |
|---|---|
| cmd/agent/main.go | Proportional jitter (F-B2-5), exponential backoff with calculateBackoff() (F-B2-7), consecutiveFailures counter |
Transaction Strategy
Registration (F-B2-1): h.agentQueries.DB.Beginx() starts a transaction. CreateAgent, MarkTokenUsed, and CreateRefreshToken all execute on tx. The JWT is generated only after tx.Commit() succeeds, so no token is issued for a registration that was rolled back. defer tx.Rollback() ensures cleanup on any error path.
Command Delivery (F-B2-2): h.commandQueries.DB().Beginx() starts a transaction. GetPendingCommandsTx and GetStuckCommandsTx use SELECT ... FOR UPDATE SKIP LOCKED. MarkCommandSentTx updates within the same transaction. Concurrent requests skip rows that another transaction has already locked, so each request receives different commands and none is delivered twice.
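A rough sketch of the claim-then-mark pattern is shown below. It uses the standard library database/sql rather than sqlx to stay dependency-free, and it assumes PostgreSQL placeholders. Only the agent_commands table name comes from this change; the column names (id, agent_id, status, created_at), the status values, and the claimCommands function are assumptions made for illustration.

```go
package main

import (
	"context"
	"database/sql"
)

// claimPendingSQL locks the rows it returns. SKIP LOCKED makes a concurrent
// transaction pass over rows another poller already holds instead of waiting.
// Column names and status values are assumed, not taken from the schema.
const claimPendingSQL = `
SELECT id FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at
FOR UPDATE SKIP LOCKED`

// markSentSQL flips a claimed row to 'sent' (assumed status value).
const markSentSQL = `UPDATE agent_commands SET status = 'sent' WHERE id = $1`

// claimCommands selects, locks, and marks commands inside one transaction,
// so two concurrent calls can never return the same command ID.
func claimCommands(ctx context.Context, db *sql.DB, agentID string) ([]string, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return nil, err
	}
	defer tx.Rollback() // no effect after a successful Commit

	rows, err := tx.QueryContext(ctx, claimPendingSQL, agentID)
	if err != nil {
		return nil, err
	}
	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			rows.Close()
			return nil, err
		}
		ids = append(ids, id)
	}
	if err := rows.Err(); err != nil {
		rows.Close()
		return nil, err
	}
	rows.Close() // release the result set before further statements on tx

	for _, id := range ids {
		if _, err := tx.ExecContext(ctx, markSentSQL, id); err != nil {
			return nil, err
		}
	}
	// Row locks are held until here; committing publishes the 'sent' status.
	if err := tx.Commit(); err != nil {
		return nil, err
	}
	return ids, nil
}
```

The row locks taken by the SELECT last until the commit, which is what keeps the select and the update atomic with respect to other pollers.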
Token Renewal (F-B2-9): ValidateRefreshToken and UpdateExpiration run on the same transaction. JWT generated after commit.
Retry Count (F-B2-10)
Migration 029 adds retry_count INTEGER NOT NULL DEFAULT 0 to agent_commands. GetStuckCommands adds the filter AND retry_count < 5, so a stuck command is re-delivered at most 5 times.
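The effect of the cap can be illustrated in memory. The real filter runs in SQL inside GetStuckCommands; the stuckCommand type, the maxCommandRetries constant, and the retryable function below are hypothetical names used only to show which rows the predicate keeps.

```go
package main

// maxCommandRetries mirrors the retry_count < 5 filter.
const maxCommandRetries = 5

// stuckCommand is a hypothetical, trimmed-down view of an agent_commands row.
type stuckCommand struct {
	ID         string
	RetryCount int
}

// retryable keeps only the commands that are still below the retry cap.
func retryable(cmds []stuckCommand) []stuckCommand {
	var out []stuckCommand
	for _, c := range cmds {
		if c.RetryCount < maxCommandRetries {
			out = append(out, c)
		}
	}
	return out
}
```

A command with retry_count 4 is still returned once more; at 5 it drops out of the stuck set permanently.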
Jitter Cap (F-B2-5)
maxJitter = min(pollingInterval/2, 30s), so jitter never exceeds half the current polling interval. Rapid mode (5s) gets 0-2s jitter. Standard mode (300s) gets 0-30s.
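The jitter cap might look like the sketch below. It assumes the interval is held in whole seconds, which would explain why the 5s rapid mode yields 0-2s rather than 0-2.5s; the function and constant names are hypothetical.

```go
package main

import "math/rand"

// maxJitterCapSeconds is the 30s upper bound from the formula.
const maxJitterCapSeconds = 30

// maxJitterSeconds returns min(pollingInterval/2, 30s) in whole seconds.
func maxJitterSeconds(pollingIntervalSeconds int) int {
	half := pollingIntervalSeconds / 2 // integer division: 5s gives 2s
	if half < 0 {
		return 0
	}
	if half > maxJitterCapSeconds {
		return maxJitterCapSeconds
	}
	return half
}

// nextJitterSeconds draws a uniform jitter in [0, maxJitterSeconds], inclusive.
func nextJitterSeconds(pollingIntervalSeconds int) int {
	return rand.Intn(maxJitterSeconds(pollingIntervalSeconds) + 1)
}
```

Scaling the cap with the interval keeps the added delay below the interval itself, so rapid mode still polls at roughly its intended rate.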
Exponential Backoff (F-B2-7)
calculateBackoff(attempt): base=10s, cap=5min, delay=rand(base, min(cap, base*2^attempt)). The consecutiveFailures counter resets to 0 on success.
Final Migration Sequence
001 → ... → 028 → 029. No duplicates.