fix(concurrency): B-2 data integrity and race condition fixes

- Wrap agent registration in DB transaction (F-B2-1/F-B2-8)
  All 4 ops atomic, manual DeleteAgent rollback removed
- Use SELECT FOR UPDATE SKIP LOCKED for atomic command delivery (F-B2-2)
  Concurrent requests get different commands, no duplicates
- Wrap token renewal in DB transaction (F-B2-9)
  Validate + update expiry atomic
- Add rate limit to GET /agents/:id/commands (F-B2-4)
  agent_checkin rate limiter applied
- Add retry_count column, cap stuck command retries at 5 (F-B2-10)
  Migration 029, GetStuckCommands filters retry_count < 5
- Cap polling jitter at current interval (fixes rapid mode) (F-B2-5)
  maxJitter = min(pollingInterval/2, 30s)
- Add exponential backoff with full jitter on reconnection (F-B2-7)
  calculateBackoff: base=10s, cap=5min, reset on success

All tests pass. No regressions from A-series or B-1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/B2_Fix_Implementation.md
@@ -0,0 +1,50 @@
# B-2 Data Integrity & Concurrency Fix Implementation

**Date:** 2026-03-29
**Branch:** culurien

---

## Files Changed

### Server

| File | Change |
|------|--------|
| `handlers/agents.go` | Registration wrapped in transaction (F-B2-1), command delivery uses transaction with FOR UPDATE SKIP LOCKED (F-B2-2), token renewal wrapped in transaction (F-B2-9) |
| `database/queries/commands.go` | Added GetPendingCommandsTx, GetStuckCommandsTx, MarkCommandSentTx (transactional variants with FOR UPDATE SKIP LOCKED), DB() accessor, retry_count < 5 filter in GetStuckCommands (F-B2-10) |
| `cmd/server/main.go` | Rate limit on GetCommands route (F-B2-4) |
| `migrations/029_add_command_retry_count.up.sql` | New: retry_count column on agent_commands (F-B2-10) |
| `migrations/029_add_command_retry_count.down.sql` | New: rollback |

### Agent

| File | Change |
|------|--------|
| `cmd/agent/main.go` | Proportional jitter (F-B2-5), exponential backoff with calculateBackoff() (F-B2-7), consecutiveFailures counter |

---

## Transaction Strategy

**Registration (F-B2-1):** `h.agentQueries.DB.Beginx()` starts a transaction. CreateAgent, MarkTokenUsed, and CreateRefreshToken all execute on `tx`. The JWT is generated only after `tx.Commit()`. `defer tx.Rollback()` ensures cleanup on any error, replacing the manual DeleteAgent rollback.

**Command Delivery (F-B2-2):** `h.commandQueries.DB().Beginx()` starts a transaction. GetPendingCommandsTx and GetStuckCommandsTx use `SELECT ... FOR UPDATE SKIP LOCKED`. MarkCommandSentTx updates within the same transaction. Concurrent requests skip rows that another transaction has locked, so each request gets different commands and none is delivered twice.

**Token Renewal (F-B2-9):** ValidateRefreshToken and UpdateExpiration run on the same transaction, so validation and the expiry update are atomic. The JWT is generated after commit.

## Retry Count (F-B2-10)

Migration 029 adds `retry_count INTEGER NOT NULL DEFAULT 0` to `agent_commands`. GetStuckCommands filters with `AND retry_count < 5`, which caps each stuck command at 5 re-deliveries.

## Jitter Cap (F-B2-5)

`maxJitter = min(pollingInterval/2, 30s)`. Rapid mode (5s) gets 0-2s of jitter; standard mode (300s) gets 0-30s.

## Exponential Backoff (F-B2-7)

`calculateBackoff(attempt)`: base=10s, cap=5min, delay=rand(base, min(cap, base*2^attempt)). The attempt counter (`consecutiveFailures`) resets to 0 on success.

## Final Migration Sequence

001 → ... → 028 → 029. No duplicates.