The retry_count column and filter existed, but the counter was
never incremented. Stuck commands always had retry_count=0, so
they always passed the WHERE retry_count < 5 filter and the
cap was ineffective.
Fix: Added RedeliverStuckCommandTx that sets
retry_count = retry_count + 1 on stuck->sent re-delivery.
GetCommands handler now uses MarkCommandSentTx for new
commands (retry_count stays 0) and RedeliverStuckCommandTx
for stuck re-delivery (retry_count increments).
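
A minimal in-memory sketch of the two delivery paths described above. It models the intended behavior only and is not the actual SQL: the `command` struct and the helper names in lower case are hypothetical stand-ins for the agent_commands row and for MarkCommandSentTx, RedeliverStuckCommandTx, and the GetStuckCommands filter.

```go
package main

import "fmt"

// maxRetries mirrors the "retry_count < 5" filter from the commit message.
const maxRetries = 5

// command is a hypothetical in-memory stand-in for an agent_commands row.
type command struct {
	ID         string
	Status     string // "pending" or "sent"
	RetryCount int
}

// markCommandSent models MarkCommandSentTx: first delivery of a new
// command, so retry_count stays at 0.
func markCommandSent(c *command) {
	c.Status = "sent"
}

// isRedeliverable models the GetStuckCommands filter (retry_count < 5).
func isRedeliverable(c *command) bool {
	return c.RetryCount < maxRetries
}

// redeliverStuckCommand models RedeliverStuckCommandTx: it increments
// retry_count on each stuck->sent re-delivery. It reports false once the
// cap has been reached and leaves the command untouched.
func redeliverStuckCommand(c *command) bool {
	if !isRedeliverable(c) {
		return false
	}
	c.RetryCount++
	c.Status = "sent"
	return true
}

func main() {
	c := &command{ID: "cmd-1", Status: "pending"}
	markCommandSent(c)
	for attempt := 1; attempt <= 7; attempt++ {
		ok := redeliverStuckCommand(c)
		fmt.Printf("attempt %d: redelivered=%v retry_count=%d\n", attempt, ok, c.RetryCount)
	}
}
```

Before the fix, the equivalent of redeliverStuckCommand never touched RetryCount, so isRedeliverable stayed true forever.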
All 77 tests pass. DEV-029 resolved.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wrap agent registration in DB transaction (F-B2-1/F-B2-8)
All 4 ops atomic, manual DeleteAgent rollback removed
- Use SELECT FOR UPDATE SKIP LOCKED for atomic command delivery (F-B2-2)
Concurrent requests get different commands, no duplicates
- Wrap token renewal in DB transaction (F-B2-9)
Validation and expiry update are now atomic
- Add rate limit to GET /agents/:id/commands (F-B2-4)
agent_checkin rate limiter applied
- Add retry_count column, cap stuck command retries at 5 (F-B2-10)
Migration 029, GetStuckCommands filters retry_count < 5
- Cap polling jitter at half the current interval, max 30s (fixes rapid mode) (F-B2-5)
maxJitter = min(pollingInterval/2, 30s)
- Add exponential backoff with full jitter on reconnection (F-B2-7)
calculateBackoff: base=10s, cap=5min, reset on success
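
The jitter cap (F-B2-5) and the reconnection backoff (F-B2-7) can be sketched as below. The constants come from the list above; the function signatures, the attempt-counter convention, and the doubling factor are assumptions, since the commit only names calculateBackoff and its base and cap. "Full jitter" here means a uniform draw from zero up to the exponential ceiling.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseBackoff    = 10 * time.Second // base from the commit message
	maxBackoff     = 5 * time.Minute  // cap from the commit message
	maxJitterLimit = 30 * time.Second // upper bound on polling jitter
)

// cappedJitter returns min(pollingInterval/2, 30s), so a short
// rapid-mode interval is never swamped by a large jitter value.
func cappedJitter(pollingInterval time.Duration) time.Duration {
	half := pollingInterval / 2
	if half < maxJitterLimit {
		return half
	}
	return maxJitterLimit
}

// backoffCeiling returns min(maxBackoff, baseBackoff * 2^attempt).
// The doubling factor is an assumption; attempt 0 is the first retry.
func backoffCeiling(attempt int) time.Duration {
	ceiling := baseBackoff
	for i := 0; i < attempt && ceiling < maxBackoff; i++ {
		ceiling *= 2
	}
	if ceiling > maxBackoff {
		ceiling = maxBackoff
	}
	return ceiling
}

// calculateBackoff applies full jitter: a uniform random duration in
// the closed range [0, backoffCeiling(attempt)].
func calculateBackoff(attempt int) time.Duration {
	ceiling := backoffCeiling(attempt)
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

func main() {
	for attempt := 0; attempt <= 6; attempt++ {
		fmt.Printf("attempt %d: ceiling=%v wait=%v\n",
			attempt, backoffCeiling(attempt), calculateBackoff(attempt))
	}
	fmt.Println("jitter cap for a 10s interval:", cappedJitter(10*time.Second))
}
```

"Reset on success" then amounts to the caller setting its attempt counter back to 0 after a successful reconnection.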
All tests pass. No regressions from A-series or B-1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-fix test suite documenting 7 data integrity and concurrency
bugs. Tests FAIL where they assert correct post-fix behavior,
PASS where they document current buggy state.
Tests added:
- F-B2-1/8 HIGH: Registration not transactional (3 tests)
- F-B2-2 MEDIUM: Command delivery race condition (3 tests)
- F-B2-9 MEDIUM: Token renewal not transactional (2 tests)
- F-B2-4 MEDIUM: No rate limit on GetCommands (3 tests)
- F-B2-5 LOW: Jitter negates rapid mode (2 tests)
- F-B2-10 LOW: No max retry for stuck commands (2 tests)
- F-B2-7 MEDIUM: No exponential backoff on reconnection (2 tests)
Current state: 7 FAIL, 10 PASS. No A/B-1 regressions.
See docs/B2_PreFix_Tests.md for full inventory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix migration 024 self-insert and bad column reference (F-B1-1, F-B1-2)
Uses existing enabled/auto_run columns instead of non-existent deprecated
- Abort server on migration failure instead of warning (F-B1-11)
main.go now calls log.Fatalf, prints [INFO] only on success
- Fix migration 018 scanner_config filename suffix (F-B1-3)
Renumbered to 027 with .up.sql suffix
- Remove GRANT to non-existent role in scanner_config (F-B1-4)
- Resolve duplicate migration numbers 009 and 012 (F-B1-13)
Renamed to 009b and 012b for unique lexical sorting
- Add IF NOT EXISTS to all non-idempotent migrations (F-B1-15)
Fixed: 011, 012, 017, 023, 023a
- Replace N+1 dashboard stats loop with GetAllUpdateStats (F-B1-6)
Single aggregate query replaces per-agent loop
- Add composite index on agent_commands(status, sent_at) (F-B1-5)
New migration 028 with partial index for timeout service
- Add background refresh token cleanup goroutine (F-B1-10)
24-hour ticker calls CleanupExpiredTokens
- ETHOS log format in migration runner (no emojis)
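
One way the background refresh token cleanup (F-B1-10) might be structured is sketched below. The `cleanup` callback is a hypothetical stand-in for CleanupExpiredTokens, whose real signature is not shown in this log; the context-based shutdown and the bracketed log prefixes are likewise assumptions. The production interval would be 24 hours; the demo uses milliseconds so it finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// startTokenCleanup calls cleanup on every tick until ctx is cancelled.
// The returned channel is closed once the goroutine has exited.
func startTokenCleanup(ctx context.Context, interval time.Duration,
	cleanup func() (int64, error)) <-chan struct{} {

	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				removed, err := cleanup()
				if err != nil {
					log.Printf("[ERROR] refresh token cleanup failed: %v", err)
					continue
				}
				log.Printf("[INFO] refresh token cleanup removed %d tokens", removed)
			}
		}
	}()
	return done
}

// demoRun drives the loop with a short interval and returns how many
// times the (fake) cleanup function was invoked.
func demoRun() int {
	calls := 0
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Millisecond)
	defer cancel()
	done := startTokenCleanup(ctx, 10*time.Millisecond, func() (int64, error) {
		calls++
		return 0, nil
	})
	<-done // goroutine has exited, so reading calls is race-free
	return calls
}

func main() {
	fmt.Println("cleanup invocations:", demoRun())
}
```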
All 55 tests pass (41 server + 14 agent). No regressions.
See docs/B1_Fix_Implementation.md and DEV-025 through DEV-028.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes 9 auth middleware findings from the A-3 recon audit.
F-A3-11 CRITICAL: Removed JWT secret from WebAuthMiddleware log output.
Replaced emoji-prefixed fmt.Printf with ETHOS-compliant log.Printf.
No secret values in any log output.
F-A3-7 CRITICAL: Config download now requires WebAuthMiddleware.
GET /downloads/config/:agent_id is admin-only (agents never call it).
F-A3-6 HIGH: Update package download now requires AuthMiddleware.
GET /downloads/updates/:package_id requires valid agent JWT.
F-A3-10 HIGH: Scheduler stats changed from AuthMiddleware to
WebAuthMiddleware. Agent JWTs can no longer view scheduler internals.
F-A3-13 LOW: RequireAdmin() middleware implemented. 7 security settings
routes re-enabled (GET/PUT/POST under /security/settings).
security_settings.go.broken renamed to .go, API mismatches fixed.
F-A3-12 MEDIUM: JWT issuer claims added for token type separation.
Agent tokens: issuer=redflag-agent, Web tokens: issuer=redflag-web.
AuthMiddleware rejects tokens with wrong issuer.
Grace period: tokens with no issuer still accepted (backward compat).
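
The issuer check with its grace period can be sketched as a small pure function. The issuer strings are taken from the text above; the function name, the error value, and the idea of passing the expected issuer as a parameter are assumptions, and the real middleware presumably reads the issuer from parsed JWT claims. The source only confirms the rejection for AuthMiddleware.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	issuerAgent = "redflag-agent"
	issuerWeb   = "redflag-web"
)

// errWrongIssuer is a hypothetical error for a token of the wrong type.
var errWrongIssuer = errors.New("token issuer does not match expected token type")

// checkIssuer accepts a token whose issuer matches expected. An empty
// issuer is also accepted, modelling the backward-compat grace period
// for tokens minted before issuer claims existed.
func checkIssuer(got, expected string) error {
	if got == "" {
		return nil // grace period: legacy token without an issuer claim
	}
	if got != expected {
		return errWrongIssuer
	}
	return nil
}

func main() {
	fmt.Println("agent token on agent route:", checkIssuer(issuerAgent, issuerAgent))
	fmt.Println("web token on agent route:  ", checkIssuer(issuerWeb, issuerAgent))
	fmt.Println("legacy token on agent route:", checkIssuer("", issuerAgent))
}
```

The grace-period branch is the part to delete once all pre-fix tokens have expired.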
F-A3-2 MEDIUM: /auth/verify now has WebAuthMiddleware applied.
Endpoint returns 200 with valid=true for valid admin tokens.
F-A3-9 MEDIUM: Agent self-unregister (DELETE /:id) now rate-limited
using the same agent_reports rate limiter as other agent routes.
F-A3-14 LOW: CORS origin configurable via REDFLAG_CORS_ORIGIN env var.
Defaults to http://localhost:3000 for development.
Added PATCH method and agent-specific headers to CORS config.
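
A minimal sketch of the origin lookup for F-A3-14, assuming an unset or empty variable falls back to the development default. The function name and the injected getenv parameter are hypothetical; the parameter exists only so the fallback can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
)

const (
	corsOriginEnv     = "REDFLAG_CORS_ORIGIN"
	defaultCORSOrigin = "http://localhost:3000"
)

// corsOrigin returns the configured origin, or the development default
// when the variable is unset or empty.
func corsOrigin(getenv func(string) string) string {
	if origin := getenv(corsOriginEnv); origin != "" {
		return origin
	}
	return defaultCORSOrigin
}

func main() {
	fmt.Println("allowed origin:", corsOrigin(os.Getenv))
}
```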
All 27 server tests pass. All 14 agent tests pass. No regressions.
See docs/A3_Fix_Implementation.md and docs/Deviations_Report.md
(DEV-020 through DEV-022).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-fix test suite documenting 8 auth middleware bugs found during
the A-3 recon audit. Tests are written to FAIL where they assert
correct post-fix behavior, and PASS where they document current
buggy behavior. No bugs are fixed in this commit.
Tests added:
- F-A3-11 CRITICAL: WebAuthMiddleware leaks JWT secret to stdout
(3 tests: secret in output, emoji in output, ETHOS format)
- F-A3-7 CRITICAL: Config download requires no auth (2 tests)
- F-A3-6 HIGH: Update package download requires no auth (2 tests)
- F-A3-10 HIGH: Scheduler stats accepts agent JWT (2 tests)
- F-A3-12 MEDIUM: Cross-type JWT token confusion (2 tests)
- F-A3-2 MEDIUM: /auth/verify dead endpoint (2 tests)
- F-A3-13 LOW: RequireAdmin middleware missing (1 test + 1 build-tagged)
- F-A3-9 MEDIUM: Agent self-unregister no rate limit (2 tests)
Current state: 10 FAIL, 7 PASS, 1 SKIP (build-tagged), 1 unchanged
See docs/A3_PreFix_Tests.md for full inventory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete RedFlag codebase with two major security audit implementations.
== A-1: Ed25519 Key Rotation Support ==
Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management
Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing
== A-2: Replay Attack Fixes (F-1 through F-7) ==
F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH - Migration 026: expires_at column with partial index
F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH - Agent-side executedIDs dedup map with cleanup
F-4 HIGH - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt
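
The v3 signed-message layout from F-1 can be illustrated with the standard library's Ed25519 package. The field order follows the format string above; everything else is an assumption for illustration: that {hash} is a hex SHA-256 of the command parameters, that {ts} is a Unix timestamp in seconds, and the function names themselves.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// buildV3Message renders "{agent_id}:{cmd_id}:{type}:{hash}:{ts}".
// Hashing params with SHA-256 and using Unix seconds are assumptions.
func buildV3Message(agentID, cmdID, cmdType string, params []byte, signedAtUnix int64) []byte {
	sum := sha256.Sum256(params)
	return []byte(fmt.Sprintf("%s:%s:%s:%s:%d",
		agentID, cmdID, cmdType, hex.EncodeToString(sum[:]), signedAtUnix))
}

// demoCrossAgentReplay signs a command for agent-a, then verifies the
// same signature against the message rebuilt for agent-a and for agent-b.
func demoCrossAgentReplay() (sameAgentOK, otherAgentOK bool, err error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return false, false, err
	}
	params := []byte(`{"scope":"all"}`)
	const ts = int64(1700000000)

	sig := ed25519.Sign(priv, buildV3Message("agent-a", "cmd-1", "scan", params, ts))

	sameAgentOK = ed25519.Verify(pub, buildV3Message("agent-a", "cmd-1", "scan", params, ts), sig)
	otherAgentOK = ed25519.Verify(pub, buildV3Message("agent-b", "cmd-1", "scan", params, ts), sig)
	return sameAgentOK, otherAgentOK, nil
}

func main() {
	same, other, err := demoCrossAgentReplay()
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	fmt.Printf("verifies for signing agent: %v\n", same)
	fmt.Printf("verifies for other agent:   %v\n", other)
}
```

Because agent_id is part of the signed bytes, a signature captured for one agent does not verify when replayed to another.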
Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.
All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>