Commit Graph

17 Commits

38184a9625 test(windows): C-1 pre-fix tests for Windows-specific bugs
Pre-fix test suite for 7 Windows-specific findings. All tests
are SHARED (no build tags) — they compile and run on Linux
using source file inspection and direct function calls.
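Below is a minimal sketch of the source-inspection approach. It assumes a shared test reads a Windows-only file as plain text and searches it for banned patterns, such as fmt.Printf for F-C1-6 or emoji for F-C1-7. findForbidden and hasEmoji are hypothetical helpers. The emoji check is a rough range heuristic, not a full Unicode classification.

```go
package main

import "strings"

// findForbidden returns the 1-based line numbers of src that contain any of
// the given patterns. A shared (untagged) test can read a Windows-only file
// as plain text and call this, so it runs on Linux without compiling that file.
func findForbidden(src string, patterns []string) []int {
	var hits []int
	for i, line := range strings.Split(src, "\n") {
		for _, p := range patterns {
			if strings.Contains(line, p) {
				hits = append(hits, i+1)
				break // report each line once
			}
		}
	}
	return hits
}

// hasEmoji is a rough heuristic: it flags runes in the Miscellaneous Symbols,
// Dingbats, and main pictograph/emoji blocks. It is not a complete emoji test.
func hasEmoji(s string) bool {
	for _, r := range s {
		if (r >= 0x2600 && r <= 0x27BF) || (r >= 0x1F300 && r <= 0x1FAFF) {
			return true
		}
	}
	return false
}
```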

Tests added:
- F-C1-1 HIGH: Winget PATH-only search (2 tests)
- F-C1-2 MEDIUM: Winget text parser spaces bug (4 tests)
- F-C1-3 HIGH: Ghost updates — no post-install verification (3 tests)
- F-C1-4 RESOLVED: Service auto-restart already configured (1 test)
- F-C1-5 HIGH: Duplicated polling loop missing B-2 fixes (5 tests)
- F-C1-6 LOW: Winget uses fmt.Printf (2 tests)
- F-C1-7 LOW: Service has emojis in logs (2 tests)

Current state: 8 FAIL, 11 PASS. All prior tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:51:44 -04:00
799c155d94 docs: C-1 Windows-specific bugs audit
Comprehensive audit of Windows agent code: winget detection,
Windows Update ghost updates, service wrapper, HWID, and
vendored windowsupdate package.

Key findings:
- F-C1-1 HIGH: Winget not found as SYSTEM (PATH-only search)
- F-C1-3 HIGH: No post-install verification (ghost updates)
- F-C1-5 HIGH: Windows service has duplicated polling loop
  missing B-2 fixes (jitter cap, exponential backoff)
- F-C1-2 MEDIUM: Fragile winget text parser
- F-C1-4 MEDIUM: No service auto-restart on crash

9 findings total. See docs/C1_Windows_Audit.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:22:18 -04:00
f71f878a35 fix(concurrency): wire retry_count increment for stuck command re-delivery (DEV-029)
retry_count column and filter existed but counter was never
incremented. Stuck commands always had retry_count=0 and
always passed the WHERE retry_count < 5 filter, making
the cap ineffective.

Fix: Added RedeliverStuckCommandTx that sets
retry_count = retry_count + 1 on stuck->sent re-delivery.
GetCommands handler now uses MarkCommandSentTx for new
commands (retry_count stays 0) and RedeliverStuckCommandTx
for stuck re-delivery (retry_count increments).
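The following simplified in-memory model shows why the increment matters. The stuck-command filter only stops re-delivery once retry_count reaches the cap. command, stuckCommands, and redeliver are hypothetical stand-ins for the agent_commands row, GetStuckCommands, and RedeliverStuckCommandTx. The real queries run in SQL and also check how long a command has been in the sent state.

```go
package main

// maxRetries mirrors the cap in the GetStuckCommands filter (retry_count < 5).
const maxRetries = 5

// command is a hypothetical in-memory stand-in for an agent_commands row.
type command struct {
	ID         string
	Status     string
	RetryCount int
}

// stuckCommands models the filter: commands in "sent" state that are still
// under the retry cap. (The real query also checks how long ago sent_at was.)
func stuckCommands(cmds []*command) []*command {
	var out []*command
	for _, c := range cmds {
		if c.Status == "sent" && c.RetryCount < maxRetries {
			out = append(out, c)
		}
	}
	return out
}

// redeliver models RedeliverStuckCommandTx: the command is sent again and
// retry_count = retry_count + 1. Without the increment, stuckCommands would
// return the same command forever.
func redeliver(c *command) {
	c.Status = "sent"
	c.RetryCount++
}
```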

All 77 tests pass. DEV-029 resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:16:12 -04:00
e93d850ab9 verify: B-2 data integrity verification — fixes verified with 1 follow-up
All B-2 concurrency fixes verified:
- Registration transaction: atomic, no orphaned agents
- SELECT FOR UPDATE SKIP LOCKED: prevents duplicate delivery
- Token renewal: atomic validate + update
- GetCommands: rate limited with agent_checkin key
- Jitter: capped at min(pollingInterval/2, 30s)
- Exponential backoff: base=10s, cap=5min, reset on success

Finding: DEV-029 — retry_count column exists but never incremented.
Filter is in place but ineffective. Targeted fix needed.

77 tests pass. No regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:09:35 -04:00
3ca42d50f4 fix(concurrency): B-2 data integrity and race condition fixes
- Wrap agent registration in DB transaction (F-B2-1/F-B2-8)
  All 4 ops atomic, manual DeleteAgent rollback removed
- Use SELECT FOR UPDATE SKIP LOCKED for atomic command delivery (F-B2-2)
  Concurrent requests get different commands, no duplicates
- Wrap token renewal in DB transaction (F-B2-9)
  Validate + update expiry atomic
- Add rate limit to GET /agents/:id/commands (F-B2-4)
  agent_checkin rate limiter applied
- Add retry_count column, cap stuck command retries at 5 (F-B2-10)
  Migration 029, GetStuckCommands filters retry_count < 5
- Cap polling jitter at half the current interval, max 30s (fixes rapid mode) (F-B2-5)
  maxJitter = min(pollingInterval/2, 30s)
- Add exponential backoff with full jitter on reconnection (F-B2-7)
  calculateBackoff: base=10s, cap=5min, reset on success
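The sketch below claims commands with SELECT FOR UPDATE SKIP LOCKED inside one transaction. It assumes PostgreSQL and the hypothetical columns agent_id, status, created_at, and sent_at. A concurrent request skips rows that another request has locked. Because the status change commits before the locks are released, two polls for the same agent cannot receive the same command. claimCommands is illustrative and does not reproduce the server's GetCommands handler.

```go
package main

import (
	"context"
	"database/sql"
)

// claimPendingSQL locks up to $2 pending rows for one agent. Rows already
// locked by a concurrent transaction are skipped instead of waited on.
const claimPendingSQL = `SELECT id
FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at
LIMIT $2
FOR UPDATE SKIP LOCKED`

const markSentSQL = `UPDATE agent_commands SET status = 'sent', sent_at = NOW() WHERE id = $1`

// claimCommands selects and marks commands in one transaction, so the row
// locks are held until the status change commits.
func claimCommands(ctx context.Context, db *sql.DB, agentID string, limit int) (ids []string, err error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return nil, err
	}
	defer func() {
		if err != nil {
			_ = tx.Rollback() // release locks on any failure
		}
	}()

	rows, err := tx.QueryContext(ctx, claimPendingSQL, agentID, limit)
	if err != nil {
		return nil, err
	}
	for rows.Next() {
		var id string
		if err = rows.Scan(&id); err != nil {
			break
		}
		ids = append(ids, id)
	}
	if err == nil {
		err = rows.Err()
	}
	rows.Close() // close before running more statements on the same tx
	if err != nil {
		return nil, err
	}

	for _, id := range ids {
		if _, err = tx.ExecContext(ctx, markSentSQL, id); err != nil {
			return nil, err
		}
	}
	if err = tx.Commit(); err != nil {
		return nil, err
	}
	return ids, nil
}
```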

All tests pass. No regressions from A-series or B-1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:00:36 -04:00
59ab7cbd5f test(concurrency): B-2 pre-fix tests for data integrity and concurrency bugs
Pre-fix test suite documenting 7 data integrity and concurrency
bugs. Tests FAIL where they assert correct post-fix behavior,
PASS where they document current buggy state.

Tests added:
- F-B2-1/8 HIGH: Registration not transactional (3 tests)
- F-B2-2 MEDIUM: Command delivery race condition (3 tests)
- F-B2-9 MEDIUM: Token renewal not transactional (2 tests)
- F-B2-4 MEDIUM: No rate limit on GetCommands (3 tests)
- F-B2-5 LOW: Jitter negates rapid mode (2 tests)
- F-B2-10 LOW: No max retry for stuck commands (2 tests)
- F-B2-7 MEDIUM: No exponential backoff on reconnection (2 tests)

Current state: 7 FAIL, 10 PASS. No A/B-1 regressions.
See docs/B2_PreFix_Tests.md for full inventory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:45:16 -04:00
2fd0fd27fa docs: B-2 data integrity and concurrency audit
Comprehensive audit of registration token races, command queue
concurrency, rapid mode risks, agent staleness, transaction
safety, and deadlock risks.

Key findings:
- F-B2-1 HIGH: Registration flow not transactional (4 separate ops)
- F-B2-8 HIGH: Same as F-B2-1 (crash leaves orphaned agent)
- F-B2-2 MEDIUM: Duplicate command delivery on concurrent requests
- F-B2-4 MEDIUM: No cap on concurrent rapid-mode agents
- F-B2-7 MEDIUM: No staggered reconnection after server restart
- F-B2-9 MEDIUM: Token renewal not transactional (self-healing)

10 findings total. See docs/B2_Data_Integrity_Audit.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:26:02 -04:00
1f828b6f61 verify: B-1 schema integrity verification — all fixes verified
- Migration sequence: 30 files, no duplicates, monotonically ordered
- Migration 024: self-insert removed, bad column fixed, idempotent
- Server aborts on migration failure (log.Fatalf)
- Scanner config migration renumbered to 027 with correct suffix
- All migrations idempotent (TestAllMigrationsAreIdempotent passes)
- N+1 replaced with GetAllUpdateStats aggregate query
- Stuck commands index (028) and background cleanup verified
- 55 tests pass (41 server + 14 agent), zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:12:50 -04:00
ec0d880036 fix(database): B-1 schema integrity and migration fixes
- Fix migration 024 self-insert and bad column reference (F-B1-1, F-B1-2)
  Uses existing enabled/auto_run columns instead of the non-existent deprecated column
- Abort server on migration failure instead of warning (F-B1-11)
  main.go now calls log.Fatalf, prints [INFO] only on success
- Fix migration 018 scanner_config filename suffix (F-B1-3)
  Renumbered to 027 with .up.sql suffix
- Remove GRANT to non-existent role in scanner_config (F-B1-4)
- Resolve duplicate migration numbers 009 and 012 (F-B1-13)
  Renamed to 009b and 012b for unique lexical sorting
- Add IF NOT EXISTS to all non-idempotent migrations (F-B1-15)
  Fixed: 011, 012, 017, 023, 023a
- Replace N+1 dashboard stats loop with GetAllUpdateStats (F-B1-6)
  Single aggregate query replaces per-agent loop
- Add composite index on agent_commands(status, sent_at) (F-B1-5)
  New migration 028 with partial index for timeout service
- Add background refresh token cleanup goroutine (F-B1-10)
  24-hour ticker calls CleanupExpiredTokens
- ETHOS log format in migration runner (no emojis)

All 55 tests pass (41 server + 14 agent). No regressions.
See docs/B1_Fix_Implementation.md and DEV-025 through DEV-028.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 07:03:35 -04:00
ab676c3b83 test(database): B-1 pre-fix tests for migration and schema bugs
Pre-fix test suite documenting 10 database migration and schema
integrity bugs. Tests FAIL where they assert correct post-fix
behavior, PASS where they document current buggy state.

Tests added:
- F-B1-11 P0: main.go swallows migration errors (3 tests)
- F-B1-13: Duplicate migration numbers 009/012 (2 tests)
- F-B1-1: Migration 024 self-insert into schema_migrations (2 tests)
- F-B1-2: Migration 024 references non-existent column (2 tests)
- F-B1-3: Migration 018 wrong file suffix (2 tests)
- F-B1-4: Migration 018 GRANT to wrong role (1 test)
- F-B1-15: 7+ migrations not idempotent (2 tests)
- F-B1-5: Missing agent_commands sent_at index (2 tests)
- F-B1-6: N+1 query in GetDashboardStats (2 tests)
- F-B1-10: No background refresh token cleanup (2 tests)

Current state: 10 PASS, 10 FAIL, 0 SKIP.
All A-series tests continue to pass (no regressions).
See docs/B1_PreFix_Tests.md for full inventory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:42:19 -04:00
3de7577802 docs: B-1 database migration and schema integrity audit
Comprehensive audit of the custom migration runner, all 26 migrations,
query patterns, foreign keys, and schema state management.

Critical findings:
- F-B1-11 P0: Server starts with incomplete schema after migration
  failure, prints [OK] — main.go swallows migration errors
- F-B1-1: Migration 024 self-inserts into schema_migrations
- F-B1-2: Migration 024 references non-existent deprecated column
- F-B1-3: Migration 018 scanner_config has wrong file extension
- F-B1-6: N+1 query in GetDashboardStats (1 query per agent)

15 findings total across P0/CRITICAL/HIGH/MEDIUM/LOW.
See docs/B1_Database_Audit.md for full analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:28:59 -04:00
c2774342f3 verify: A-series refactor verification — all tests pass
All pre-existing tests pass after dead code cleanup.
No regressions from A-1, A-2, or A-3 fix rounds.
41 tests pass, 1 skip (pre-existing). Zero new failures.
Ready to proceed to B-series database audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:21:05 -04:00
3e1e2a78fd refactor: A-series dead code cleanup and ETHOS compliance sweep
- Remove dead queries.RetryCommand function (DEV-019, 31 lines)
- Remove security_settings.go.broken leftover from A-3
- Remove 5 compiled test binaries from aggregator-agent/ (~61MB)
- Remove config_builder.go.restored from repo root
- Remove test_disk_detection.go and test_disk.go (throwaway test files)
- Fix 6 banned word violations (production-ready, enhanced, robust, seamlessly)
- Add .gitignore rules for compiled agent binaries
- Document machine ID duplication for D-1 fix prompt
- Document 30+ pre-existing emoji violations for D-2 pass

No behavior changes. All 41 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:17:12 -04:00
6e62208f82 docs: A-3 verification report — all fixes verified
All 9 auth middleware fixes confirmed correct:
- F-A3-11: JWT secret leak removed, ETHOS log format
- F-A3-7: Config download protected (WebAuthMiddleware)
- F-A3-6: Update download protected (AuthMiddleware)
- F-A3-10: Scheduler stats on WebAuthMiddleware
- F-A3-13: RequireAdmin implemented, 7 routes re-enabled
- F-A3-12: JWT issuer claims with backward compat grace period
- F-A3-2: /auth/verify endpoint fixed
- F-A3-9: Agent unregister rate-limited
- F-A3-14: CORS origin configurable

41 tests pass (27 server + 14 agent). No regressions.
Zero issues found during verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:07:57 -04:00
4c62de8d8b fix(security): A-3 auth middleware coverage fixes
Fixes 9 auth middleware findings from the A-3 recon audit.

F-A3-11 CRITICAL: Removed JWT secret from WebAuthMiddleware log output.
  Replaced emoji-prefixed fmt.Printf with ETHOS-compliant log.Printf.
  No secret values in any log output.

F-A3-7 CRITICAL: Config download now requires WebAuthMiddleware.
  GET /downloads/config/:agent_id is admin-only (agents never call it).

F-A3-6 HIGH: Update package download now requires AuthMiddleware.
  GET /downloads/updates/:package_id requires valid agent JWT.

F-A3-10 HIGH: Scheduler stats changed from AuthMiddleware to
  WebAuthMiddleware. Agent JWTs can no longer view scheduler internals.

F-A3-13 LOW: RequireAdmin() middleware implemented. 7 security settings
  routes re-enabled (GET/PUT/POST under /security/settings).
  security_settings.go.broken renamed to .go, API mismatches fixed.

F-A3-12 MEDIUM: JWT issuer claims added for token type separation.
  Agent tokens: issuer=redflag-agent, Web tokens: issuer=redflag-web.
  AuthMiddleware rejects tokens with wrong issuer.
  Grace period: tokens with no issuer still accepted (backward compat).
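Here is a sketch of the issuer check. It assumes a JWT library has already verified the signature and decoded the iss claim into a struct. The issuer strings come from this commit. checkIssuer, claims, and the allowLegacy flag, which models the grace period, are hypothetical.

```go
package main

import "errors"

// Issuer values from the commit message.
const (
	issuerAgent = "redflag-agent"
	issuerWeb   = "redflag-web"
)

// claims is a hypothetical subset of a JWT's claims, decoded after the
// signature has already been verified by a JWT library.
type claims struct {
	Issuer string // the "iss" claim
}

var errWrongIssuer = errors.New("token issuer not accepted here")

// checkIssuer accepts tokens whose issuer matches want. While allowLegacy is
// true (the grace period), tokens with an empty iss, issued before issuers
// existed, are still accepted.
func checkIssuer(c claims, want string, allowLegacy bool) error {
	switch {
	case c.Issuer == want:
		return nil
	case c.Issuer == "" && allowLegacy:
		return nil
	default:
		return errWrongIssuer
	}
}
```

With allowLegacy set, a token with no issuer still passes either middleware. The grace period should therefore end once all pre-issuer tokens have expired.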

F-A3-2 MEDIUM: /auth/verify now has WebAuthMiddleware applied.
  Endpoint returns 200 with valid=true for valid admin tokens.

F-A3-9 MEDIUM: Agent self-unregister (DELETE /:id) now rate-limited
  using the same agent_reports rate limiter as other agent routes.

F-A3-14 LOW: CORS origin configurable via REDFLAG_CORS_ORIGIN env var.
  Defaults to http://localhost:3000 for development.
  Added PATCH method and agent-specific headers to CORS config.

All 27 server tests pass. All 14 agent tests pass. No regressions.
See docs/A3_Fix_Implementation.md and docs/Deviations_Report.md
(DEV-020 through DEV-022).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:17:40 -04:00
ee246771dc test(security): A-3 pre-fix tests for auth middleware coverage bugs
Pre-fix test suite documenting 8 auth middleware bugs found during
the A-3 recon audit. Tests are written to FAIL where they assert
correct post-fix behavior, and PASS where they document current
buggy behavior. No bugs are fixed in this commit.

Tests added:
- F-A3-11 CRITICAL: WebAuthMiddleware leaks JWT secret to stdout
  (3 tests: secret in output, emoji in output, ETHOS format)
- F-A3-7 CRITICAL: Config download requires no auth (2 tests)
- F-A3-6 HIGH: Update package download requires no auth (2 tests)
- F-A3-10 HIGH: Scheduler stats accepts agent JWT (2 tests)
- F-A3-12 MEDIUM: Cross-type JWT token confusion (2 tests)
- F-A3-2 MEDIUM: /auth/verify dead endpoint (2 tests)
- F-A3-13 LOW: RequireAdmin middleware missing (1 test + 1 build-tagged)
- F-A3-9 MEDIUM: Agent self-unregister no rate limit (2 tests)

Current state: 10 FAIL, 7 PASS, 1 SKIP (build-tagged), 1 unchanged.
See docs/A3_PreFix_Tests.md for full inventory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:54:48 -04:00
f97d4845af feat(security): A-1 Ed25519 key rotation + A-2 replay attack fixes
Full RedFlag codebase, including two major security audit implementations.

== A-1: Ed25519 Key Rotation Support ==

Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management

Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing

== A-2: Replay Attack Fixes (F-1 through F-7) ==

F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH     - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH     - Migration 026: expires_at column with partial index
F-6 HIGH     - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH     - Agent-side executedIDs dedup map with cleanup
F-4 HIGH     - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt

Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.

All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:25:47 -04:00