Redflag

Author	SHA1	Message	Date
jpetree331	b52f705b46	fix(ethos): D-2 ETHOS compliance sweep - Remove emoji from log statements across server and agent - Replace fmt.Printf with log.Printf in queries and handlers - Apply [TAG] [system] [component] format throughout - Exempt: display/terminal.go, setup.go, main.go CLI sections Total violations fixed: ~45 (emoji + fmt.Printf) All tests pass. Zero regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 10:43:16 -04:00
jpetree331	0da761243b	test(ethos): D-2 pre-fix tests for ETHOS compliance violations Pre-fix tests documenting emoji in log statements and fmt.Printf used as logging across server and agent codebases. Tests added: - Server emoji: machine_binding.go, agents.go, update handlers (6 tests) - Server fmt.Printf: queries, handlers, services (6 tests) - Agent emoji: main.go log paths, migration executor (4 tests) - Exemptions: display/terminal.go, setup.go (2 tests, always pass) Current state: 8 FAIL, 8 PASS, 2 ALWAYS-PASS. All prior tests pass. No regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 10:14:01 -04:00
jpetree331	47aa1da604	docs: D-2 ETHOS compliance audit — pre-existing violations Full scan of emoji, fmt.Printf, log format, and banned word violations that predate the A/B/C/D-1 fix series. Findings: - ~61 emoji violations in log statements (server+agent) - ~23 emoji instances in UI/CLI (intentional, lower priority) - ~12 fmt.Printf used as logging (should be log.Printf) - 0 banned words (all cleaned in prior series) - 0 silenced errors Estimated effort: MEDIUM. Priority: fmt.Printf fixes first, then emoji in log statements, CLI emojis last. See docs/D2_ETHOS_Compliance_Audit.md for complete listing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 10:04:04 -04:00
jpetree331	d43e5a2332	verify: D-1 machine ID fixes verified All 5 D-1 fixes verified. Registration 'unknown-' fallback removed, clean abort on failure. Rebind endpoint operational with admin auth and input validation. Dead code deleted. Windows retry removed. 106 tests pass. No regressions from A/B/C series. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:59:29 -04:00
jpetree331	db67049e92	fix(identity): D-1 machine ID deduplication fixes - Remove unhashed 'unknown-' fallback from registration (F-D1-1) Registration aborts if GetMachineID() fails (no bad data) - Add POST /admin/agents/:id/rebind-machine-id endpoint (F-D1-2) Admin can update stored machine ID after hardware change - Delete dead example_integration.go with wrong usage (F-D1-3) - Remove redundant Windows machineid.ID() retry (F-D1-4) - Replace fmt.Printf with log.Printf in client.go (F-D1-5) Operator note: agents registered with 'unknown-' machine IDs must be rebound before upgrading. See D1_Fix_Implementation.md. All tests pass. No regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:53:43 -04:00
jpetree331	2c98973f87	test(machineid): D-1 pre-fix tests for machine ID duplication bugs Pre-fix tests for 5 machine ID findings. Tests FAIL where they assert correct post-fix behavior, PASS where they document bugs. Tests added: - F-D1-1 HIGH: Registration fallback unhashed format (4 tests) - F-D1-1: Hash function and format consistency (3 tests) - F-D1-2 MEDIUM: No machine ID recovery path (2 tests) - F-D1-3 LOW: Dead example_integration.go code (2 tests) - F-D1-4 LOW: Windows redundant machineid.ID() retry (2 tests) - F-D1-5 LOW: client.go fmt.Printf for errors (2 tests) Current state: 6 FAIL, 9 PASS. All prior tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:41:25 -04:00
jpetree331	8530e6c6fc	docs: D-1 machine ID duplication audit Comprehensive audit of machine ID implementations across the agent codebase. Identified 3 production call sites with 1 critical divergence. Key findings: - F-D1-1 HIGH: Registration fallback "unknown-"+hostname is unhashed, mismatches runtime SHA256 hash, causes permanent agent lockout when GetMachineID() transiently fails then recovers - F-D1-2 MEDIUM: No recovery path from machine ID mismatch - F-D1-3 LOW: example_integration.go is dead code calling machineid.ID() directly (bypasses canonical hashing) - F-D1-4 LOW: Windows redundant machineid.ID() retry - F-D1-5 LOW: client.go uses fmt.Printf for machine ID error 6 findings total. See docs/D1_MachineID_Audit.md for details. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:34:04 -04:00
jpetree331	a1df7d7b05	refactor: C-series cleanup and TODO documentation - Delete install.sh.deprecated (dead code) - Add TODO(DEV-031) for ghost update scanner-side prevention - Add TODO(DEV-030) with specific missing service cycles - ETHOS sweep: zero banned words, emojis, or fmt.Printf - All tests pass, Linux builds clean Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:29:12 -04:00
jpetree331	1b2aa1bf63	verify: C-1 Windows bug fixes verified All 8 C-1 fixes verified. Linux builds clean on AMD64 and ARM64. 98 tests pass (19 scanner + 4 internal + 14 crypto + 3 circuit + 58 server), 1 skip (pre-existing). No regressions. DEV-030: polling loop parity (not deduplication) — gaps documented. DEV-031: ghost update fix is detection-only, not prevention. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:21:16 -04:00
jpetree331	8901f22a96	fix(windows): C-1 Windows-specific bug fixes - Apply B-2 jitter and backoff fixes to Windows service (F-C1-5) Proportional jitter and exponential backoff now in service polling loop - Add known winget install location search for SYSTEM account (F-C1-1) Checks PATH then system-wide WindowsApps locations - Fix winget text parser for package names with spaces (F-C1-2) Column-position parsing from header keywords replaces whitespace split - Add ghost update post-install state verification (F-C1-3) RebootRequired flag on InstallResult marks pending reboot - Replace fmt.Printf with log.Printf in winget scanner (F-C1-6) - Remove emoji from Windows service log messages (F-C1-7) GOOS=linux build: PASS. All tests pass, no regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:13:21 -04:00
jpetree331	38184a9625	test(windows): C-1 pre-fix tests for Windows-specific bugs Pre-fix test suite for 7 Windows-specific findings. All tests are SHARED (no build tags) — they compile and run on Linux using source file inspection and direct function calls. Tests added: - F-C1-1 HIGH: Winget PATH-only search (2 tests) - F-C1-2 MEDIUM: Winget text parser spaces bug (4 tests) - F-C1-3 HIGH: Ghost updates — no post-install verification (3 tests) - F-C1-4 RESOLVED: Service auto-restart already configured (1 test) - F-C1-5 HIGH: Duplicated polling loop missing B-2 fixes (5 tests) - F-C1-6 LOW: Winget uses fmt.Printf (2 tests) - F-C1-7 LOW: Service has emojis in logs (2 tests) Current state: 8 FAIL, 11 PASS. All prior tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 08:51:44 -04:00
jpetree331	799c155d94	docs: C-1 Windows-specific bugs audit Comprehensive audit of Windows agent code: winget detection, Windows Update ghost updates, service wrapper, HWID, and vendored windowsupdate package. Key findings: - F-C1-1 HIGH: Winget not found as SYSTEM (PATH-only search) - F-C1-3 HIGH: No post-install verification (ghost updates) - F-C1-5 HIGH: Windows service has duplicated polling loop missing B-2 fixes (jitter cap, exponential backoff) - F-C1-2 MEDIUM: Fragile winget text parser - F-C1-4 MEDIUM: No service auto-restart on crash 9 findings total. See docs/C1_Windows_Audit.md for details. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 08:22:18 -04:00
jpetree331	f71f878a35	fix(concurrency): wire retry_count increment for stuck command re-delivery (DEV-029) retry_count column and filter existed but counter was never incremented. Stuck commands always had retry_count=0 and always passed the WHERE retry_count < 5 filter, making the cap ineffective. Fix: Added RedeliverStuckCommandTx that sets retry_count = retry_count + 1 on stuck->sent re-delivery. GetCommands handler now uses MarkCommandSentTx for new commands (retry_count stays 0) and RedeliverStuckCommandTx for stuck re-delivery (retry_count increments). All 77 tests pass. DEV-029 resolved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 08:16:12 -04:00
jpetree331	e93d850ab9	verify: B-2 data integrity verification — fixes verified with 1 follow-up All B-2 concurrency fixes verified: - Registration transaction: atomic, no orphaned agents - SELECT FOR UPDATE SKIP LOCKED: prevents duplicate delivery - Token renewal: atomic validate + update - GetCommands: rate limited with agent_checkin key - Jitter: capped at min(pollingInterval/2, 30s) - Exponential backoff: base=10s, cap=5min, reset on success Finding: DEV-029 — retry_count column exists but never incremented. Filter is in place but ineffective. Targeted fix needed. 77 tests pass. No regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 08:09:35 -04:00
jpetree331	3ca42d50f4	fix(concurrency): B-2 data integrity and race condition fixes - Wrap agent registration in DB transaction (F-B2-1/F-B2-8) All 4 ops atomic, manual DeleteAgent rollback removed - Use SELECT FOR UPDATE SKIP LOCKED for atomic command delivery (F-B2-2) Concurrent requests get different commands, no duplicates - Wrap token renewal in DB transaction (F-B2-9) Validate + update expiry atomic - Add rate limit to GET /agents/:id/commands (F-B2-4) agent_checkin rate limiter applied - Add retry_count column, cap stuck command retries at 5 (F-B2-10) Migration 029, GetStuckCommands filters retry_count < 5 - Cap polling jitter at current interval (fixes rapid mode) (F-B2-5) maxJitter = min(pollingInterval/2, 30s) - Add exponential backoff with full jitter on reconnection (F-B2-7) calculateBackoff: base=10s, cap=5min, reset on success All tests pass. No regressions from A-series or B-1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 08:00:36 -04:00
jpetree331	59ab7cbd5f	test(concurrency): B-2 pre-fix tests for data integrity and concurrency bugs Pre-fix test suite documenting 7 data integrity and concurrency bugs. Tests FAIL where they assert correct post-fix behavior, PASS where they document current buggy state. Tests added: - F-B2-1/8 HIGH: Registration not transactional (3 tests) - F-B2-2 MEDIUM: Command delivery race condition (3 tests) - F-B2-9 MEDIUM: Token renewal not transactional (2 tests) - F-B2-4 MEDIUM: No rate limit on GetCommands (3 tests) - F-B2-5 LOW: Jitter negates rapid mode (2 tests) - F-B2-10 LOW: No max retry for stuck commands (2 tests) - F-B2-7 MEDIUM: No exponential backoff on reconnection (2 tests) Current state: 7 FAIL, 10 PASS. No A/B-1 regressions. See docs/B2_PreFix_Tests.md for full inventory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 07:45:16 -04:00
jpetree331	2fd0fd27fa	docs: B-2 data integrity and concurrency audit Comprehensive audit of registration token races, command queue concurrency, rapid mode risks, agent staleness, transaction safety, and deadlock risks. Key findings: - F-B2-1 HIGH: Registration flow not transactional (4 separate ops) - F-B2-8 HIGH: Same as F-B2-1 (crash leaves orphaned agent) - F-B2-2 MEDIUM: Duplicate command delivery on concurrent requests - F-B2-4 MEDIUM: No cap on concurrent rapid-mode agents - F-B2-7 MEDIUM: No staggered reconnection after server restart - F-B2-9 MEDIUM: Token renewal not transactional (self-healing) 10 findings total. See docs/B2_Data_Integrity_Audit.md for details. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 07:26:02 -04:00
jpetree331	1f828b6f61	verify: B-1 schema integrity verification — all fixes verified - Migration sequence: 30 files, no duplicates, monotonically ordered - Migration 024: self-insert removed, bad column fixed, idempotent - Server aborts on migration failure (log.Fatalf) - Scanner config migration renumbered to 027 with correct suffix - All migrations idempotent (TestAllMigrationsAreIdempotent passes) - N+1 replaced with GetAllUpdateStats aggregate query - Stuck commands index (028) and background cleanup verified - 55 tests pass (41 server + 14 agent), zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 07:12:50 -04:00
jpetree331	ec0d880036	fix(database): B-1 schema integrity and migration fixes - Fix migration 024 self-insert and bad column reference (F-B1-1, F-B1-2) Uses existing enabled/auto_run columns instead of non-existent deprecated - Abort server on migration failure instead of warning (F-B1-11) main.go now calls log.Fatalf, prints [INFO] only on success - Fix migration 018 scanner_config filename suffix (F-B1-3) Renumbered to 027 with .up.sql suffix - Remove GRANT to non-existent role in scanner_config (F-B1-4) - Resolve duplicate migration numbers 009 and 012 (F-B1-13) Renamed to 009b and 012b for unique lexical sorting - Add IF NOT EXISTS to all non-idempotent migrations (F-B1-15) Fixed: 011, 012, 017, 023, 023a - Replace N+1 dashboard stats loop with GetAllUpdateStats (F-B1-6) Single aggregate query replaces per-agent loop - Add composite index on agent_commands(status, sent_at) (F-B1-5) New migration 028 with partial index for timeout service - Add background refresh token cleanup goroutine (F-B1-10) 24-hour ticker calls CleanupExpiredTokens - ETHOS log format in migration runner (no emojis) All 55 tests pass (41 server + 14 agent). No regressions. See docs/B1_Fix_Implementation.md and DEV-025 through DEV-028. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 07:03:35 -04:00
jpetree331	ab676c3b83	test(database): B-1 pre-fix tests for migration and schema bugs Pre-fix test suite documenting 9 database migration and schema integrity bugs. Tests FAIL where they assert correct post-fix behavior, PASS where they document current buggy state. Tests added: - F-B1-11 P0: main.go swallows migration errors (3 tests) - F-B1-13: Duplicate migration numbers 009/012 (2 tests) - F-B1-1: Migration 024 self-insert into schema_migrations (2 tests) - F-B1-2: Migration 024 references non-existent column (2 tests) - F-B1-3: Migration 018 wrong file suffix (2 tests) - F-B1-4: Migration 018 GRANT to wrong role (1 test) - F-B1-15: 7+ migrations not idempotent (2 tests) - F-B1-5: Missing agent_commands sent_at index (2 tests) - F-B1-6: N+1 query in GetDashboardStats (2 tests) - F-B1-10: No background refresh token cleanup (2 tests) Current state: 10 PASS, 10 FAIL, 0 SKIP. All A-series tests continue to pass (no regressions). See docs/B1_PreFix_Tests.md for full inventory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 06:42:19 -04:00
jpetree331	3de7577802	docs: B-1 database migration and schema integrity audit Comprehensive audit of the custom migration runner, all 26 migrations, query patterns, foreign keys, and schema state management. Critical findings: - F-B1-11 P0: Server starts with incomplete schema after migration failure, prints [OK] — main.go swallows migration errors - F-B1-1: Migration 024 self-inserts into schema_migrations - F-B1-2: Migration 024 references non-existent deprecated column - F-B1-3: Migration 018 scanner_config has wrong file extension - F-B1-6: N+1 query in GetDashboardStats (1 query per agent) 15 findings total across P0/CRITICAL/HIGH/MEDIUM/LOW. See docs/B1_Database_Audit.md for full analysis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 06:28:59 -04:00
jpetree331	c2774342f3	verify: A-series refactor verification — all tests pass All pre-existing tests pass after dead code cleanup. No regressions from A-1, A-2, or A-3 fix rounds. 41 tests pass, 1 skip (pre-existing). Zero new failures. Ready to proceed to B-series database audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 06:21:05 -04:00
jpetree331	3e1e2a78fd	refactor: A-series dead code cleanup and ETHOS compliance sweep - Remove dead queries.RetryCommand function (DEV-019, 31 lines) - Remove security_settings.go.broken leftover from A-3 - Remove 5 compiled test binaries from aggregator-agent/ (~61MB) - Remove config_builder.go.restored from repo root - Remove test_disk_detection.go and test_disk.go (throwaway test files) - Fix 6 banned word violations (production-ready, enhanced, robust, seamlessly) - Add .gitignore rules for compiled agent binaries - Document machine ID duplication for D-1 fix prompt - Document 30+ pre-existing emoji violations for D-2 pass No behavior changes. All 41 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 06:17:12 -04:00
jpetree331	6e62208f82	docs: A-3 verification report — all fixes verified All 9 auth middleware fixes confirmed correct: - F-A3-11: JWT secret leak removed, ETHOS log format - F-A3-7: Config download protected (WebAuthMiddleware) - F-A3-6: Update download protected (AuthMiddleware) - F-A3-10: Scheduler stats on WebAuthMiddleware - F-A3-13: RequireAdmin implemented, 7 routes re-enabled - F-A3-12: JWT issuer claims with backward compat grace period - F-A3-2: /auth/verify endpoint fixed - F-A3-9: Agent unregister rate-limited - F-A3-14: CORS origin configurable 41 tests pass (27 server + 14 agent). No regressions. Zero issues found during verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 06:07:57 -04:00
jpetree331	4c62de8d8b	fix(security): A-3 auth middleware coverage fixes Fixes 9 auth middleware findings from the A-3 recon audit. F-A3-11 CRITICAL: Removed JWT secret from WebAuthMiddleware log output. Replaced emoji-prefixed fmt.Printf with ETHOS-compliant log.Printf. No secret values in any log output. F-A3-7 CRITICAL: Config download now requires WebAuthMiddleware. GET /downloads/config/:agent_id is admin-only (agents never call it). F-A3-6 HIGH: Update package download now requires AuthMiddleware. GET /downloads/updates/:package_id requires valid agent JWT. F-A3-10 HIGH: Scheduler stats changed from AuthMiddleware to WebAuthMiddleware. Agent JWTs can no longer view scheduler internals. F-A3-13 LOW: RequireAdmin() middleware implemented. 7 security settings routes re-enabled (GET/PUT/POST under /security/settings). security_settings.go.broken renamed to .go, API mismatches fixed. F-A3-12 MEDIUM: JWT issuer claims added for token type separation. Agent tokens: issuer=redflag-agent, Web tokens: issuer=redflag-web. AuthMiddleware rejects tokens with wrong issuer. Grace period: tokens with no issuer still accepted (backward compat). F-A3-2 MEDIUM: /auth/verify now has WebAuthMiddleware applied. Endpoint returns 200 with valid=true for valid admin tokens. F-A3-9 MEDIUM: Agent self-unregister (DELETE /:id) now rate-limited using the same agent_reports rate limiter as other agent routes. F-A3-14 LOW: CORS origin configurable via REDFLAG_CORS_ORIGIN env var. Defaults to http://localhost:3000 for development. Added PATCH method and agent-specific headers to CORS config. All 27 server tests pass. All 14 agent tests pass. No regressions. See docs/A3_Fix_Implementation.md and docs/Deviations_Report.md (DEV-020 through DEV-022). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 22:17:40 -04:00
jpetree331	ee246771dc	test(security): A-3 pre-fix tests for auth middleware coverage bugs Pre-fix test suite documenting 8 auth middleware bugs found during the A-3 recon audit. Tests are written to FAIL where they assert correct post-fix behavior, and PASS where they document current buggy behavior. No bugs are fixed in this commit. Tests added: - F-A3-11 CRITICAL: WebAuthMiddleware leaks JWT secret to stdout (3 tests: secret in output, emoji in output, ETHOS format) - F-A3-7 CRITICAL: Config download requires no auth (2 tests) - F-A3-6 HIGH: Update package download requires no auth (2 tests) - F-A3-10 HIGH: Scheduler stats accepts agent JWT (2 tests) - F-A3-12 MEDIUM: Cross-type JWT token confusion (2 tests) - F-A3-2 MEDIUM: /auth/verify dead endpoint (2 tests) - F-A3-13 LOW: RequireAdmin middleware missing (1 test + 1 build-tagged) - F-A3-9 MEDIUM: Agent self-unregister no rate limit (2 tests) Current state: 10 FAIL, 7 PASS, 1 SKIP (build-tagged), 1 unchanged See docs/A3_PreFix_Tests.md for full inventory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:54:48 -04:00
jpetree331	f97d4845af	feat(security): A-1 Ed25519 key rotation + A-2 replay attack fixes Complete RedFlag codebase with two major security audit implementations. == A-1: Ed25519 Key Rotation Support == Server: - SignCommand sets SignedAt timestamp and KeyID on every signature - signing_keys database table (migration 020) for multi-key rotation - InitializePrimaryKey registers active key at startup - /api/v1/public-keys endpoint for rotation-aware agents - SigningKeyQueries for key lifecycle management Agent: - Key-ID-aware verification via CheckKeyRotation - FetchAndCacheAllActiveKeys for rotation pre-caching - Cache metadata with TTL and staleness fallback - SecurityLogger events for key rotation and command signing == A-2: Replay Attack Fixes (F-1 through F-7) == F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}" F-7 HIGH - Migration 026: expires_at column with partial index F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at F-2 HIGH - Agent-side executedIDs dedup map with cleanup F-4 HIGH - commandMaxAge reduced from 24h to 4h F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt Verification fixes: migration idempotency (ETHOS #4), log format compliance (ETHOS #1), stale comments updated. All 24 tests passing. Docker --no-cache build verified. See docs/ for full audit reports and deviation log (DEV-001 to DEV-019). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:25:47 -04:00

27 Commits