13 KiB
RedFlag v0.1.26.0 - Technical Issues and Technical Debt Audit
Document Version: 1.0 Date: 2025-12-19 Scope: Post-Issue#3 Implementation Audit Status: ACTIVE ISSUES requiring immediate resolution
Executive Summary
During the implementation of Issue #3 (subsystem tracking) and the command recovery fix, we identified critical architectural issues that violate ETHOS principles and create user-facing bugs. This document catalogs all issues, their root causes, and required fixes.
Issues by Severity:
- 🔴 CRITICAL: 3 issues (user-facing bugs, data corruption risk)
- 🟡 HIGH: 4 issues (technical debt, maintenance burden)
- 🟢 MEDIUM: 2 issues (code quality, naming violations)
🔴 CRITICAL ISSUES (User-Facing)
1. Storage Scans Appearing as Package Updates
Severity: 🔴 CRITICAL User Impact: HIGH ETHOS Violations: #1 (Errors are History - data in wrong place), #5 (No BS - misleading UI)
Problem: Storage scan results (handleScanStorage) are appearing on the Updates page alongside package updates. Users see disk usage metrics (partition sizes, mount points) mixed with apt/dnf package updates.
Root Cause: handleScanStorage in aggregator-agent/cmd/agent/subsystem_handlers.go calls ReportLog() which stores entries in update_logs table, the same table used for package updates.
Location:
- Agent:
aggregator-agent/cmd/agent/subsystem_handlers.go:119-123
// Report the scan log (WRONG - this goes to update_logs table)
if err := reportLogWithAck(apiClient, cfg, ackTracker, logReport); err != nil {
log.Printf("Failed to report scan log: %v\n", err)
}
Correct Behavior: Storage scans should ONLY report to /api/v1/agents/:id/storage-metrics endpoint, which stores in dedicated storage_metrics table.
Fix Required:
- Comment out/remove the
ReportLogcall inhandleScanStorage(lines 119-123) - Verify
ReportStorageMetricscall (lines 162-164) is working - Register missing route for GET
/api/v1/agents/:id/storage-metricsif not already registered
Verification Steps:
- Trigger storage scan from UI
- Verify NO new entries appear on Updates page
- Verify data appears on Storage page
- Check
storage_metricstable has new rows
2. System Scans Appearing as Package Updates
Severity: 🔴 CRITICAL User Impact: HIGH ETHOS Violations: #1, #5
Problem: System scan results (CPU, memory, processes, uptime) are appearing on Updates page as LOW severity package updates.
User Report: "On the Updates tab, the top 6-7 'updates' are system specs, not system packages. They are HD details or processes, or partition sizes."
Root Cause: handleScanSystem also calls ReportLog() storing in update_logs table.
Location:
- Agent:
aggregator-agent/cmd/agent/subsystem_handlers.go:207-211
// Report the scan log (WRONG - this goes to update_logs table)
if err := reportLogWithAck(apiClient, cfg, ackTracker, logReport); err != nil {
log.Printf("Failed to report scan log: %v\n", err)
}
Correct Behavior: System scans should ONLY report to /api/v1/agents/:id/metrics endpoint.
Fix Required:
- Comment out/remove the
ReportLogcall inhandleScanSystem(lines 207-211) - Verify
ReportMetricscall is working - Register missing route for GET endpoint if needed
3. Duplicate "Scan All" Entries in History
Severity: 🔴 CRITICAL User Impact: MEDIUM ETHOS Violations: #1 (duplicate history entries), #4 (not idempotent)
Problem: When triggering a full system scan (handleScanUpdatesV2), users see TWO entries:
- One generic "scan updates" collective entry
- Plus individual entries for each subsystem
Root Cause: handleScanUpdatesV2 creates a collective log (lines 44-57) while orchestrator also logs individual scan results via individual handlers.
Location:
- Agent:
aggregator-agent/cmd/agent/subsystem_handlers.go:44-63
// Create scan log entry with subsystem metadata (COLLECTIVE)
logReport := client.LogReport{
CommandID: commandID,
Action: "scan_updates",
Result: map[bool]string{true: "success", false: "failure"}[exitCode == 0],
// ...
}
// Report the scan log
if err := reportLogWithAck(apiClient, cfg, ackTracker, logReport); err != nil {
log.Printf("Failed to report scan log: %v\n", err)
}
Fix Required:
- Comment out lines 44-63 (remove collective logging from handleScanUpdatesV2)
- Keep individual subsystem logging (lines 60, 121, 209, 291)
Verification: After fix, only individual subsystem entries should appear (scan_docker, scan_storage, scan_system, etc.)
🟡 HIGH PRIORITY ISSUES (Technical Debt)
4. Missing Route Registration for Storage Metrics Endpoint
Severity: 🟡 HIGH Impact: Storage page empty ETHOS Violations: #3 (Assume Failure), #4 (Idempotency - retry won't work without route)
Problem: Backend has handler functions but routes are not registered. Agent cannot report storage metrics.
Location:
- Handler exists:
aggregator-server/internal/api/handlers/storage_metrics.go:26,75 - Missing: Route registration in router setup
Handlers Without Routes:
// Exists but not wired to HTTP routes:
func (h *StorageMetricsHandler) ReportStorageMetrics(c *gin.Context) // POST
func (h *StorageMetricsHandler) GetStorageMetrics(c *gin.Context) // GET
Fix Required:
Find route registration file (likely cmd/server/main.go or internal/api/server.go) and add:
agentGroup := router.Group("/api/v1/agents", middleware...)
agentGroup.POST("/:id/storage-metrics", storageMetricsHandler.ReportStorageMetrics)
agentGroup.GET("/:id/storage-metrics", storageMetricsHandler.GetStorageMetrics)
5. Route Registration for Metrics Endpoint
Severity: 🟡 HIGH Impact: System page potentially empty
Problem: Similar to #4, /api/v1/agents/:id/metrics endpoint may not be registered.
Location: Need to verify routes exist for system metrics reporting.
6. Database Migration Not Applied
Severity: 🟡 HIGH Impact: Subsystem column doesn't exist, subsystem queries will fail
Problem: Migration 022_add_subsystem_to_logs.up.sql created but not run. Server code references subsystem column which doesn't exist.
Files:
- Created:
aggregator-server/internal/database/migrations/022_add_subsystem_to_logs.up.sql - Referenced:
aggregator-server/internal/models/update.go:61 - Referenced:
aggregator-server/internal/api/handlers/updates.go:226-230
Verification:
\d update_logs
-- Should show: subsystem | varchar(50) |
Fix Required:
cd aggregator-server
go run cmd/server/main.go -migrate
🟢 MEDIUM PRIORITY ISSUES (Code Quality)
7. Frontend File Duplication - Marketing Fluff Naming
Severity: 🟢 MEDIUM ETHOS Violations: #5 (No Marketing Fluff - "enhanced" is banned), Technical Debt
Problem: Duplicate files with marketing fluff naming.
Files:
aggregator-web/src/components/AgentUpdates.tsx(236 lines - old/simple version)aggregator-web/src/components/AgentUpdatesEnhanced.tsx(567 lines - current version)aggregator-web/src/components/AgentUpdate.tsx(Agent binary updater - legitimate)
ETHOS Violation: From ETHOS.md line 67: Banned Words: enhanced, enterprise-ready, seamless, robust, production-ready, revolutionary, etc.
Quote from ETHOS:
"We are building an 'honest' tool for technical users, not pitching a product. Fluff hides meaning and creates enterprise BS."
Fix Required:
# Remove old duplicate
cd aggregator-web/src/components
rm AgentUpdates.tsx
# Rename to remove marketing fluff
mv AgentUpdatesEnhanced.tsx AgentUpdates.tsx
# Search and replace all imports
grep -r "AgentUpdatesEnhanced" src/ --include="*.tsx" --include="*.ts"
# Replace with "AgentUpdates"
Verification: Application builds and runs with renamed component.
8. Backend V2 Naming Pattern - Poor Refactoring
Severity: 🟢 MEDIUM ETHOS Violations: #5 (No BS), Technical Debt
Problem: handleScanUpdatesV2 suggests V1 exists or poor refactoring.
Location: aggregator-agent/cmd/agent/subsystem_handlers.go:28
Historical Context: Likely created during orchestrator refactor. Old version should have been removed/replaced, not versioned.
Quote from ETHOS (line 59-60):
"Never use banned words or emojis in logs or code. We are building an 'honest' tool..."
Fix Required:
- Check if
handleScanUpdates(V1) exists anywhere - If V1 doesn't exist: rename
handleScanUpdatesV2tohandleScanUpdates - Update all references in command routing
Original Issues (Already Fixed)
✅ Command Status Bug (Priority 1 - FIXED)
File: aggregator-server/internal/api/handlers/agents.go:446
Problem: MarkCommandSent() error was not checked. Commands returned to agent but stayed in 'pending' status, causing infinite re-delivery.
Fix Applied:
- Added
GetStuckCommands()query to recover stuck commands - Modified check-in handler to recover commands older than 5 minutes
- Added proper error handling with [HISTORY] logging
- Changed source from "web_ui" to "manual" to match DB constraint
Verification: Build successful, ready for testing
✅ Issue #3 - Subsystem Tracking (Priority 2 - IMPLEMENTED)
Status: Backend implementation complete, pending database migration
Files Modified:
- Migration created:
022_add_subsystem_to_logs.up.sql - Models updated:
UpdateLogandUpdateLogRequestwithSubsystemfield - Backend handlers updated to extract subsystem from action
- Agent client updated to send subsystem from metadata
- Query functions added:
GetLogsByAgentAndSubsystem(),GetSubsystemStats()
Pending:
- Run database migration
- Verify frontend receives subsystem data
- Test all 7 subsystems independently
Complete Fix Sequence
Phase 1: Critical User-Facing Bugs (MUST DO NOW)
- ✅ Fix #1: Comment out ReportLog in handleScanStorage (lines 119-123)
- ✅ Fix #2: Comment out ReportLog in handleScanSystem (lines 207-211)
- ✅ Fix #3: Comment out collective logging in handleScanUpdatesV2 (lines 44-63)
- ✅ Fix #4: Register storage-metrics routes
- ✅ Fix #5: Register metrics routes
Phase 2: Database & Technical Debt
- ✅ Fix #6: Run migration 022_add_subsystem_to_logs
- ✅ Fix #7: Remove AgentUpdates.tsx, rename AgentUpdatesEnhanced.tsx
- ✅ Fix #8: Remove V2 suffix from handleScanUpdates (if no V1 exists)
Phase 3: Verification
- Test storage scan - should appear ONLY on Storage page
- Test system scan - should appear ONLY on System page
- Test full scan - should show individual subsystem entries only
- Verify history shows proper subsystem names
ETHOS Compliance Checklist
For each fix, we must verify:
- ETHOS #1: All errors logged with context, no
/dev/null - ETHOS #2: No new unauthenticated endpoints
- ETHOS #3: Fallback paths exist (retry logic, circuit breakers)
- ETHOS #4: Idempotency verified (run 3x safely)
- ETHOS #5: No marketing fluff (no "enhanced", "robust", etc.)
- Pre-Integration: History logging added, security review, tests
Files to Delete/Rename
Delete These Files:
aggregator-web/src/components/AgentUpdates.tsx(236 lines, old version)
Rename These Files:
aggregator-agent/cmd/agent/subsystem_handlers.go:28- renamehandleScanUpdatesV2→handleScanUpdatesaggregator-web/src/components/AgentUpdatesEnhanced.tsx→AgentUpdates.tsx
Lines to Comment Out:
aggregator-agent/cmd/agent/subsystem_handlers.go:44-63(collective logging)aggregator-agent/cmd/agent/subsystem_handlers.go:119-123(ReportLog in storage)aggregator-agent/cmd/agent/subsystem_handlers.go:207-211(ReportLog in system)
Routes to Add:
- POST
/api/v1/agents/:id/storage-metrics - GET
/api/v1/agents/:id/storage-metrics - Verify GET
/api/v1/agents/:id/metricsexists
Session Documentation Requirements
As per ETHOS.md: Every session must identify and document:
-
New Technical Debt:
- Route registration missing (assumed but not implemented)
- Duplicate frontend files (poor refactoring)
- V2 naming pattern (poor version control)
-
Deferred Features:
- Frontend subsystem icons and display names
- Comprehensive testing of all 7 subsystems
-
Known Issues:
- Database migration not applied in test environment
- Storage/System pages empty due to missing routes
-
Architecture Decisions:
- Decision to keep both collective and individual scan patterns
- Justification: Different user intents (full audit vs single check)
Conclusion
Total Issues: 8 (3 critical, 4 high, 1 medium) Fixes Required: 8 code changes, 3 deletions, 2 renames Estimated Time: 2-3 hours for all fixes and verification Status: Ready for implementation
Next Action: Implement Phase 1 fixes (critical user-facing bugs) immediately.
Document Maintained By: Development Team Last Updated: 2025-12-19 Session: Issue #3 Implementation & Command Recovery Fix