Files

Fimeg 484a7f77ce Add docs and project files - force for Culurien

2026-03-28 20:46:24 -04:00

13 KiB

Raw Blame History

RedFlag v0.1.26.0 - Technical Issues and Technical Debt Audit

Document Version: 1.0 Date: 2025-12-19 Scope: Post-Issue#3 Implementation Audit Status: ACTIVE ISSUES requiring immediate resolution

Executive Summary

During the implementation of Issue #3 (subsystem tracking) and the command recovery fix, we identified critical architectural issues that violate ETHOS principles and create user-facing bugs. This document catalogs all issues, their root causes, and required fixes.

Issues by Severity:

🔴 CRITICAL: 3 issues (user-facing bugs, data corruption risk)
🟡 HIGH: 4 issues (technical debt, maintenance burden)
🟢 MEDIUM: 2 issues (code quality, naming violations)

🔴 CRITICAL ISSUES (User-Facing)

1. Storage Scans Appearing as Package Updates

Severity: 🔴 CRITICAL User Impact: HIGH ETHOS Violations: #1 (Errors are History - data in wrong place), #5 (No BS - misleading UI)

Problem: Storage scan results (handleScanStorage) are appearing on the Updates page alongside package updates. Users see disk usage metrics (partition sizes, mount points) mixed with apt/dnf package updates.

Root Cause: handleScanStorage in aggregator-agent/cmd/agent/subsystem_handlers.go calls ReportLog() which stores entries in update_logs table, the same table used for package updates.

Location:

Agent: aggregator-agent/cmd/agent/subsystem_handlers.go:119-123

// Report the scan log (WRONG - this goes to update_logs table)
if err := reportLogWithAck(apiClient, cfg, ackTracker, logReport); err != nil {
    log.Printf("Failed to report scan log: %v\n", err)
}

Correct Behavior: Storage scans should ONLY report to /api/v1/agents/:id/storage-metrics endpoint, which stores in dedicated storage_metrics table.

Fix Required:

Comment out/remove the ReportLog call in handleScanStorage (lines 119-123)
Verify ReportStorageMetrics call (lines 162-164) is working
Register missing route for GET /api/v1/agents/:id/storage-metrics if not already registered

Verification Steps:

Trigger storage scan from UI
Verify NO new entries appear on Updates page
Verify data appears on Storage page
Check storage_metrics table has new rows

2. System Scans Appearing as Package Updates

Severity: 🔴 CRITICAL User Impact: HIGH ETHOS Violations: #1, #5

Problem: System scan results (CPU, memory, processes, uptime) are appearing on Updates page as LOW severity package updates.

User Report: "On the Updates tab, the top 6-7 'updates' are system specs, not system packages. They are HD details or processes, or partition sizes."

Root Cause: handleScanSystem also calls ReportLog() storing in update_logs table.

Location:

Agent: aggregator-agent/cmd/agent/subsystem_handlers.go:207-211

// Report the scan log (WRONG - this goes to update_logs table)
if err := reportLogWithAck(apiClient, cfg, ackTracker, logReport); err != nil {
    log.Printf("Failed to report scan log: %v\n", err)
}

Correct Behavior: System scans should ONLY report to /api/v1/agents/:id/metrics endpoint.

Fix Required:

Comment out/remove the ReportLog call in handleScanSystem (lines 207-211)
Verify ReportMetrics call is working
Register missing route for GET endpoint if needed

3. Duplicate "Scan All" Entries in History

Severity: 🔴 CRITICAL User Impact: MEDIUM ETHOS Violations: #1 (duplicate history entries), #4 (not idempotent)

Problem: When triggering a full system scan (handleScanUpdatesV2), users see TWO entries:

One generic "scan updates" collective entry
Plus individual entries for each subsystem

Root Cause: handleScanUpdatesV2 creates a collective log (lines 44-57) while orchestrator also logs individual scan results via individual handlers.

Location:

Agent: aggregator-agent/cmd/agent/subsystem_handlers.go:44-63

// Create scan log entry with subsystem metadata (COLLECTIVE)
logReport := client.LogReport{
    CommandID:       commandID,
    Action:          "scan_updates",
    Result:          map[bool]string{true: "success", false: "failure"}[exitCode == 0],
    // ...
}
// Report the scan log
if err := reportLogWithAck(apiClient, cfg, ackTracker, logReport); err != nil {
    log.Printf("Failed to report scan log: %v\n", err)
}

Fix Required:

Comment out lines 44-63 (remove collective logging from handleScanUpdatesV2)
Keep individual subsystem logging (lines 60, 121, 209, 291)

Verification: After fix, only individual subsystem entries should appear (scan_docker, scan_storage, scan_system, etc.)

🟡 HIGH PRIORITY ISSUES (Technical Debt)

4. Missing Route Registration for Storage Metrics Endpoint

Severity: 🟡 HIGH Impact: Storage page empty ETHOS Violations: #3 (Assume Failure), #4 (Idempotency - retry won't work without route)

Problem: Backend has handler functions but routes are not registered. Agent cannot report storage metrics.

Location:

Handler exists: aggregator-server/internal/api/handlers/storage_metrics.go:26,75
Missing: Route registration in router setup

Handlers Without Routes:

// Exists but not wired to HTTP routes:
func (h *StorageMetricsHandler) ReportStorageMetrics(c *gin.Context) // POST
func (h *StorageMetricsHandler) GetStorageMetrics(c *gin.Context)   // GET

Fix Required: Find route registration file (likely cmd/server/main.go or internal/api/server.go) and add:

agentGroup := router.Group("/api/v1/agents", middleware...)
agentGroup.POST("/:id/storage-metrics", storageMetricsHandler.ReportStorageMetrics)
agentGroup.GET("/:id/storage-metrics", storageMetricsHandler.GetStorageMetrics)

5. Route Registration for Metrics Endpoint

Severity: 🟡 HIGH Impact: System page potentially empty

Problem: Similar to #4, /api/v1/agents/:id/metrics endpoint may not be registered.

Location: Need to verify routes exist for system metrics reporting.

6. Database Migration Not Applied

Severity: 🟡 HIGH Impact: Subsystem column doesn't exist, subsystem queries will fail

Problem: Migration 022_add_subsystem_to_logs.up.sql created but not run. Server code references subsystem column which doesn't exist.

Files:

Created: aggregator-server/internal/database/migrations/022_add_subsystem_to_logs.up.sql
Referenced: aggregator-server/internal/models/update.go:61
Referenced: aggregator-server/internal/api/handlers/updates.go:226-230

Verification:

\d update_logs
-- Should show: subsystem | varchar(50) |

Fix Required:

cd aggregator-server
go run cmd/server/main.go -migrate

🟢 MEDIUM PRIORITY ISSUES (Code Quality)

7. Frontend File Duplication - Marketing Fluff Naming

Severity: 🟢 MEDIUM ETHOS Violations: #5 (No Marketing Fluff - "enhanced" is banned), Technical Debt

Problem: Duplicate files with marketing fluff naming.

Files:

aggregator-web/src/components/AgentUpdates.tsx (236 lines - old/simple version)
aggregator-web/src/components/AgentUpdatesEnhanced.tsx (567 lines - current version)
aggregator-web/src/components/AgentUpdate.tsx (Agent binary updater - legitimate)

ETHOS Violation: From ETHOS.md line 67: Banned Words: enhanced, enterprise-ready, seamless, robust, production-ready, revolutionary, etc.

Quote from ETHOS:

"We are building an 'honest' tool for technical users, not pitching a product. Fluff hides meaning and creates enterprise BS."

Fix Required:

# Remove old duplicate
cd aggregator-web/src/components
rm AgentUpdates.tsx

# Rename to remove marketing fluff
mv AgentUpdatesEnhanced.tsx AgentUpdates.tsx

# Search and replace all imports
grep -r "AgentUpdatesEnhanced" src/ --include="*.tsx" --include="*.ts"
# Replace with "AgentUpdates"

Verification: Application builds and runs with renamed component.

8. Backend V2 Naming Pattern - Poor Refactoring

Severity: 🟢 MEDIUM ETHOS Violations: #5 (No BS), Technical Debt

Problem: handleScanUpdatesV2 suggests V1 exists or poor refactoring.

Location: aggregator-agent/cmd/agent/subsystem_handlers.go:28

Historical Context: Likely created during orchestrator refactor. Old version should have been removed/replaced, not versioned.

Quote from ETHOS (line 59-60):

"Never use banned words or emojis in logs or code. We are building an 'honest' tool..."

Fix Required:

Check if handleScanUpdates (V1) exists anywhere
If V1 doesn't exist: rename handleScanUpdatesV2 to handleScanUpdates
Update all references in command routing

Original Issues (Already Fixed)

✅ Command Status Bug (Priority 1 - FIXED)

File: aggregator-server/internal/api/handlers/agents.go:446

Problem: MarkCommandSent() error was not checked. Commands returned to agent but stayed in 'pending' status, causing infinite re-delivery.

Fix Applied:

Added GetStuckCommands() query to recover stuck commands
Modified check-in handler to recover commands older than 5 minutes
Added proper error handling with [HISTORY] logging
Changed source from "web_ui" to "manual" to match DB constraint

Verification: Build successful, ready for testing

✅ Issue #3 - Subsystem Tracking (Priority 2 - IMPLEMENTED)

Status: Backend implementation complete, pending database migration

Files Modified:

Migration created: 022_add_subsystem_to_logs.up.sql
Models updated: UpdateLog and UpdateLogRequest with Subsystem field
Backend handlers updated to extract subsystem from action
Agent client updated to send subsystem from metadata
Query functions added: GetLogsByAgentAndSubsystem(), GetSubsystemStats()

Pending:

Run database migration
Verify frontend receives subsystem data
Test all 7 subsystems independently

Complete Fix Sequence

Phase 1: Critical User-Facing Bugs (MUST DO NOW)

✅ Fix #1: Comment out ReportLog in handleScanStorage (lines 119-123)
✅ Fix #2: Comment out ReportLog in handleScanSystem (lines 207-211)
✅ Fix #3: Comment out collective logging in handleScanUpdatesV2 (lines 44-63)
✅ Fix #4: Register storage-metrics routes
✅ Fix #5: Register metrics routes

Phase 2: Database & Technical Debt

✅ Fix #6: Run migration 022_add_subsystem_to_logs
✅ Fix #7: Remove AgentUpdates.tsx, rename AgentUpdatesEnhanced.tsx
✅ Fix #8: Remove V2 suffix from handleScanUpdates (if no V1 exists)

Phase 3: Verification

Test storage scan - should appear ONLY on Storage page
Test system scan - should appear ONLY on System page
Test full scan - should show individual subsystem entries only
Verify history shows proper subsystem names

ETHOS Compliance Checklist

For each fix, we must verify:

ETHOS #1: All errors logged with context, no /dev/null
ETHOS #2: No new unauthenticated endpoints
ETHOS #3: Fallback paths exist (retry logic, circuit breakers)
ETHOS #4: Idempotency verified (run 3x safely)
ETHOS #5: No marketing fluff (no "enhanced", "robust", etc.)
Pre-Integration: History logging added, security review, tests

Files to Delete/Rename

Delete These Files:

aggregator-web/src/components/AgentUpdates.tsx (236 lines, old version)

Rename These Files:

aggregator-agent/cmd/agent/subsystem_handlers.go:28 - rename handleScanUpdatesV2 → handleScanUpdates
aggregator-web/src/components/AgentUpdatesEnhanced.tsx → AgentUpdates.tsx

Lines to Comment Out:

aggregator-agent/cmd/agent/subsystem_handlers.go:44-63 (collective logging)
aggregator-agent/cmd/agent/subsystem_handlers.go:119-123 (ReportLog in storage)
aggregator-agent/cmd/agent/subsystem_handlers.go:207-211 (ReportLog in system)

Routes to Add:

POST /api/v1/agents/:id/storage-metrics
GET /api/v1/agents/:id/storage-metrics
Verify GET /api/v1/agents/:id/metrics exists

Session Documentation Requirements

As per ETHOS.md: Every session must identify and document:

New Technical Debt:
- Route registration missing (assumed but not implemented)
- Duplicate frontend files (poor refactoring)
- V2 naming pattern (poor version control)
Deferred Features:
- Frontend subsystem icons and display names
- Comprehensive testing of all 7 subsystems
Known Issues:
- Database migration not applied in test environment
- Storage/System pages empty due to missing routes
Architecture Decisions:
- Decision to keep both collective and individual scan patterns
- Justification: Different user intents (full audit vs single check)

Conclusion

Total Issues: 8 (3 critical, 4 high, 1 medium) Fixes Required: 8 code changes, 3 deletions, 2 renames Estimated Time: 2-3 hours for all fixes and verification Status: Ready for implementation

Next Action: Implement Phase 1 fixes (critical user-facing bugs) immediately.

Document Maintained By: Development Team Last Updated: 2025-12-19 Session: Issue #3 Implementation & Command Recovery Fix

13 KiB Raw Blame History

RedFlag v0.1.26.0 - Technical Issues and Technical Debt Audit

Executive Summary

🔴 CRITICAL ISSUES (User-Facing)

1. Storage Scans Appearing as Package Updates

2. System Scans Appearing as Package Updates

3. Duplicate "Scan All" Entries in History

🟡 HIGH PRIORITY ISSUES (Technical Debt)

4. Missing Route Registration for Storage Metrics Endpoint

5. Route Registration for Metrics Endpoint

6. Database Migration Not Applied

🟢 MEDIUM PRIORITY ISSUES (Code Quality)

7. Frontend File Duplication - Marketing Fluff Naming

8. Backend V2 Naming Pattern - Poor Refactoring

Original Issues (Already Fixed)

✅ Command Status Bug (Priority 1 - FIXED)

✅ Issue #3 - Subsystem Tracking (Priority 2 - IMPLEMENTED)

Complete Fix Sequence

Phase 1: Critical User-Facing Bugs (MUST DO NOW)

Phase 2: Database & Technical Debt

Phase 3: Verification

ETHOS Compliance Checklist

Files to Delete/Rename

Delete These Files:

Rename These Files:

Lines to Comment Out:

Routes to Add:

Session Documentation Requirements

Conclusion

13 KiB

Raw Blame History