# Clean Architecture: Command ID & Frontend Error Logging

**Date**: 2025-12-19
**Status**: CLEAN ARCHITECTURE DESIGN (ETHOS Compliant)

---

## Problem Statement

RedFlag has two critical issues violating ETHOS principles:

1. **Command ID Generation Failure**: The server fails to generate unique IDs for commands, causing `pq: duplicate key value violates unique constraint "agent_commands_pkey"` when users trigger multiple scans rapidly
2. **Frontend Errors Lost**: UI failures show toasts but are never persisted, violating ETHOS #1: "Errors are History, Not /dev/null"

---

## ETHOS Compliance Requirements

- **ETHOS #1**: All errors must be captured, logged with context, and stored in the history table - NEVER sent to /dev/null
- **ETHOS #2**: No unauthenticated endpoints - all routes protected by the established security stack
- **ETHOS #3**: Assume failure - implement retry logic with exponential backoff for network operations
- **ETHOS #4**: Idempotency - the system must handle duplicate operations gracefully
- **ETHOS #5**: No marketing fluff - clear, honest naming using technical terms

---

## Clean Architecture Design

### Phase 1: Command ID Generation (Server-Side)

#### Problem

Commands are created without IDs, so PostgreSQL receives zero UUIDs (`00000000-0000-0000-0000-000000000000`), resulting in primary key violations on every insert after the first.
#### Solution: Command Factory Pattern

```go
// File: aggregator-server/internal/command/factory.go
package command

import (
	"fmt"

	"github.com/Fimeg/RedFlag/aggregator-server/internal/models"
	"github.com/google/uuid"
)

// Factory creates validated AgentCommand instances
type Factory struct{}

// NewFactory creates a new command factory
func NewFactory() *Factory {
	return &Factory{}
}

// Create generates a new validated AgentCommand
func (f *Factory) Create(agentID uuid.UUID, commandType string, params map[string]interface{}) (*models.AgentCommand, error) {
	cmd := &models.AgentCommand{
		ID:          uuid.New(), // Generation happens immediately and explicitly
		AgentID:     agentID,
		CommandType: commandType,
		Status:      "pending",
		Source:      "manual",
		Params:      params,
	}

	if err := cmd.Validate(); err != nil {
		return nil, fmt.Errorf("command validation failed: %w", err)
	}

	return cmd, nil
}
```

Add validation to the AgentCommand model:

```go
// File: aggregator-server/internal/models/command.go
// (requires the "errors" and "github.com/google/uuid" imports)

// Validate checks if the command is valid
func (c *AgentCommand) Validate() error {
	if c.ID == uuid.Nil {
		return errors.New("command ID cannot be zero UUID")
	}
	if c.AgentID == uuid.Nil {
		return errors.New("agent ID required")
	}
	if c.CommandType == "" {
		return errors.New("command type required")
	}
	if c.Status == "" {
		return errors.New("status required")
	}
	if c.Source != "manual" && c.Source != "system" {
		return errors.New("source must be 'manual' or 'system'")
	}
	return nil
}
```

**Rationale**: The factory pattern ensures IDs are always generated at creation time, making it impossible to create a command without one. Fail-fast validation catches issues immediately.

**Impact**: Fixes the immediate duplicate key error and prevents similar bugs at every future command creation point.

---

### Phase 2: Frontend Error Logging (UI to Server)

#### Problem

The frontend shows errors via toast notifications but never persists them. When users report "the button didn't work," we have no record of what failed, when, or why.
**ETHOS #1 Violation**: Errors that exist only in browser memory are equivalent to /dev/null.

#### Solution: Client Error Logging System

##### Step 2.1: Database Schema

```sql
-- File: aggregator-server/internal/database/migrations/023_client_error_logging.up.sql
-- Purpose: Store frontend errors for debugging and auditing

CREATE TABLE client_errors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID REFERENCES agents(id) ON DELETE SET NULL,
    subsystem VARCHAR(50) NOT NULL,
    error_type VARCHAR(50) NOT NULL, -- 'javascript_error', 'api_error', 'ui_error', 'validation_error'
    message TEXT NOT NULL,
    stack_trace TEXT,
    metadata JSONB,
    url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for common query patterns
CREATE INDEX idx_client_errors_agent_time ON client_errors(agent_id, created_at DESC);
CREATE INDEX idx_client_errors_subsystem_time ON client_errors(subsystem, created_at DESC);
CREATE INDEX idx_client_errors_type_time ON client_errors(error_type, created_at DESC);

-- Comments for documentation
COMMENT ON TABLE client_errors IS 'Frontend error logs for debugging and auditing';
COMMENT ON COLUMN client_errors.agent_id IS 'Agent that was active when error occurred (NULL for pre-auth errors)';
COMMENT ON COLUMN client_errors.subsystem IS 'Which RedFlag subsystem was being used';
COMMENT ON COLUMN client_errors.error_type IS 'Category of error for filtering';
COMMENT ON COLUMN client_errors.metadata IS 'Additional context (component name, API response, user actions)';
```

**Rationale**: A proper schema with indexes allows efficient querying. The foreign key to `agents` correlates errors with specific agents, and the JSONB column stores rich context for debugging.
---

##### Step 2.2: Backend Handler

```go
// File: aggregator-server/internal/api/handlers/client_errors.go
package handlers

import (
	"encoding/json"
	"log"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/jmoiron/sqlx"
)

// ClientErrorHandler handles frontend error logging
type ClientErrorHandler struct {
	db *sqlx.DB
}

// NewClientErrorHandler creates a new error handler
func NewClientErrorHandler(db *sqlx.DB) *ClientErrorHandler {
	return &ClientErrorHandler{db: db}
}

// LogError processes and stores frontend errors
func (h *ClientErrorHandler) LogError(c *gin.Context) {
	// Extract agent ID from auth middleware if available
	var agentID interface{}
	if agentIDValue, exists := c.Get("agentID"); exists {
		agentID = agentIDValue
	}

	var req struct {
		Subsystem  string                 `json:"subsystem" binding:"required"`
		ErrorType  string                 `json:"error_type" binding:"required,oneof=javascript_error api_error ui_error validation_error"`
		Message    string                 `json:"message" binding:"required"`
		StackTrace string                 `json:"stack_trace,omitempty"`
		Metadata   map[string]interface{} `json:"metadata,omitempty"`
		URL        string                 `json:"url" binding:"required"`
	}

	if err := c.ShouldBindJSON(&req); err != nil {
		log.Printf("[ERROR] [server] [client_error] validation_failed error=\"%v\"", err)
		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request data"})
		return
	}

	// Log to console with HISTORY prefix for unified logging
	log.Printf("[ERROR] [server] [client] [%s] agent_id=%v subsystem=%s message=\"%s\"",
		req.ErrorType, agentID, req.Subsystem, req.Message)
	log.Printf("[HISTORY] [server] [client_error] agent_id=%v subsystem=%s type=%s url=\"%s\" message=\"%s\" timestamp=%s",
		agentID, req.Subsystem, req.ErrorType, req.URL, req.Message, time.Now().Format(time.RFC3339))

	// Serialize metadata for the JSONB column
	metadataJSON, err := json.Marshal(req.Metadata)
	if err != nil {
		metadataJSON = []byte("{}")
	}

	// Attempt to store in database with retry logic
	const maxRetries = 3
	var lastErr error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		query := `INSERT INTO client_errors (agent_id, subsystem, error_type, message, stack_trace, metadata, url)
		          VALUES (:agent_id, :subsystem, :error_type, :message, :stack_trace, :metadata, :url)`
		_, err := h.db.NamedExec(query, map[string]interface{}{
			"agent_id":    agentID,
			"subsystem":   req.Subsystem,
			"error_type":  req.ErrorType,
			"message":     req.Message,
			"stack_trace": req.StackTrace,
			"metadata":    metadataJSON,
			"url":         req.URL,
		})
		if err == nil {
			c.JSON(http.StatusOK, gin.H{"logged": true})
			return
		}
		lastErr = err
		if attempt < maxRetries {
			time.Sleep(time.Duration(attempt) * time.Second)
		}
	}

	log.Printf("[ERROR] [server] [client_error] persistent_failure error=\"%v\"", lastErr)
	c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist error after retries"})
}
```

**Rationale**:

- Validates input before processing
- Logs with the [HISTORY] prefix for unified log aggregation
- Implements retry logic per ETHOS #3 (Assume Failure)
- Returns appropriate HTTP status codes
- Handles database connection failures gracefully

---

##### Step 2.3: Frontend Error Logger

```typescript
// File: aggregator-web/src/lib/client-error-logger.ts
import { api } from './api';

export interface ClientErrorLog {
  subsystem: string;
  error_type: 'javascript_error' | 'api_error' | 'ui_error' | 'validation_error';
  message: string;
  stack_trace?: string;
  metadata?: Record<string, unknown>;
  url: string;
}

/**
 * ClientErrorLogger provides reliable frontend error logging to the backend.
 * Implements retry logic per ETHOS #3 (Assume Failure).
 */
export class ClientErrorLogger {
  private maxRetries = 3;
  private baseDelayMs = 1000;
  private localStorageKey = 'redflag-failed-error-logs';

  /**
   * Log an error to the backend with automatic retry
   */
  async logError(errorData: Omit<ClientErrorLog, 'url'>): Promise<void> {
    const fullError: ClientErrorLog = {
      ...errorData,
      url: window.location.href,
    };

    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        await api.post('/logs/client-error', fullError, {
          // Add header to prevent infinite loop if error logger fails
          headers: { 'X-Error-Logger-Request': 'true' },
        });
        return; // Success
      } catch {
        if (attempt === this.maxRetries) {
          // Save to localStorage for later retry
          this.saveFailedLog({ ...fullError, attempt });
        } else {
          // Backoff before the next attempt
          await this.sleep(this.baseDelayMs * attempt);
        }
      }
    }
  }

  /**
   * Attempt to resend failed error logs from localStorage
   */
  async retryFailedLogs(): Promise<void> {
    const failedLogs = this.getFailedLogs();
    if (failedLogs.length === 0) return;

    // Clear first: logError re-saves any entry that still fails,
    // so leaving the old list in place would duplicate entries.
    localStorage.removeItem(this.localStorageKey);

    for (const log of failedLogs) {
      await this.logError(log);
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  private saveFailedLog(log: ClientErrorLog & { attempt: number }): void {
    try {
      const existing = this.getFailedLogs();
      existing.push(log);
      localStorage.setItem(this.localStorageKey, JSON.stringify(existing));
    } catch {
      // localStorage might be full or unavailable
    }
  }

  private getFailedLogs(): Array<ClientErrorLog & { attempt: number }> {
    try {
      const stored = localStorage.getItem(this.localStorageKey);
      return stored ? JSON.parse(stored) : [];
    } catch {
      return [];
    }
  }
}

// Singleton instance
export const clientErrorLogger = new ClientErrorLogger();

// Auto-retry failed logs on app load
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    clientErrorLogger.retryFailedLogs().catch(() => {});
  });
}
```

**Rationale**:

- Implements ETHOS #3 (Assume Failure) with backoff between attempts
- Saves failed logs to localStorage for retry when the network recovers
- Auto-retry on app load captures errors from previous sessions
- No infinite loops (X-Error-Logger-Request header)

---

##### Step 2.4: Toast Integration

```typescript
// File: aggregator-web/src/lib/toast-with-logging.ts
import toast, { ToastOptions } from 'react-hot-toast';
import { clientErrorLogger } from './client-error-logger';

// Store references to the original methods
const toastError = toast.error;
const toastSuccess = toast.success;

/**
 * Wraps toast.error to automatically log errors to the backend.
 * Implements ETHOS #1 (Errors are History).
 */
export const toastWithLogging = {
  error: (message: string, subsystem: string, options?: ToastOptions) => {
    // Log to backend asynchronously - don't block the UI
    clientErrorLogger.logError({
      subsystem,
      error_type: 'ui_error',
      message: message.substring(0, 1000), // Prevent excessively long messages
      metadata: {
        timestamp: new Date().toISOString(),
        user_agent: navigator.userAgent,
      },
    }).catch(() => {
      // Silently ignore logging failures - don't crash the UI
    });

    // Show toast to user
    return toastError(message, options);
  },
  success: toastSuccess,
  // react-hot-toast has no info/warning variants; fall back to the base toast
  info: (message: string, options?: ToastOptions) => toast(message, options),
  warning: (message: string, options?: ToastOptions) => toast(message, options),
  loading: toast.loading,
  dismiss: toast.dismiss,
};
```

**Rationale**: A transparent wrapper that maintains the toast API while adding error logging. The user experience is unchanged, but errors now persist to the history table.

---

## Implementation Evaluation: Retry Logic Necessity

**Question**: Does every client error log need exponential backoff retry?

**Analysis**:

### Errors That SHOULD Have Retry:

1. **API Errors**: Network failures, server 502s, connection timeouts
   - High value: These indicate real problems
   - Retry needed: Network glitches are common
2. **Critical UI Failures**: Command creation failures, permission errors
   - High value: Affect user workflow
   - Retry needed: Server might be temporarily overloaded

### Errors That Could Skip Retry:

1. **Validation Errors**: User entered invalid data
   - Low value: Expected behavior, not a system issue
   - No retry: Will immediately fail again
2. **Browser Compatibility Issues**: Old browser, missing features
   - Low value: Persistent problem until the user upgrades
   - No retry: Won't fix itself

### Recommendation: Use Retry for API and Critical Errors Only

```typescript
// Simplified version for validation errors (no retry)
export const logValidationError = async (subsystem: string, message: string) => {
  try {
    await api.post('/logs/client-error', {
      subsystem,
      error_type: 'validation_error',
      message,
    });
  } catch {
    // Best effort only - validation errors aren't critical
  }
};

// Full retry version for API errors
export const logApiError = async (subsystem: string, message: string) => {
  await clientErrorLogger.logError({
    subsystem,
    error_type: 'api_error',
    message,
  });
};
```

**Decision**: Keep retry logic in the general logger (most errors are API/critical), and add specific no-retry helpers for validation cases.
---

## Testing Strategy

### Test Command ID Generation

```go
func TestCommandFactory_Create(t *testing.T) {
	factory := command.NewFactory()
	agentID := uuid.New()

	cmd, err := factory.Create(agentID, "scan_storage", nil)

	require.NoError(t, err)
	assert.NotEqual(t, uuid.Nil, cmd.ID, "ID should be generated")
	assert.Equal(t, agentID, cmd.AgentID)
	assert.Equal(t, "scan_storage", cmd.CommandType)
}

func TestCommandFactory_CreateValidatesInput(t *testing.T) {
	factory := command.NewFactory()

	_, err := factory.Create(uuid.Nil, "", nil)

	assert.Error(t, err)
	assert.Contains(t, err.Error(), "validation failed")
}
```

### Test Error Logger Retry

```typescript
test('logError retries on failure then succeeds', async () => {
  // Mock API to fail twice, then succeed on the third attempt
  const mockPost = jest.fn()
    .mockRejectedValueOnce(new Error('Network error'))
    .mockRejectedValueOnce(new Error('Network error'))
    .mockResolvedValueOnce({});
  api.post = mockPost;

  await clientErrorLogger.logError({
    subsystem: 'storage',
    error_type: 'api_error',
    message: 'Failed to scan',
  });

  expect(mockPost).toHaveBeenCalledTimes(3);
  // Succeeded within the retry budget, so nothing was saved for later
  expect(localStorage.getItem('redflag-failed-error-logs')).toBeNull();
});
```

### Integration Test

```typescript
test('rapid scan button clicks work correctly', async () => {
  // Click multiple scan buttons
  await Promise.all([
    triggerStorageScan(),
    triggerSystemScan(),
    triggerDockerScan(),
  ]);

  // All should succeed with unique command IDs
  const commands = await getAgentCommands(agent.id);
  const uniqueIDs = new Set(commands.map(c => c.id));
  assert.equal(uniqueIDs.size, 3);
});
```

---

## Implementation Plan

### Step 1: Command Factory (15 minutes)

1. Create `aggregator-server/internal/command/factory.go`
2. Add `Validate()` method to `models.AgentCommand`
3. Update `TriggerSubsystem` and other command creation points to use the factory
4. Test: Verify rapid button clicks work

### Step 2: Database Migration (5 minutes)

1. Create `023_client_error_logging.up.sql`
2. Test that the migration runs successfully
3. Verify the table and indexes are created

### Step 3: Backend Handler (20 minutes)

1. Create `aggregator-server/internal/api/handlers/client_errors.go`
2. Add route registration in the router setup
3. Test the API endpoint with curl

### Step 4: Frontend Logger (15 minutes)

1. Create `aggregator-web/src/lib/client-error-logger.ts`
2. Add the toast wrapper in `aggregator-web/src/lib/toast-with-logging.ts`
3. Update 2-3 critical error locations to use the new logger
4. Test: Verify errors appear in the database

### Step 5: Verification (10 minutes)

1. Test the full workflow: trigger a scan, verify the command ID is unique
2. Test the error scenario: disconnect the network, verify retry works
3. Check the database: confirm errors are stored with context

**Total Time**: ~1 hour 5 minutes

---

## Files to Create

1. `aggregator-server/internal/command/factory.go`
2. `aggregator-server/internal/database/migrations/023_client_error_logging.up.sql`
3. `aggregator-server/internal/api/handlers/client_errors.go`
4. `aggregator-web/src/lib/client-error-logger.ts`
5. `aggregator-web/src/lib/toast-with-logging.ts`

## Files to Modify

1. `aggregator-server/internal/models/command.go` - Add `Validate()` method
2. `aggregator-server/internal/api/handlers/subsystems.go` - Use the command factory
3. `aggregator-server/internal/api/router.go` - Register the error logging route
4. 2-3 frontend files with critical error paths

---

## ETHOS Compliance Verification

- [x] **ETHOS #1**: All errors logged with context to the history table
- [x] **ETHOS #2**: Endpoint protected by auth middleware
- [x] **ETHOS #3**: Retry logic with backoff implemented
- [x] **ETHOS #4**: Database constraints handle duplicate logging gracefully
- [x] **ETHOS #5**: No marketing fluff; technical, honest naming used

---

**Status**: Ready for Implementation

**Recommendation**: Implement all steps in order for a clean, maintainable solution