Redflag/docs/historical/CLEAN_ARCHITECTURE_DESIGN.md

Clean Architecture: Command ID & Frontend Error Logging

Date: 2025-12-19 Status: CLEAN ARCHITECTURE DESIGN (ETHOS Compliant)


Problem Statement

RedFlag has two critical issues violating ETHOS principles:

  1. Command ID Generation Failure: Server fails to generate unique IDs for commands, causing pq: duplicate key value violates unique constraint "agent_commands_pkey" when users trigger multiple scans rapidly

  2. Frontend Errors Lost: UI failures show toasts but are never persisted, violating ETHOS #1: "Errors are History, Not /dev/null"


ETHOS Compliance Requirements

ETHOS #1: All errors must be captured, logged with context, and stored in the history table - NEVER sent to /dev/null

ETHOS #2: No unauthenticated endpoints - all routes protected by established security stack

ETHOS #3: Assume failure - implement retry logic with exponential backoff for network operations

ETHOS #4: Idempotency - system must handle duplicate operations gracefully

ETHOS #5: No marketing fluff - clear, honest naming using technical terms


Clean Architecture Design

Phase 1: Command ID Generation (Server-Side)

Problem

Commands are created without IDs, so every insert reaches PostgreSQL with the zero UUID (00000000-0000-0000-0000-000000000000); the second rapid insert then violates the agent_commands_pkey primary key constraint.

Solution: Command Factory Pattern

// File: aggregator-server/internal/command/factory.go

package command

import (
    "fmt"

    "github.com/Fimeg/RedFlag/aggregator-server/internal/models"
    "github.com/google/uuid"
)

// Factory creates validated AgentCommand instances
type Factory struct{}

// NewFactory creates a new command factory
func NewFactory() *Factory {
    return &Factory{}
}

// Create generates a new validated AgentCommand
func (f *Factory) Create(agentID uuid.UUID, commandType string, params map[string]interface{}) (*models.AgentCommand, error) {
    cmd := &models.AgentCommand{
        ID:          uuid.New(), // Generation happens immediately and explicitly
        AgentID:     agentID,
        CommandType: commandType,
        Status:      "pending",
        Source:      "manual",
        Params:      params,
    }
    
    if err := cmd.Validate(); err != nil {
        return nil, fmt.Errorf("command validation failed: %w", err)
    }
    
    return cmd, nil
}

Add validation to AgentCommand model:

// File: aggregator-server/internal/models/command.go
// (the file's import block needs "errors" and "github.com/google/uuid")

// Validate checks if the command is valid
func (c *AgentCommand) Validate() error {
    if c.ID == uuid.Nil {
        return errors.New("command ID cannot be zero UUID")
    }
    if c.AgentID == uuid.Nil {
        return errors.New("agent ID required")
    }
    if c.CommandType == "" {
        return errors.New("command type required")
    }
    if c.Status == "" {
        return errors.New("status required")
    }
    if c.Source != "manual" && c.Source != "system" {
        return errors.New("source must be 'manual' or 'system'")
    }
    
    return nil
}

Rationale: The factory pattern ensures IDs are always generated at creation time, so no command can leave the factory without a valid ID. Fail-fast validation catches issues immediately.

Impact: Fixes the immediate duplicate key error and prevents similar bugs in all future command creation.


Phase 2: Frontend Error Logging (UI to Server)

Problem

Frontend shows errors via toast notifications but never persists them. When users report "the button didn't work," we have no record of what failed, when, or why.

ETHOS #1 Violation: Errors that exist only in browser memory are equivalent to /dev/null

Solution: Client Error Logging System

Step 2.1: Database Schema
-- File: aggregator-server/internal/database/migrations/023_client_error_logging.up.sql
-- Purpose: Store frontend errors for debugging and auditing

CREATE TABLE client_errors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID REFERENCES agents(id) ON DELETE SET NULL,
    subsystem VARCHAR(50) NOT NULL,
    error_type VARCHAR(50) NOT NULL, -- 'javascript_error', 'api_error', 'ui_error', 'validation_error'
    message TEXT NOT NULL,
    stack_trace TEXT,
    metadata JSONB,
    url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for common query patterns
CREATE INDEX idx_client_errors_agent_time ON client_errors(agent_id, created_at DESC);
CREATE INDEX idx_client_errors_subsystem_time ON client_errors(subsystem, created_at DESC);
CREATE INDEX idx_client_errors_type_time ON client_errors(error_type, created_at DESC);

-- Comments for documentation
COMMENT ON TABLE client_errors IS 'Frontend error logs for debugging and auditing';
COMMENT ON COLUMN client_errors.agent_id IS 'Agent that was active when error occurred (NULL for pre-auth errors)';
COMMENT ON COLUMN client_errors.subsystem IS 'Which RedFlag subsystem was being used';
COMMENT ON COLUMN client_errors.error_type IS 'Category of error for filtering';
COMMENT ON COLUMN client_errors.metadata IS 'Additional context (component name, API response, user actions)';

Rationale: Proper schema with indexes allows efficient querying. References agents table to correlate errors with specific agents. Stores rich context for debugging.


Step 2.2: Backend Handler
// File: aggregator-server/internal/api/handlers/client_errors.go

package handlers

import (
    "encoding/json"
    "log"
    "net/http"
    "time"
    
    "github.com/gin-gonic/gin"
    "github.com/jmoiron/sqlx"
)

// ClientErrorHandler handles frontend error logging
type ClientErrorHandler struct {
    db *sqlx.DB
}

// NewClientErrorHandler creates a new error handler
func NewClientErrorHandler(db *sqlx.DB) *ClientErrorHandler {
    return &ClientErrorHandler{db: db}
}

// LogError processes and stores frontend errors
func (h *ClientErrorHandler) LogError(c *gin.Context) {
    // Extract agent ID from auth middleware if available
    var agentID interface{}
    if agentIDValue, exists := c.Get("agentID"); exists {
        agentID = agentIDValue
    }
    
    var req struct {
        Subsystem  string                 `json:"subsystem" binding:"required"`
        ErrorType  string                 `json:"error_type" binding:"required,oneof=javascript_error api_error ui_error validation_error"`
        Message    string                 `json:"message" binding:"required"`
        StackTrace string                 `json:"stack_trace,omitempty"`
        Metadata   map[string]interface{} `json:"metadata,omitempty"`
        URL        string                 `json:"url" binding:"required"`
    }
    
    if err := c.ShouldBindJSON(&req); err != nil {
        log.Printf("[ERROR] [server] [client_error] validation_failed error=\"%v\"", err)
        c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request data"})
        return
    }
    
    // Log to console with HISTORY prefix for unified logging
    log.Printf("[ERROR] [server] [client] [%s] agent_id=%v subsystem=%s message=\"%s\"",
        req.ErrorType, agentID, req.Subsystem, req.Message)
    log.Printf("[HISTORY] [server] [client_error] agent_id=%v subsystem=%s type=%s url=\"%s\" message=\"%s\" timestamp=%s",
        agentID, req.Subsystem, req.ErrorType, req.URL, req.Message, time.Now().Format(time.RFC3339))
    
    // A Go map does not bind to a JSONB column; marshal it to JSON first
    metadataJSON, err := json.Marshal(req.Metadata)
    if err != nil {
        metadataJSON = []byte("{}")
    }
    
    // Attempt to store in database with exponential-backoff retry (ETHOS #3)
    const maxRetries = 3
    query := `INSERT INTO client_errors (agent_id, subsystem, error_type, message, stack_trace, metadata, url) 
              VALUES (:agent_id, :subsystem, :error_type, :message, :stack_trace, :metadata, :url)`
    var lastErr error
    
    for attempt := 1; attempt <= maxRetries; attempt++ {
        _, err := h.db.NamedExec(query, map[string]interface{}{
            "agent_id":    agentID,
            "subsystem":   req.Subsystem,
            "error_type":  req.ErrorType,
            "message":     req.Message,
            "stack_trace": req.StackTrace,
            "metadata":    string(metadataJSON),
            "url":         req.URL,
        })
        
        if err == nil {
            c.JSON(http.StatusOK, gin.H{"logged": true})
            return
        }
        
        lastErr = err
        if attempt < maxRetries {
            time.Sleep(time.Duration(1<<(attempt-1)) * time.Second) // 1s, 2s
        }
    }
    
    log.Printf("[ERROR] [server] [client_error] persistent_failure error=\"%v\"", lastErr)
    c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist error after retries"})
}

Rationale:

  • Validates input before processing
  • Logs with [HISTORY] prefix for unified log aggregation
  • Implements retry logic per ETHOS #3 (Assume Failure)
  • Returns appropriate HTTP status codes
  • Handles database connection failures gracefully

Step 2.3: Frontend Error Logger
// File: aggregator-web/src/lib/client-error-logger.ts

import { api } from './api';

export interface ClientErrorLog {
  subsystem: string;
  error_type: 'javascript_error' | 'api_error' | 'ui_error' | 'validation_error';
  message: string;
  stack_trace?: string;
  metadata?: Record<string, any>;
  url: string;
}

/**
 * ClientErrorLogger provides reliable frontend error logging to backend
 * Implements retry logic per ETHOS #3 (Assume Failure)
 */
export class ClientErrorLogger {
  private maxRetries = 3;
  private baseDelayMs = 1000;
  private localStorageKey = 'redflag-failed-error-logs';

  /**
   * Log an error to the backend with automatic retry
   */
  async logError(errorData: Omit<ClientErrorLog, 'url'>): Promise<void> {
    const fullError: ClientErrorLog = {
      ...errorData,
      url: window.location.href,
    };

    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        await api.post('/logs/client-error', fullError, {
          // Add header to prevent infinite loop if error logger fails
          headers: { 'X-Error-Logger-Request': 'true' },
        });
        return; // Success
      } catch (error) {
        if (attempt === this.maxRetries) {
          // Save to localStorage for later retry
          this.saveFailedLog({ ...fullError, attempt });
        } else {
          // Exponential backoff
          await this.sleep(this.baseDelayMs * Math.pow(2, attempt - 1));
        }
      }
    }
  }

  /**
   * Attempt to resend failed error logs from localStorage
   */
  async retryFailedLogs(): Promise<void> {
    const failedLogs = this.getFailedLogs();
    if (failedLogs.length === 0) return;

    // Clear the queue first: logError never throws and re-saves any log that
    // still fails, so replaying against the old queue would duplicate entries.
    localStorage.removeItem(this.localStorageKey);

    for (const log of failedLogs) {
      await this.logError(log);
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  private saveFailedLog(log: any): void {
    try {
      const existing = this.getFailedLogs();
      existing.push(log);
      localStorage.setItem(this.localStorageKey, JSON.stringify(existing));
    } catch {
      // localStorage might be full or unavailable
    }
  }

  private getFailedLogs(): any[] {
    try {
      const stored = localStorage.getItem(this.localStorageKey);
      return stored ? JSON.parse(stored) : [];
    } catch {
      return [];
    }
  }
}

// Singleton instance
export const clientErrorLogger = new ClientErrorLogger();

// Auto-retry failed logs on app load
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    clientErrorLogger.retryFailedLogs().catch(() => {});
  });
}

Rationale:

  • Implements ETHOS #3 (Assume Failure) with exponential backoff
  • Saves failed logs to localStorage for retry when network recovers
  • Auto-retry on app load captures errors from previous sessions
  • No infinite loops (X-Error-Logger-Request header)

Step 2.4: Toast Integration
// File: aggregator-web/src/lib/toast-with-logging.ts

import toast, { ToastOptions } from 'react-hot-toast';
import { clientErrorLogger } from './client-error-logger';

// Store reference to original methods
const toastError = toast.error;
const toastSuccess = toast.success;

/**
 * Wraps toast.error to automatically log errors to backend
 * Implements ETHOS #1 (Errors are History)
 */
export const toastWithLogging = {
  error: (message: string, subsystem: string, options?: ToastOptions) => {
    // Log to backend asynchronously - don't block UI
    clientErrorLogger.logError({
      subsystem,
      error_type: 'ui_error',
      message: message.substring(0, 1000), // Prevent excessively long messages
      metadata: {
        timestamp: new Date().toISOString(),
        user_agent: navigator.userAgent,
      },
    }).catch(() => {
      // Silently ignore logging failures - don't crash the UI
    });

    // Show toast to user
    return toastError(message, options);
  },

  success: toastSuccess,
  loading: toast.loading,
  dismiss: toast.dismiss,
  // react-hot-toast has no built-in info/warning variants; fall back to the
  // base toast() call (or a custom renderer) if those levels are needed
};

Rationale: Transparent wrapper that maintains toast API while adding error logging. User experience unchanged but errors now persist to history table.


Implementation Evaluation: Retry Logic Necessity

Question: Does every client error log need exponential backoff retry?

Analysis:

Errors That SHOULD Have Retry:

  1. API Errors: Network failures, server 502s, connection timeouts

    • High value: These indicate real problems
    • Retry needed: Network glitches common
  2. Critical UI Failures: Command creation failures, permission errors

    • High value: Affect user workflow
    • Retry needed: Server might be temporarily overloaded

Errors That Could Skip Retry:

  1. Validation Errors: User entered invalid data

    • Low value: Expected behavior, not a system issue
    • No retry: Will immediately fail again
  2. Browser Compatibility Issues: Old browser, missing features

    • Low value: Persistent problem until user upgrades
    • No retry: Won't fix itself

Recommendation: Use Retry for API and Critical Errors Only

// Simplified version for validation errors (no retry)
export const logValidationError = async (subsystem: string, message: string) => {
  try {
    await api.post('/logs/client-error', {
      subsystem,
      error_type: 'validation_error',
      message,
      url: window.location.href, // required by the handler's binding rules
    });
  } catch {
    // Best effort only - validation errors aren't critical
  }
};

// Full retry version for API errors
export const logApiError = (subsystem: string, message: string): Promise<void> =>
  clientErrorLogger.logError({
    subsystem,
    error_type: 'api_error',
    message,
  });

Decision: Keep retry logic in the general logger (most errors are API/critical), create specific no-retry helpers for validation cases.


Testing Strategy

Test Command ID Generation

func TestCommandFactory_Create(t *testing.T) {
    factory := command.NewFactory()
    agentID := uuid.New()
    
    cmd, err := factory.Create(agentID, "scan_storage", nil)
    
    require.NoError(t, err)
    assert.NotEqual(t, uuid.Nil, cmd.ID, "ID should be generated")
    assert.Equal(t, agentID, cmd.AgentID)
    assert.Equal(t, "scan_storage", cmd.CommandType)
}

func TestCommandFactory_CreateValidatesInput(t *testing.T) {
    factory := command.NewFactory()
    
    _, err := factory.Create(uuid.Nil, "", nil)
    
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "validation failed")
}

Test Error Logger Retry

test('logError retries on failure and succeeds before exhausting retries', async () => {
  // Mock API to fail twice, then succeed on the third attempt
  const mockPost = jest.fn()
    .mockRejectedValueOnce(new Error('Network error'))
    .mockRejectedValueOnce(new Error('Network error'))
    .mockResolvedValueOnce({});
  const setItemSpy = jest.spyOn(Storage.prototype, 'setItem');
  
  api.post = mockPost;
  
  await clientErrorLogger.logError({
    subsystem: 'storage',
    error_type: 'api_error',
    message: 'Failed to scan',
  });
  
  expect(mockPost).toHaveBeenCalledTimes(3);
  expect(setItemSpy).not.toHaveBeenCalled(); // succeeded, so nothing queued in localStorage
});

Integration Test

test('rapid scan button clicks work correctly', async () => {
  // Click multiple scan buttons
  await Promise.all([
    triggerStorageScan(),
    triggerSystemScan(),
    triggerDockerScan(),
  ]);
  
  // All should succeed with unique command IDs
  const commands = await getAgentCommands(agent.id);
  const uniqueIDs = new Set(commands.map(c => c.id));
  assert.equal(uniqueIDs.size, 3);
});

Implementation Plan

Step 1: Command Factory (15 minutes)

  1. Create aggregator-server/internal/command/factory.go
  2. Add Validate() method to models.AgentCommand
  3. Update TriggerSubsystem and other command creation points to use factory
  4. Test: Verify rapid button clicks work

Step 2: Database Migration (5 minutes)

  1. Create 023_client_error_logging.up.sql
  2. Test migration runs successfully
  3. Verify table and indexes created

Step 3: Backend Handler (20 minutes)

  1. Create aggregator-server/internal/api/handlers/client_errors.go
  2. Add route registration in router setup
  3. Test API endpoint with curl

Step 4: Frontend Logger (15 minutes)

  1. Create aggregator-web/src/lib/client-error-logger.ts
  2. Add toast wrapper in aggregator-web/src/lib/toast-with-logging.ts
  3. Update 2-3 critical error locations to use new logger
  4. Test: Verify errors appear in database

Step 5: Verification (10 minutes)

  1. Test full workflow: trigger scan, verify command ID unique
  2. Test error scenario: disconnect network, verify retry works
  3. Check database: confirm errors stored with context

Total Time: ~1 hour 5 minutes


Files to Create

  1. aggregator-server/internal/command/factory.go
  2. aggregator-server/internal/database/migrations/023_client_error_logging.up.sql
  3. aggregator-server/internal/api/handlers/client_errors.go
  4. aggregator-web/src/lib/client-error-logger.ts
  5. aggregator-web/src/lib/toast-with-logging.ts

Files to Modify

  1. aggregator-server/internal/models/command.go - Add Validate() method
  2. aggregator-server/internal/api/handlers/subsystems.go - Use command factory
  3. aggregator-server/internal/api/router.go - Register error logging route
  4. 2-3 frontend files with critical error paths

ETHOS Compliance Verification

  • ETHOS #1: All errors logged with context to history table ✓
  • ETHOS #2: Endpoint protected by auth middleware ✓
  • ETHOS #3: Retry logic with exponential backoff implemented ✓
  • ETHOS #4: Database constraints handle duplicate logging gracefully ✓
  • ETHOS #5: No marketing fluff; technical, honest naming used ✓

Status: Ready for Implementation Recommendation: Implement all steps in order for clean, maintainable solution