
RedFlag Clean Architecture Implementation Master Plan

Date: 2025-12-19
Version: v1.0
Total Implementation Time: 3-4 hours (including migration fixes and command deduplication)
Status: READY FOR EXECUTION


Executive Summary

Complete implementation plan for fixing critical ETHOS violations and implementing clean architecture patterns across RedFlag v0.1.27. Addresses duplicate command generation, lost frontend errors, and migration system bugs.

Three Core Objectives:

  1. Fix migration system (blocks everything else)
  2. Implement command factory pattern (prevents duplicate key violations)
  3. Build frontend error logging system (ETHOS #1 compliance)

Table of Contents

  1. Pre-Implementation: Migration System Fix
  2. Phase 1: Command Factory Pattern
  3. Phase 2: Database Schema
  4. Phase 3: Backend Error Handler
  5. Phase 4: Frontend Error Logger
  6. Phase 5: Toast Integration
  7. Phase 6: Verification & Testing
  8. Implementation Checklist
  9. Risk Mitigation
  10. Post-Implementation Review

Pre-Implementation: Migration System Fix

⚠️ CRITICAL: Must be completed first - blocks all other work

Problem

Migration runner has duplicate INSERT logic causing "duplicate key value violates unique constraint" errors on fresh installations.

Root Cause

File: aggregator-server/internal/database/db.go

  • Line 103: Executes INSERT INTO schema_migrations (version) VALUES ($1)
  • Line 116: Executes the exact same INSERT statement
  • Result: Every migration filename gets inserted twice

Solution

// File: aggregator-server/internal/database/db.go

// Lines 95-120: Fix duplicate INSERT logic
func (db *DB) Migrate() error {
    // ... existing code ...

    for _, file := range files {
        filename := file.Name()
        // ❌ REMOVE THIS - Line 103 duplicates line 116
        // if _, err = tx.Exec("INSERT INTO schema_migrations (version) VALUES ($1)", filename); err != nil {
        //     return fmt.Errorf("failed to mark migration %s as applied: %w", filename, err)
        // }

        // Keep only the EXECUTE + INSERT combo at lines 110-116
        if _, err = tx.Exec(string(content)); err != nil {
            log.Printf("Migration %s failed, marking as applied: %v", filename, err)
        }

        // ✅ Keep this INSERT - it's the correct location
        if _, err = tx.Exec("INSERT INTO schema_migrations (version) VALUES ($1)", filename); err != nil {
            return fmt.Errorf("failed to mark migration %s as applied: %w", filename, err)
        }
    }

    // ... rest of function ...
}

Validation Steps

  1. Wipe database completely: docker-compose down -v
  2. Start fresh: docker-compose up -d
  3. Check migration logs: Should see all migrations apply without duplicate key errors
  4. Verify: SELECT COUNT(DISTINCT version) = COUNT(version) FROM schema_migrations

Time Required: 5 minutes

Blocker Status: 🔴 CRITICAL - Do not proceed until fixed


Phase 1: Command Factory Pattern

Objective

Prevent duplicate command key violations by ensuring all commands have properly generated UUIDs at creation time.
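The factory below centralizes this: every command receives its UUID inside `Create`, never in the database layer or a later assignment. For intuition, here is a stdlib-only sketch of what `uuid.New()` supplies (the real code uses `github.com/google/uuid`; `newUUIDv4` here is an illustrative stand-in, not part of the plan):

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 is a stdlib-only stand-in for uuid.New(): 16 random bytes
// with the version (4) and RFC 4122 variant bits set, formatted 8-4-4-4-12.
func newUUIDv4() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // reading from the OS entropy pool should not fail
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	// Two commands created back-to-back get distinct IDs, so the
	// primary-key constraint on agent_commands can never collide.
	a, b := newUUIDv4(), newUUIDv4()
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println("distinct:", a != b)
}
```

Because the ID exists before the INSERT, a failed insert can be retried with the same ID instead of minting a new row.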

Files to Create

1.1 Command Factory

File: aggregator-server/internal/command/factory.go

package command

import (
    "database/sql"
    "errors"
    "fmt"
    "time"

    "github.com/google/uuid"
    "github.com/Fimeg/RedFlag/aggregator-server/internal/models"
)

// Factory creates validated AgentCommand instances
type Factory struct {
    validator *Validator
}

// NewFactory creates a new command factory
func NewFactory() *Factory {
    return &Factory{
        validator: NewValidator(),
    }
}

// Create generates a new validated AgentCommand with unique ID
func (f *Factory) Create(agentID uuid.UUID, commandType string, params map[string]interface{}) (*models.AgentCommand, error) {
    cmd := &models.AgentCommand{
        ID:          uuid.New(), // Immediate, explicit generation
        AgentID:     agentID,
        CommandType: commandType,
        Status:      "pending",
        Source:      determineSource(commandType),
        Params:      params,
        CreatedAt:   time.Now(),
        UpdatedAt:   time.Now(),
    }

    if err := f.validator.Validate(cmd); err != nil {
        return nil, fmt.Errorf("command validation failed: %w", err)
    }

    return cmd, nil
}

// CreateWithIdempotency generates a command with idempotency protection
func (f *Factory) CreateWithIdempotency(agentID uuid.UUID, commandType string,
    params map[string]interface{}, idempotencyKey string) (*models.AgentCommand, error) {

    // Check for an existing command with the same idempotency key.
    // (findByIdempotencyKey and storeIdempotencyKey are thin helpers over
    // the database query layer; their implementation is storage-specific.)
    existing, err := f.findByIdempotencyKey(agentID, idempotencyKey)
    if err != nil && !errors.Is(err, sql.ErrNoRows) {
        return nil, fmt.Errorf("failed to check idempotency: %w", err)
    }

    if existing != nil {
        return existing, nil // Return existing command instead of creating duplicate
    }

    cmd, err := f.Create(agentID, commandType, params)
    if err != nil {
        return nil, err
    }

    // Store idempotency key with command
    if err := f.storeIdempotencyKey(cmd.ID, agentID, idempotencyKey); err != nil {
        return nil, fmt.Errorf("failed to store idempotency key: %w", err)
    }

    return cmd, nil
}

// determineSource classifies command source based on type
func determineSource(commandType string) string {
    if isSystemCommand(commandType) {
        return "system"
    }
    return "manual"
}

func isSystemCommand(commandType string) bool {
    systemCommands := []string{
        "enable_heartbeat",
        "disable_heartbeat",
        "update_check",
        "cleanup_old_logs",
    }

    for _, cmd := range systemCommands {
        if commandType == cmd {
            return true
        }
    }
    return false
}

1.2 Command Validator

File: aggregator-server/internal/command/validator.go

package command

import (
    "errors"
    "fmt"

    "github.com/google/uuid"
    "github.com/Fimeg/RedFlag/aggregator-server/internal/models"
)

// Validator validates command parameters
type Validator struct {
    minCheckInSeconds int
    maxCheckInSeconds int
    minScannerMinutes int
    maxScannerMinutes int
}

// NewValidator creates a new command validator
func NewValidator() *Validator {
    return &Validator{
        minCheckInSeconds: 60,   // 1 minute minimum
        maxCheckInSeconds: 3600, // 1 hour maximum
        minScannerMinutes: 1,    // 1 minute minimum
        maxScannerMinutes: 1440, // 24 hours maximum
    }
}

// Validate performs comprehensive command validation
func (v *Validator) Validate(cmd *models.AgentCommand) error {
    if cmd == nil {
        return errors.New("command cannot be nil")
    }

    if cmd.ID == uuid.Nil {
        return errors.New("command ID cannot be zero UUID")
    }

    if cmd.AgentID == uuid.Nil {
        return errors.New("agent ID is required")
    }

    if cmd.CommandType == "" {
        return errors.New("command type is required")
    }

    if cmd.Status == "" {
        return errors.New("status is required")
    }

    validStatuses := []string{"pending", "running", "completed", "failed", "cancelled"}
    if !contains(validStatuses, cmd.Status) {
        return fmt.Errorf("invalid status: %s", cmd.Status)
    }

    if cmd.Source != "manual" && cmd.Source != "system" {
        return fmt.Errorf("source must be 'manual' or 'system', got: %s", cmd.Source)
    }

    // Validate command type format
    if err := v.validateCommandType(cmd.CommandType); err != nil {
        return err
    }

    return nil
}

// ValidateSubsystemAction validates subsystem-specific actions
func (v *Validator) ValidateSubsystemAction(subsystem string, action string) error {
    validActions := map[string][]string{
        "storage": {"trigger", "enable", "disable", "set_interval"},
        "system":  {"trigger", "enable", "disable", "set_interval"},
        "docker":  {"trigger", "enable", "disable", "set_interval"},
        "updates": {"trigger", "enable", "disable", "set_interval"},
    }

    actions, ok := validActions[subsystem]
    if !ok {
        return fmt.Errorf("unknown subsystem: %s", subsystem)
    }

    if !contains(actions, action) {
        return fmt.Errorf("invalid action '%s' for subsystem '%s'", action, subsystem)
    }

    return nil
}

// ValidateInterval ensures scanner intervals are within bounds
func (v *Validator) ValidateInterval(subsystem string, minutes int) error {
    if minutes < v.minScannerMinutes {
        return fmt.Errorf("interval %d minutes below minimum %d for subsystem %s",
            minutes, v.minScannerMinutes, subsystem)
    }

    if minutes > v.maxScannerMinutes {
        return fmt.Errorf("interval %d minutes above maximum %d for subsystem %s",
            minutes, v.maxScannerMinutes, subsystem)
    }

    return nil
}

func (v *Validator) validateCommandType(commandType string) error {
    validPrefixes := []string{"scan_", "install_", "update_", "enable_", "disable_", "reboot"}

    for _, prefix := range validPrefixes {
        if len(commandType) >= len(prefix) && commandType[:len(prefix)] == prefix {
            return nil
        }
    }

    return fmt.Errorf("invalid command type format: %s", commandType)
}

func contains(slice []string, item string) bool {
    for _, s := range slice {
        if s == item {
            return true
        }
    }
    return false
}
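The manual slice comparison in `validateCommandType` is equivalent to `strings.HasPrefix`, which avoids the explicit length guard. A standalone sketch of the same rule:

```go
package main

import (
	"fmt"
	"strings"
)

// validateCommandType mirrors the validator above: a command type is valid
// if it starts with one of the known prefixes (or is exactly "reboot").
func validateCommandType(commandType string) error {
	validPrefixes := []string{"scan_", "install_", "update_", "enable_", "disable_", "reboot"}
	for _, prefix := range validPrefixes {
		if strings.HasPrefix(commandType, prefix) {
			return nil
		}
	}
	return fmt.Errorf("invalid command type format: %s", commandType)
}

func main() {
	for _, ct := range []string{"scan_storage", "enable_heartbeat", "rm_rf"} {
		fmt.Printf("%-18s -> %v\n", ct, validateCommandType(ct))
	}
}
```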

1.3 Update AgentCommand Model

File: aggregator-server/internal/models/command.go

package models

import (
    "database/sql"
    "errors"
    "time"

    "github.com/google/uuid"
    "github.com/lib/pq"
)

// AgentCommand represents a command sent to an agent
type AgentCommand struct {
    ID            uuid.UUID       `db:"id" json:"id"`
    AgentID       uuid.UUID       `db:"agent_id" json:"agent_id"`
    CommandType   string          `db:"command_type" json:"command_type"`
    Status        string          `db:"status" json:"status"`
    Source        string          `db:"source" json:"source"`
    Params        map[string]interface{} `db:"params" json:"params"` // marshaled to JSONB by the storage layer
    Result        sql.NullString  `db:"result" json:"result,omitempty"`
    Error         sql.NullString  `db:"error" json:"error,omitempty"`
    RetryCount    int             `db:"retry_count" json:"retry_count"`
    CreatedAt     time.Time       `db:"created_at" json:"created_at"`
    UpdatedAt     time.Time       `db:"updated_at" json:"updated_at"`
    CompletedAt   pq.NullTime     `db:"completed_at" json:"completed_at,omitempty"`

    // Idempotency support (VARCHAR(64) column added by migration 023a)
    IdempotencyKey sql.NullString `db:"idempotency_key" json:"-"`
}

// Validate checks if the command is valid
func (c *AgentCommand) Validate() error {
    if c.ID == uuid.Nil {
        return ErrCommandIDRequired
    }
    if c.AgentID == uuid.Nil {
        return ErrAgentIDRequired
    }
    if c.CommandType == "" {
        return ErrCommandTypeRequired
    }
    if c.Status == "" {
        return ErrStatusRequired
    }
    if c.Source != "manual" && c.Source != "system" {
        return ErrInvalidSource
    }

    return nil
}

// IsTerminal returns true if the command is in a terminal state
func (c *AgentCommand) IsTerminal() bool {
    return c.Status == "completed" || c.Status == "failed" || c.Status == "cancelled"
}

// CanRetry returns true if the command can be retried
func (c *AgentCommand) CanRetry() bool {
    return c.Status == "failed" && c.RetryCount < 3
}

// Predefined errors for validation
var (
    ErrCommandIDRequired = errors.New("command ID cannot be zero UUID")
    ErrAgentIDRequired   = errors.New("agent ID is required")
    ErrCommandTypeRequired = errors.New("command type is required")
    ErrStatusRequired    = errors.New("status is required")
    ErrInvalidSource     = errors.New("source must be 'manual' or 'system'")
)

1.4 Update Subsystem Handler

File: aggregator-server/internal/api/handlers/subsystems.go

package handlers

import (
    "database/sql"
    "fmt"
    "log"
    "net/http"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/google/uuid"
    "github.com/jmoiron/sqlx"

    "github.com/Fimeg/RedFlag/aggregator-server/internal/command"
    "github.com/Fimeg/RedFlag/aggregator-server/internal/models"
)

type SubsystemHandler struct {
    db              *sqlx.DB
    commandFactory  *command.Factory
}

func NewSubsystemHandler(db *sqlx.DB) *SubsystemHandler {
    return &SubsystemHandler{
        db:             db,
        commandFactory: command.NewFactory(),
    }
}

// TriggerSubsystem creates and enqueues a subsystem command
func (h *SubsystemHandler) TriggerSubsystem(c *gin.Context) {
    agentID, err := uuid.Parse(c.Param("id"))
    if err != nil {
        log.Printf("[ERROR] [server] [subsystem] invalid_agent_id error=%v", err)
        c.JSON(http.StatusBadRequest, gin.H{"error": "invalid agent ID"})
        return
    }

    subsystem := c.Param("subsystem")
    if err := h.validateSubsystem(subsystem); err != nil {
        log.Printf("[ERROR] [server] [subsystem] invalid_subsystem subsystem=%s error=%v", subsystem, err)
        c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
        return
    }

    // DEDUPLICATION CHECK: Prevent multiple pending scans
    existingCmd, err := h.getPendingScanCommand(agentID, subsystem)
    if err != nil {
        log.Printf("[ERROR] [server] [subsystem] query_failed agent_id=%s subsystem=%s error=%v",
            agentID, subsystem, err)
        c.JSON(http.StatusInternalServerError, gin.H{"error": "internal error"})
        return
    }

    if existingCmd != nil {
        log.Printf("[INFO] [server] [subsystem] scan_already_pending agent_id=%s subsystem=%s command_id=%s",
            agentID, subsystem, existingCmd.ID)
        log.Printf("[HISTORY] [server] [scan_%s] duplicate_request_prevented agent_id=%s command_id=%s timestamp=%s",
            subsystem, agentID, existingCmd.ID, time.Now().Format(time.RFC3339))

        c.JSON(http.StatusConflict, gin.H{
            "error": "Scan already in progress",
            "command_id": existingCmd.ID.String(),
            "subsystem": subsystem,
            "status": existingCmd.Status,
            "created_at": existingCmd.CreatedAt,
        })
        return
    }

    // Generate idempotency key from request context
    idempotencyKey := h.generateIdempotencyKey(c, agentID, subsystem)

    // Create command using factory
    cmd, err := h.commandFactory.CreateWithIdempotency(
        agentID,
        "scan_"+subsystem,
        map[string]interface{}{"subsystem": subsystem},
        idempotencyKey,
    )
    if err != nil {
        log.Printf("[ERROR] [server] [subsystem] command_creation_failed agent_id=%s subsystem=%s error=%v",
            agentID, subsystem, err)
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to create command"})
        return
    }

    // Store command in database
    if err := h.storeCommand(cmd); err != nil {
        log.Printf("[ERROR] [server] [subsystem] command_store_failed agent_id=%s command_id=%s error=%v",
            agentID, cmd.ID, err)
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to store command"})
        return
    }

    log.Printf("[INFO] [server] [subsystem] command_created agent_id=%s command_id=%s subsystem=%s",
        agentID, cmd.ID, subsystem)
    log.Printf("[HISTORY] [server] [scan_%s] command_created agent_id=%s command_id=%s source=manual timestamp=%s",
        subsystem, agentID, cmd.ID, time.Now().Format(time.RFC3339))

    c.JSON(http.StatusOK, gin.H{
        "message": "Command created successfully",
        "command_id": cmd.ID.String(),
        "subsystem": subsystem,
    })
}

// getPendingScanCommand checks for existing pending scan commands
func (h *SubsystemHandler) getPendingScanCommand(agentID uuid.UUID, subsystem string) (*models.AgentCommand, error) {
    var cmd models.AgentCommand
    query := `
        SELECT id, command_type, status, created_at
        FROM agent_commands
        WHERE agent_id = $1
          AND command_type = $2
          AND status = 'pending'
        LIMIT 1`

    commandType := "scan_" + subsystem
    err := h.db.Get(&cmd, query, agentID, commandType)
    if err != nil {
        if err == sql.ErrNoRows {
            return nil, nil // No pending command found
        }
        return nil, fmt.Errorf("query failed: %w", err)
    }

    return &cmd, nil
}

// validateSubsystem checks if subsystem is recognized
func (h *SubsystemHandler) validateSubsystem(subsystem string) error {
    validSubsystems := []string{"apt", "dnf", "windows", "winget", "storage", "system", "docker"}
    for _, valid := range validSubsystems {
        if subsystem == valid {
            return nil
        }
    }
    return fmt.Errorf("unknown subsystem: %s", subsystem)
}

// generateIdempotencyKey creates a key to prevent duplicate submissions
func (h *SubsystemHandler) generateIdempotencyKey(c *gin.Context, agentID uuid.UUID, subsystem string) string {
    // Use timestamp rounded to nearest minute for idempotency window
    // This allows retries within same minute but prevents duplicates across minutes
    timestampWindow := time.Now().Unix() / 60 // Round to minute
    return fmt.Sprintf("%s:%s:%d", agentID.String(), subsystem, timestampWindow)
}

// storeCommand persists command to database
func (h *SubsystemHandler) storeCommand(cmd *models.AgentCommand) error {
    // Implementation depends on your command storage layer
    // Use NamedExec or similar to insert command
    query := `
        INSERT INTO agent_commands
        (id, agent_id, command_type, status, source, params, created_at)
        VALUES (:id, :agent_id, :command_type, :status, :source, :params, NOW())`

    _, err := h.db.NamedExec(query, cmd)
    return err
}

Time Required: 30 minutes


Phase 2: Database Schema

Migration 023a: Command Deduplication

File: aggregator-server/internal/database/migrations/023a_command_deduplication.up.sql

-- Command Deduplication Schema
-- Prevents multiple pending scan commands per subsystem per agent

-- Add unique constraint to enforce single pending command per subsystem
CREATE UNIQUE INDEX idx_agent_pending_subsystem
ON agent_commands(agent_id, command_type, status)
WHERE status = 'pending';

-- Add idempotency key support for retry scenarios
ALTER TABLE agent_commands ADD COLUMN idempotency_key VARCHAR(64) UNIQUE NULL;
CREATE INDEX idx_agent_commands_idempotency_key ON agent_commands(idempotency_key);

COMMENT ON COLUMN agent_commands.idempotency_key IS
  'Prevents duplicate command creation from retry logic. Based on (agent_id + subsystem + timestamp window).';

File: aggregator-server/internal/database/migrations/023a_command_deduplication.down.sql

DROP INDEX IF EXISTS idx_agent_pending_subsystem;
ALTER TABLE agent_commands DROP COLUMN IF EXISTS idempotency_key;
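The partial unique index enforces "at most one pending command per (agent, command type)" while leaving historical rows untouched: once a command leaves `pending`, it no longer participates in the constraint. An in-memory Go sketch of that semantics (illustrative only; PostgreSQL enforces this server-side):

```go
package main

import "fmt"

type key struct{ agentID, commandType string }

// pendingIndex models idx_agent_pending_subsystem: uniqueness applies
// only while status = 'pending', so completed commands never block new ones.
type pendingIndex map[key]bool

func (p pendingIndex) insert(agentID, commandType, status string) error {
	k := key{agentID, commandType}
	if status == "pending" {
		if p[k] {
			return fmt.Errorf("duplicate key value violates unique constraint")
		}
		p[k] = true
	}
	return nil
}

// complete simulates the command leaving 'pending' status.
func (p pendingIndex) complete(agentID, commandType string) {
	delete(p, key{agentID, commandType})
}

func main() {
	idx := pendingIndex{}
	fmt.Println(idx.insert("agent-1", "scan_storage", "pending")) // <nil>
	fmt.Println(idx.insert("agent-1", "scan_storage", "pending")) // constraint violation
	idx.complete("agent-1", "scan_storage")
	fmt.Println(idx.insert("agent-1", "scan_storage", "pending")) // <nil> again
}
```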

Migration 023: Client Error Logging Table

File: aggregator-server/internal/database/migrations/023_client_error_logging.up.sql

-- Client Error Logging Schema
-- Implements ETHOS #1: Errors are History, Not /dev/null

CREATE TABLE client_errors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID REFERENCES agents(id) ON DELETE SET NULL,
    subsystem VARCHAR(50) NOT NULL,
    error_type VARCHAR(50) NOT NULL,
    message TEXT NOT NULL,
    stack_trace TEXT,
    metadata JSONB,
    url TEXT NOT NULL,
    user_agent TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for efficient querying
CREATE INDEX idx_client_errors_agent_time ON client_errors(agent_id, created_at DESC);
CREATE INDEX idx_client_errors_subsystem_time ON client_errors(subsystem, created_at DESC);
CREATE INDEX idx_client_errors_error_type_time ON client_errors(error_type, created_at DESC);
CREATE INDEX idx_client_errors_created_at ON client_errors(created_at DESC);

-- Comments for documentation
COMMENT ON TABLE client_errors IS 'Frontend error logs for debugging and auditing. Implements ETHOS #1.';
COMMENT ON COLUMN client_errors.agent_id IS 'Agent active when error occurred (NULL for pre-auth errors)';
COMMENT ON COLUMN client_errors.subsystem IS 'RedFlag subsystem being used (storage, system, docker, etc.)';
COMMENT ON COLUMN client_errors.error_type IS 'Error category: javascript_error, api_error, ui_error, validation_error';
COMMENT ON COLUMN client_errors.metadata IS 'Additional context (component, API response, user actions)';

-- Note: the idempotency_key column on agent_commands is added by
-- migration 023a; do not re-add it here.

File: aggregator-server/internal/database/migrations/023_client_error_logging.down.sql

DROP TABLE IF EXISTS client_errors;

Time Required: 5 minutes


Phase 3: Backend Error Handler

Files to Create

3.1 Error Handler

File: aggregator-server/internal/api/handlers/client_errors.go

package handlers

import (
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/google/uuid"
    "github.com/jmoiron/sqlx"
)

// ClientErrorHandler handles frontend error logging per ETHOS #1
type ClientErrorHandler struct {
    db *sqlx.DB
}

// NewClientErrorHandler creates a new error handler
func NewClientErrorHandler(db *sqlx.DB) *ClientErrorHandler {
    return &ClientErrorHandler{db: db}
}

// LogErrorRequest represents a client error log entry
type LogErrorRequest struct {
    Subsystem   string                 `json:"subsystem" binding:"required"`
    ErrorType   string                 `json:"error_type" binding:"required,oneof=javascript_error api_error ui_error validation_error"`
    Message     string                 `json:"message" binding:"required,max=10000"`
    StackTrace  string                 `json:"stack_trace,omitempty"`
    Metadata    map[string]interface{} `json:"metadata,omitempty"`
    URL         string                 `json:"url" binding:"required"`
}

// LogError processes and stores frontend errors
func (h *ClientErrorHandler) LogError(c *gin.Context) {
    var req LogErrorRequest
    if err := c.ShouldBindJSON(&req); err != nil {
        log.Printf("[ERROR] [server] [client_error] validation_failed error=\"%v\"", err)
        c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request data"})
        return
    }

    // Extract agent ID from auth middleware if available
    var agentID interface{}
    if agentIDValue, exists := c.Get("agentID"); exists {
        if id, ok := agentIDValue.(uuid.UUID); ok {
            agentID = id
        }
    }

    // Log to console with HISTORY prefix
    log.Printf("[ERROR] [server] [client] [%s] agent_id=%v subsystem=%s message=\"%s\"",
        req.ErrorType, agentID, req.Subsystem, truncate(req.Message, 200))
    log.Printf("[HISTORY] [server] [client_error] agent_id=%v subsystem=%s type=%s url=\"%s\" message=\"%s\" timestamp=%s",
        agentID, req.Subsystem, req.ErrorType, req.URL, req.Message, time.Now().Format(time.RFC3339))

    // Store in database with retry logic
    if err := h.storeError(agentID, req, c.GetHeader("User-Agent")); err != nil {
        log.Printf("[ERROR] [server] [client_error] store_failed error=\"%v\"", err)
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to store error"})
        return
    }

    c.JSON(http.StatusOK, gin.H{"logged": true})
}

// storeError persists error to database with retry. The user agent is passed
// in explicitly because the gin context is not in scope here.
func (h *ClientErrorHandler) storeError(agentID interface{}, req LogErrorRequest, userAgent string) error {
    const maxRetries = 3
    var lastErr error

    // Marshal metadata so the JSONB column receives valid JSON bytes
    metadataJSON, err := json.Marshal(req.Metadata)
    if err != nil {
        return fmt.Errorf("failed to marshal metadata: %w", err)
    }

    for attempt := 1; attempt <= maxRetries; attempt++ {
        query := `INSERT INTO client_errors (agent_id, subsystem, error_type, message, stack_trace, metadata, url, user_agent)
                  VALUES (:agent_id, :subsystem, :error_type, :message, :stack_trace, :metadata, :url, :user_agent)`

        _, err := h.db.NamedExec(query, map[string]interface{}{
            "agent_id":    agentID,
            "subsystem":   req.Subsystem,
            "error_type":  req.ErrorType,
            "message":     req.Message,
            "stack_trace": req.StackTrace,
            "metadata":    metadataJSON,
            "url":         req.URL,
            "user_agent":  userAgent,
        })

        if err == nil {
            return nil
        }

        lastErr = err
        if attempt < maxRetries {
            time.Sleep(time.Duration(attempt) * time.Second)
            continue
        }
    }

    return fmt.Errorf("failed after %d attempts: %w", maxRetries, lastErr)
}

func truncate(s string, maxLen int) string {
    if len(s) <= maxLen {
        return s
    }
    return s[:maxLen] + "..."
}

func hash(s string) string {
    // Simple hash for message deduplication detection
    h := sha256.Sum256([]byte(s))
    return fmt.Sprintf("%x", h)[:16]
}

3.2 Query Client Errors

File: aggregator-server/internal/database/queries/client_errors.sql

-- name: GetClientErrorsByAgent :many
SELECT * FROM client_errors
WHERE agent_id = $1
ORDER BY created_at DESC
LIMIT $2;

-- name: GetClientErrorsBySubsystem :many
SELECT * FROM client_errors
WHERE subsystem = $1
ORDER BY created_at DESC
LIMIT $2;

-- name: GetClientErrorStats :one
SELECT
    subsystem,
    error_type,
    COUNT(*) as count,
    MIN(created_at) as first_occurrence,
    MAX(created_at) as last_occurrence
FROM client_errors
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY subsystem, error_type
ORDER BY count DESC;

3.3 Update Router

File: aggregator-server/internal/api/router.go

// Add to router setup function
func SetupRouter(db *sqlx.DB, cfg *config.Config) *gin.Engine {
    // ... existing setup ...

    // Error logging endpoint (authenticated)
    errorHandler := handlers.NewClientErrorHandler(db)
    apiV1.POST("/logs/client-error",
        middleware.AuthMiddleware(),
        errorHandler.LogError,
    )

    // Admin endpoint for viewing errors (GetErrors is implemented alongside LogError in this phase)
    apiV1.GET("/logs/client-errors",
        middleware.AuthMiddleware(),
        middleware.AdminMiddleware(),
        errorHandler.GetErrors,
    )

    // ... rest of setup ...
}

Time Required: 20 minutes


Phase 4: Frontend Error Logger

Files to Create

4.1 Client Error Logger

File: aggregator-web/src/lib/client-error-logger.ts

import { api } from './api';

export interface ClientErrorLog {
  subsystem: string;
  error_type: 'javascript_error' | 'api_error' | 'ui_error' | 'validation_error';
  message: string;
  stack_trace?: string;
  metadata?: Record<string, any>;
  url: string;
  timestamp: string;
}

/**
 * ClientErrorLogger provides reliable frontend error logging with retry logic
 * Implements ETHOS #3: Assume Failure; Build for Resilience
 */
export class ClientErrorLogger {
  private maxRetries = 3;
  private baseDelayMs = 1000;
  private localStorageKey = 'redflag-error-queue';
  private offlineBuffer: ClientErrorLog[] = [];
  private isOnline = navigator.onLine;

  constructor() {
    // Listen for online/offline events
    window.addEventListener('online', () => {
      this.isOnline = true; // must be reset before flushing, or the flush is skipped
      this.flushOfflineBuffer();
    });
    window.addEventListener('offline', () => { this.isOnline = false; });
  }

  /**
   * Log an error with automatic retry and offline queuing
   */
  async logError(errorData: Omit<ClientErrorLog, 'url' | 'timestamp'>): Promise<void> {
    const fullError: ClientErrorLog = {
      ...errorData,
      url: window.location.href,
      timestamp: new Date().toISOString(),
    };

    // Try to send immediately
    try {
      await this.sendWithRetry(fullError);
      return;
    } catch (error) {
      // If failed after retries, queue for later
      this.queueForRetry(fullError);
    }
  }

  /**
   * Send error to backend with exponential backoff retry
   */
  private async sendWithRetry(error: ClientErrorLog): Promise<void> {
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        await api.post('/logs/client-error', error, {
          headers: { 'X-Error-Logger-Request': 'true' },
        });

        // Success, remove from queue if it was there
        this.removeFromQueue(error);
        return;
      } catch (sendError) {
        if (attempt === this.maxRetries) {
          throw sendError; // Rethrow after final attempt; caller queues for later
        }

        // Exponential backoff
        await this.sleep(this.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }

  /**
   * Queue error for retry when network is available
   */
  private queueForRetry(error: ClientErrorLog): void {
    try {
      const queue = this.getQueue();
      queue.push({
        ...error,
        retryCount: (error as any).retryCount || 0,
        queuedAt: new Date().toISOString(),
      });

      // Save to localStorage for persistence
      localStorage.setItem(this.localStorageKey, JSON.stringify(queue));

      // Also keep in memory buffer
      this.offlineBuffer.push(error);
    } catch (storageError) {
      // localStorage might be full or unavailable
      console.warn('Failed to queue error for retry:', storageError);
    }
  }

  /**
   * Flush offline buffer when coming back online
   */
  private async flushOfflineBuffer(): Promise<void> {
    if (!this.isOnline) return;

    const queue = this.getQueue();
    if (queue.length === 0) return;

    const failed: typeof queue = [];

    for (const queuedError of queue) {
      try {
        await this.sendWithRetry(queuedError);
      } catch (error) {
        failed.push(queuedError);
      }
    }

    // Update queue with remaining failed items
    if (failed.length < queue.length) {
      localStorage.setItem(this.localStorageKey, JSON.stringify(failed));
    }
  }

  /**
   * Get current error queue from localStorage
   */
  private getQueue(): any[] {
    try {
      const stored = localStorage.getItem(this.localStorageKey);
      return stored ? JSON.parse(stored) : [];
    } catch {
      return [];
    }
  }

  /**
   * Remove successfully sent error from queue
   */
  private removeFromQueue(sentError: ClientErrorLog): void {
    try {
      const queue = this.getQueue();
      const filtered = queue.filter(queued =>
        queued.timestamp !== sentError.timestamp ||
        queued.message !== sentError.message
      );

      if (filtered.length < queue.length) {
        localStorage.setItem(this.localStorageKey, JSON.stringify(filtered));
      }
    } catch {
      // Best effort cleanup
    }
  }

  /**
   * Capture unhandled JavaScript errors
   */
  captureUnhandledErrors(): void {
    // Global error handler
    window.addEventListener('error', (event) => {
      this.logError({
        subsystem: 'global',
        error_type: 'javascript_error',
        message: event.message,
        stack_trace: event.error?.stack,
        metadata: {
          filename: event.filename,
          lineno: event.lineno,
          colno: event.colno,
        },
      }).catch(() => {
        // Silently ignore logging failures
      });
    });

    // Unhandled promise rejections
    window.addEventListener('unhandledrejection', (event) => {
      this.logError({
        subsystem: 'global',
        error_type: 'javascript_error',
        message: event.reason?.message || String(event.reason),
        stack_trace: event.reason?.stack,
      }).catch(() => {});
    });
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Singleton instance
export const clientErrorLogger = new ClientErrorLogger();

// Auto-capture unhandled errors
if (typeof window !== 'undefined') {
  clientErrorLogger.captureUnhandledErrors();
}

4.2 Toast Wrapper with Logging

File: aggregator-web/src/lib/toast-with-logging.ts

import toast, { ToastOptions } from 'react-hot-toast';
import { clientErrorLogger } from './client-error-logger';
import { useLocation } from 'react-router-dom';

/**
 * Extract subsystem from current route
 */
function getCurrentSubsystem(): string {
  if (typeof window === 'undefined') return 'unknown';

  const path = window.location.pathname;

  // Map routes to subsystems
  if (path.includes('/storage')) return 'storage';
  if (path.includes('/system')) return 'system';
  if (path.includes('/docker')) return 'docker';
  if (path.includes('/updates')) return 'updates';
  if (path.includes('/agent/')) return 'agent';

  return 'unknown';
}

/**
 * Wrap toast.error to automatically log errors to backend
 * Implements ETHOS #1: Errors are History
 */
export const toastWithLogging = {
  error: (message: string, options?: ToastOptions & { subsystem?: string }) => {
    // Strip subsystem before forwarding - it's not a valid toast option
    const { subsystem: explicitSubsystem, ...toastOptions } = options ?? {};
    const subsystem = explicitSubsystem || getCurrentSubsystem();

    // Log to backend asynchronously - don't block UI
    clientErrorLogger.logError({
      subsystem,
      error_type: 'ui_error',
      message: message.substring(0, 5000), // Prevent excessively long messages
      metadata: {
        component: toastOptions.id,
        duration: toastOptions.duration,
        position: toastOptions.position,
        timestamp: new Date().toISOString(),
      },
    }).catch(() => {
      // Silently ignore logging failures - don't crash the UI
    });

    // Show toast to user
    return toast.error(message, toastOptions);
  },

  // Passthrough methods. Note: react-hot-toast has no built-in info/warning
  // variants, so render those through the base toast() call.
  success: toast.success,
  info: (message: string, options?: ToastOptions) => toast(message, options),
  warning: (message: string, options?: ToastOptions) => toast(message, { icon: '⚠️', ...options }),
  loading: toast.loading,
  dismiss: toast.dismiss,
  remove: toast.remove,
  promise: toast.promise,
};

/**
 * React hook for toast with automatic subsystem detection
 */
export function useToastWithLogging() {
  const location = useLocation();

  return {
    error: (message: string, options?: ToastOptions) => {
      return toastWithLogging.error(message, {
        ...options,
        subsystem: getSubsystemFromPath(location.pathname),
      });
    },
    success: toast.success,
    info: toastWithLogging.info,
    warning: toastWithLogging.warning,
    loading: toast.loading,
    dismiss: toast.dismiss,
  };
}

function getSubsystemFromPath(pathname: string): string {
  const matches = pathname.match(/\/(storage|system|docker|updates|agent)/);
  return matches ? matches[1] : 'unknown';
}

4.3 API Integration

Update: aggregator-web/src/lib/api.ts

import { clientErrorLogger } from './client-error-logger';

// Add error logging to axios interceptor
api.interceptors.response.use(
  (response) => response,
  async (error) => {
    // Don't log errors from the error logger itself
    if (error.config?.headers?.['X-Error-Logger-Request']) {
      return Promise.reject(error);
    }

    // Extract subsystem from URL
    const subsystem = extractSubsystem(error.config?.url);

    // Log API errors
    clientErrorLogger.logError({
      subsystem,
      error_type: 'api_error',
      message: error.message,
      metadata: {
        status_code: error.response?.status,
        endpoint: error.config?.url,
        method: error.config?.method,
        response_data: error.response?.data,
      },
    }).catch(() => {
      // Don't let logging errors hide the original error
    });

    return Promise.reject(error);
  }
);

function extractSubsystem(url: string = ''): string {
  const matches = url.match(/\/(storage|system|docker|updates|agent)/);
  return matches ? matches[1] : 'unknown';
}
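The guard at the top of the interceptor assumes the error logger tags its own requests with a marker header. A minimal sketch of that convention (the header name matches the check above; the helper names are illustrative, not from the existing codebase):

```typescript
// Marker header the response interceptor checks so a failed error-log
// request never triggers another error-log request (no recursion).
const ERROR_LOGGER_HEADER = 'X-Error-Logger-Request';

interface RequestConfig {
  headers?: Record<string, string>;
}

// Config the logger attaches to its own POSTs to /logs/client-error.
function buildErrorLoggerConfig(): RequestConfig {
  return { headers: { [ERROR_LOGGER_HEADER]: 'true' } };
}

// The interceptor-side check, mirroring the guard above.
function isErrorLoggerRequest(config?: RequestConfig): boolean {
  return config?.headers?.[ERROR_LOGGER_HEADER] === 'true';
}
```

As long as every call the logger makes goes through buildErrorLoggerConfig, the interceptor short-circuits before logError is invoked, which is what makes Risk 3 (infinite error loops) a non-issue.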

Time Required: 20 minutes


Phase 5: Toast Integration

Update Existing Error Calls

Pattern: Update error toast calls to use new logger

Before:

import toast from 'react-hot-toast';

toast.error(`Failed to trigger scan: ${error.message}`);

After:

import { toastWithLogging } from '@/lib/toast-with-logging';

toastWithLogging.error(`Failed to trigger scan: ${error.message}`, {
  subsystem: 'storage', // Specify subsystem
  id: 'trigger-scan-error', // Optional component ID
});

Priority Files to Update

5.1 React State Management for Scan Buttons

File: Create aggregator-web/src/hooks/useScanState.ts

import { useState, useCallback } from 'react';
import { api } from '@/lib/api';
import { toastWithLogging } from '@/lib/toast-with-logging';

interface ScanState {
  isScanning: boolean;
  commandId?: string;
  error?: string;
}

/**
 * Hook for managing scan button state and preventing duplicate scans
 */
export function useScanState(agentId: string, subsystem: string) {
  const [state, setState] = useState<ScanState>({
    isScanning: false,
  });

  const triggerScan = useCallback(async () => {
    if (state.isScanning) {
      toastWithLogging.info('Scan already in progress', { subsystem });
      return;
    }

    setState({ isScanning: true, commandId: undefined, error: undefined });

    try {
      const result = await api.post(`/agents/${agentId}/subsystems/${subsystem}/trigger`);

      setState(prev => ({
        ...prev,
        commandId: result.data.command_id,
      }));

      // Poll for completion or wait for subscription update
      await waitForScanComplete(agentId, result.data.command_id);

      setState({ isScanning: false, commandId: result.data.command_id });

      toastWithLogging.success(`${subsystem} scan completed`, { subsystem });
    } catch (error: any) {
      const isAlreadyRunning = error.response?.status === 409;

      if (isAlreadyRunning) {
        const existingCommandId = error.response?.data?.command_id;
        setState({
          isScanning: false,
          commandId: existingCommandId,
          error: 'Scan already in progress',
        });

        toastWithLogging.info(`Scan already running (command: ${existingCommandId})`, { subsystem });
      } else {
        const errorMessage = error.response?.data?.error || error.message;
        setState({
          isScanning: false,
          error: errorMessage,
        });

        toastWithLogging.error(`Failed to trigger scan: ${errorMessage}`, { subsystem });
      }
    }
  }, [agentId, subsystem, state.isScanning]);

  const reset = useCallback(() => {
    setState({ isScanning: false, commandId: undefined, error: undefined });
  }, []);

  return {
    isScanning: state.isScanning,
    commandId: state.commandId,
    error: state.error,
    triggerScan,
    reset,
  };
}

/**
 * Wait for scan to complete by polling command status
 */
async function waitForScanComplete(agentId: string, commandId: string): Promise<void> {
  const maxWaitMs = 300000; // 5 minutes max
  const startTime = Date.now();
  const pollInterval = 2000; // Poll every 2 seconds

  return new Promise((resolve, reject) => {
    const interval = setInterval(async () => {
      // Check the deadline first so a slow request can't outlive the timeout
      if (Date.now() - startTime > maxWaitMs) {
        clearInterval(interval);
        reject(new Error('Scan timeout'));
        return;
      }

      try {
        const result = await api.get(`/agents/${agentId}/commands/${commandId}`);

        if (result.data.status === 'completed' || result.data.status === 'failed') {
          clearInterval(interval);
          resolve();
        }
      } catch (error) {
        clearInterval(interval);
        reject(error);
      }
    }, pollInterval);
  });
}

Usage Example in Component:

import { useScanState } from '@/hooks/useScanState';

function ScanButton({ agentId, subsystem }: { agentId: string; subsystem: string }) {
  const { isScanning, triggerScan } = useScanState(agentId, subsystem);

  return (
    <button
      onClick={triggerScan}
      disabled={isScanning}
      className={isScanning ? 'btn-disabled' : 'btn-primary'}
    >
      {isScanning ? (
        <>
          <Spinner className="animate-spin" />
          Scanning...
        </>
      ) : (
        `Scan ${subsystem}`
      )}
    </button>
  );
}

5.2 Update Existing Error Calls

Priority Files to Update

  1. Agent Subsystem Actions - /src/components/AgentSubsystems.tsx
  2. Command Retry Logic - /src/hooks/useCommands.ts
  3. Authentication Errors - /src/lib/auth.ts
  4. API Error Boundaries - /src/components/ErrorBoundary.tsx

Example Complete Integration

File: aggregator-web/src/components/AgentSubsystems.tsx (example update)

import { toastWithLogging } from '@/lib/toast-with-logging';

const handleTrigger = async (subsystem: string) => {
  try {
    await triggerSubsystem(agentId, subsystem);
  } catch (error) {
    toastWithLogging.error(
      `Failed to trigger ${subsystem} scan: ${error.message}`,
      {
        subsystem,
        id: `trigger-${subsystem}`,
      }
    );
  }
};

Time Required: 15 minutes

5.3 Deduplication Testing

Test Cases:

// Test 1: Rapid clicking prevention
test('clicking scan button 10 times creates only 1 command', async () => {
  const button = screen.getByText('Scan APT');

  // Click 10 times rapidly
  for (let i = 0; i < 10; i++) {
    fireEvent.click(button);
  }

  // Should only create 1 command
  expect(api.post).toHaveBeenCalledTimes(1);
  expect(api.post).toHaveBeenCalledWith('/agents/123/subsystems/apt/trigger');
});

// Test 2: Button disabled while scanning
test('button disabled during scan', async () => {
  const button = screen.getByText('Scan APT');

  fireEvent.click(button);

  // Button should be disabled immediately
  expect(button).toBeDisabled();
  expect(screen.getByText('Scanning...')).toBeInTheDocument();

  await waitFor(() => {
    expect(button).not.toBeDisabled();
  });
});

// Test 3: 409 Conflict surfaces the existing command
test('409 response reuses the existing command', async () => {
  mock.onPost().reply(409, {
    error: 'Scan already in progress',
    command_id: 'existing-id',
  });

  const { result } = renderHook(() => useScanState('123', 'apt'));
  await act(() => result.current.triggerScan());

  // Hook resolves without creating a new command and exposes the existing one
  expect(result.current.commandId).toBe('existing-id');
  expect(result.current.isScanning).toBe(false);
});

Phase 6: Verification & Testing

Manual Testing Checklist

6.1 Migration Testing

  • Run migration 023 successfully
  • Verify client_errors table exists
  • Verify idempotency_key column added to agent_commands
  • Test on fresh database (no duplicate key errors)

6.2 Command Factory Testing

  • Rapid-fire scan button clicks (10+ clicks in 2 seconds)
  • Verify all commands created with unique IDs
  • Check no duplicate key violations in logs
  • Verify commands appear in database correctly

6.3 Error Logging Testing

  • Trigger UI error (e.g., invalid input)
  • Verify error appears in toast
  • Check database - error should be stored in client_errors
  • Trigger API error (e.g., network timeout)
  • Verify exponential backoff retry works
  • Disconnect network, trigger error, reconnect
  • Verify error is queued and sent when back online
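The backoff behavior being verified here can be sketched as follows. Retry count (3) comes from the plan's stated defaults; the base delay of 1s and the injectable sleep are assumptions for illustration:

```typescript
// Exponential backoff schedule: 1s, 2s, 4s for attempts 0, 1, 2.
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * Math.pow(2, attempt);
}

// Try to deliver an error up to maxAttempts times, sleeping between
// failures. Returns false when all attempts fail, at which point the
// caller falls back to the localStorage queue.
async function sendWithRetry(
  send: () => Promise<void>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await send();
      return true; // delivered
    } catch {
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  return false;
}
```

Injecting sleep keeps the retry path unit-testable without real delays, which is how the "retries on failure then saves to localStorage" test below can run fast.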

6.4 Integration Testing

  • Full user workflow: login → trigger scan → view results
  • Verify all errors logged with [HISTORY] prefix
  • Check logs are queryable by subsystem
  • Verify error logging doesn't block UI

Automated Test Cases

6.5 Backend Tests

File: aggregator-server/internal/command/factory_test.go

package command

import (
    "testing"

    "github.com/google/uuid"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    // models is referenced below; the import path depends on the module
    // name, e.g. "<module>/internal/models"
)

func TestFactory_Create(t *testing.T) {
    factory := NewFactory()
    agentID := uuid.New()

    cmd, err := factory.Create(agentID, "scan_storage", map[string]interface{}{"path": "/"})

    require.NoError(t, err)
    assert.NotEqual(t, uuid.Nil, cmd.ID, "ID must be generated")
    assert.Equal(t, agentID, cmd.AgentID)
    assert.Equal(t, "scan_storage", cmd.CommandType)
    assert.Equal(t, "pending", cmd.Status)
    assert.Equal(t, "manual", cmd.Source)
}

func TestFactory_CreateWithIdempotency(t *testing.T) {
    factory := NewFactory()
    agentID := uuid.New()
    idempotencyKey := "test-key-123"

    // Create first command
    cmd1, err := factory.CreateWithIdempotency(agentID, "scan_system", nil, idempotencyKey)
    require.NoError(t, err)

    // Create "duplicate" command with same idempotency key
    cmd2, err := factory.CreateWithIdempotency(agentID, "scan_system", nil, idempotencyKey)
    require.NoError(t, err)

    // Should return same command
    assert.Equal(t, cmd1.ID, cmd2.ID, "Idempotency key should return same command")
}

func TestFactory_Validate(t *testing.T) {
    tests := []struct {
        name    string
        cmd     *models.AgentCommand
        wantErr bool
    }{
        {
            name: "valid command",
            cmd: &models.AgentCommand{
                ID:          uuid.New(),
                AgentID:     uuid.New(),
                CommandType: "scan_storage",
                Status:      "pending",
                Source:      "manual",
            },
            wantErr: false,
        },
        {
            name: "missing ID",
            cmd: &models.AgentCommand{
                ID:          uuid.Nil,
                AgentID:     uuid.New(),
                CommandType: "scan_storage",
                Status:      "pending",
                Source:      "manual",
            },
            wantErr: true,
        },
    }

    factory := NewFactory()

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := factory.Validate(tt.cmd)
            if tt.wantErr {
                assert.Error(t, err)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}

6.6 Frontend Tests

File: aggregator-web/src/lib/client-error-logger.test.ts

import { clientErrorLogger } from './client-error-logger';
import { api } from './api';

jest.mock('./api');

describe('ClientErrorLogger', () => {
  beforeEach(() => {
    localStorage.clear();
    jest.clearAllMocks();
  });

  test('logs error successfully on first attempt', async () => {
    (api.post as jest.Mock).mockResolvedValue({});

    await clientErrorLogger.logError({
      subsystem: 'storage',
      error_type: 'api_error',
      message: 'Test error',
    });

    expect(api.post).toHaveBeenCalledTimes(1);
    expect(api.post).toHaveBeenCalledWith(
      '/logs/client-error',
      expect.objectContaining({
        subsystem: 'storage',
        error_type: 'api_error',
        message: 'Test error',
      }),
      expect.any(Object)
    );
  });

  test('retries on failure then saves to localStorage', async () => {
    (api.post as jest.Mock)
      .mockRejectedValueOnce(new Error('Network error'))
      .mockRejectedValueOnce(new Error('Network error'))
      .mockRejectedValueOnce(new Error('Network error'));

    await clientErrorLogger.logError({
      subsystem: 'storage',
      error_type: 'api_error',
      message: 'Test error',
    });

    expect(api.post).toHaveBeenCalledTimes(3);

    // Should be saved to localStorage
    const queue = localStorage.getItem('redflag-error-queue');
    expect(queue).toBeTruthy();
    expect(JSON.parse(queue!).length).toBe(1);
  });

  test('flushes queue when coming back online', async () => {
    // Pre-populate queue
    const queuedError = {
      subsystem: 'storage',
      error_type: 'api_error',
      message: 'Queued error',
      timestamp: new Date().toISOString(),
    };
    localStorage.setItem('redflag-error-queue', JSON.stringify([queuedError]));

    (api.post as jest.Mock).mockResolvedValue({});

    // Trigger online event
    window.dispatchEvent(new Event('online'));

    // Wait for flush
    await new Promise(resolve => setTimeout(resolve, 100));

    expect(api.post).toHaveBeenCalled();
    expect(localStorage.getItem('redflag-error-queue')).toBe('[]');
  });
});

Time Required: 30 minutes


Implementation Checklist

Pre-Implementation

  • Migration system bug fixed (lines 103 & 116 in db.go)
  • Database wiped and fresh instance ready
  • Test agents available for rapid scan testing
  • Development environment ready (all 3 components)

Phase 1: Command Factory (25 min)

  • Create aggregator-server/internal/command/factory.go
  • Create aggregator-server/internal/command/validator.go
  • Update aggregator-server/internal/models/command.go
  • Update aggregator-server/internal/api/handlers/subsystems.go
  • Test: Verify rapid scan clicks work

Phase 2: Database Schema (5 min)

  • Create migration 023_client_error_logging.up.sql
  • Create migration 023_client_error_logging.down.sql
  • Run migration and verify table creation
  • Verify indexes created

Phase 3: Backend Handler (20 min)

  • Create aggregator-server/internal/api/handlers/client_errors.go
  • Create aggregator-server/internal/database/queries/client_errors.sql
  • Update aggregator-server/internal/api/router.go
  • Test API endpoint with curl

Phase 4: Frontend Logger (20 min)

  • Create aggregator-web/src/lib/client-error-logger.ts
  • Create aggregator-web/src/lib/toast-with-logging.ts
  • Update aggregator-web/src/lib/api.ts
  • Test offline/online queue behavior

Phase 5: Toast Integration (15 min)

  • Create useScanState hook for button state management
  • Update scan buttons to use useScanState
  • Test button disabling during scan
  • Update 3-5 critical error locations to use toastWithLogging
  • Verify errors appear in both toast and database
  • Test in multiple subsystems
  • Test deduplication: Rapid clicking creates only 1 command
  • Test 409 response: Returns existing command when scan running

Phase 6: Verification (30 min)

  • Run all test cases
  • Verify ETHOS compliance checklist
  • Test rapid scan clicking (no duplicates)
  • Test error persistence across page reloads
  • Verify [HISTORY] logs in server output

Documentation

  • Update session documentation
  • Create testing summary
  • Document any issues encountered
  • Update architecture documentation

Risk Mitigation

Risk 1: Migration Failures

Probability: Medium | Impact: High | Severity: 🔴 Critical

Mitigation:

  • Fix migration runner bug FIRST (before this implementation)
  • Test migration on fresh database
  • Keep database backups
  • Have rollback script ready

Contingency: If migration fails, manually apply SQL and continue


Risk 2: Performance Impact

Probability: Low | Impact: Medium | Severity: 🟡 Medium

Mitigation:

  • Async error logging (non-blocking)
  • LocalStorage queue with size limit (max 50 errors)
  • Database indexes for fast queries
  • Batch insert if needed in future

Contingency: If performance degrades, add sampling (log 1 in 10 errors)
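Both the queue cap and the sampling contingency can live in one small enqueue helper. A sketch, assuming the 50-entry cap above and a configurable sample rate (the QueuedError shape is a simplification of the logger's full payload):

```typescript
const MAX_QUEUE_SIZE = 50; // cap from the mitigation above
const SAMPLE_RATE = 1;     // contingency: drop to 0.1 to log ~1 in 10 errors

interface QueuedError {
  timestamp: string;
  message: string;
}

// Push an error onto a bounded queue, dropping the oldest entries first
// so localStorage growth stays flat during an error storm.
function enqueueBounded(queue: QueuedError[], entry: QueuedError): QueuedError[] {
  if (Math.random() >= SAMPLE_RATE) return queue; // sampled out
  const next = [...queue, entry];
  return next.length > MAX_QUEUE_SIZE
    ? next.slice(next.length - MAX_QUEUE_SIZE) // keep the newest 50
    : next;
}
```

Dropping oldest-first is a deliberate choice: the newest errors are the ones most likely to describe the incident a user is about to report.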


Risk 3: Infinite Error Loops

Probability: Low | Impact: High | Severity: 🟡 Medium

Mitigation:

  • X-Error-Logger-Request header prevents recursive logging
  • Max retry count (3 attempts)
  • Exponential backoff prevents thundering herd

Contingency: If loop detected, check for missing header and fix


Risk 4: Privacy Concerns

Probability: Low | Impact: High | Severity: 🟡 Medium

Mitigation:

  • No PII in error messages (validate during logging)
  • User agent stored but can be anonymized
  • Stack traces only from our code (not user code)

Contingency: Add privacy filter to scrub sensitive data
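The contingency filter could be a simple regex pass applied before an error message is persisted. A hypothetical sketch; these three patterns (emails, bearer/query tokens, IPv4 addresses) are illustrative, not an exhaustive PII list:

```typescript
// Redaction patterns applied in order to every outgoing error message.
const REDACTIONS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, '[redacted-email]'],
  [/(?:Bearer\s+|token[=:]\s*)[A-Za-z0-9._-]+/gi, '[redacted-token]'],
  [/\b\d{1,3}(?:\.\d{1,3}){3}\b/g, '[redacted-ip]'],
];

// Scrub sensitive substrings from a message before it leaves the browser.
function scrubMessage(message: string): string {
  return REDACTIONS.reduce((msg, [pattern, repl]) => msg.replace(pattern, repl), message);
}
```

The filter would slot into logError just before the payload is built, so both the direct send and the localStorage queue only ever see scrubbed text.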


Post-Implementation Review

Success Criteria

  • No duplicate key violations during rapid clicking
  • All errors persist in database
  • Error logs queryable and useful for debugging
  • No performance degradation observed
  • System handles offline/online transitions gracefully
  • All tests pass

Performance Benchmarks

  • Command creation: < 10ms per command
  • Error logging: < 50ms per error (async)
  • Database queries: < 100ms for common queries
  • Bundle size increase: < 5KB gzipped

Known Limitations

  • Error logs don't include full request payloads (privacy)
  • localStorage queue limited by browser storage (~5MB)
  • Retries happen in foreground (could be moved to background)

Future Enhancements (Post v0.1.27)

  • Error aggregation and deduplication
  • Error rate alerting
  • Error analytics dashboard
  • Automatic error categorization
  • Integration with notification system

Rollback Plan

If critical issues arise:

  1. Revert Code Changes:

    git revert HEAD~6..HEAD  # Revert last 6 commits
    
  2. Rollback Database:

    cd aggregator-server
    # Run down migration
    go run cmd/migrate/main.go -migrate-down 1
    
  3. Rebuild and Deploy:

    docker-compose build --no-cache
    docker-compose up -d
    

Additional Notes

Team Coordination:

  • Coordinate with frontend team if they're working on error handling
  • Notify QA about new error logging features for testing
  • Update documentation team about database schema changes

Monitoring:

  • Monitor client_errors table growth
  • Set up alerts for error rate spikes
  • Track failed error logging attempts

Documentation Updates:

  • Update API documentation for /logs/client-error endpoint
  • Document error log query patterns for support team
  • Add troubleshooting guide for common errors

Plan Created By: Ani (AI Assistant)
Reviewed By: Casey Tunturi
Status: 🟢 APPROVED FOR IMPLEMENTATION
Next Step: Begin Phase 1 (Command Factory)

Estimated Timeline:

  • Start: Immediately
  • Complete: ~2-3 hours
  • Test: 30 minutes
  • Deploy: After verification

This is a complete, production-ready implementation plan. Each phase builds on the previous one, with full error handling, testing, and rollback procedures included.

Let's build this right. 💪