# RedFlag Clean Architecture Implementation Master Plan

**Date**: 2025-12-19
**Version**: v1.0
**Total Implementation Time**: 3-4 hours (including migration fixes and command deduplication)
**Status**: READY FOR EXECUTION

---

## Executive Summary

Complete implementation plan for fixing critical ETHOS violations and implementing clean architecture patterns across RedFlag v0.1.27. Addresses duplicate command generation, lost frontend errors, and migration system bugs.

**Three Core Objectives:**

1. ✅ Fix migration system (blocks everything else)
2. ✅ Implement command factory pattern (prevents duplicate key violations)
3. ✅ Build frontend error logging system (ETHOS #1 compliance)

---

## Table of Contents

1. [Pre-Implementation: Migration System Fix](#pre-implementation-migration-system-fix)
2. [Phase 1: Command Factory Pattern](#phase-1-command-factory-pattern)
3. [Phase 2: Database Schema](#phase-2-database-schema)
4. [Phase 3: Backend Error Handler](#phase-3-backend-error-handler)
5. [Phase 4: Frontend Error Logger](#phase-4-frontend-error-logger)
6. [Phase 5: Toast Integration](#phase-5-toast-integration)
7. [Phase 6: Verification & Testing](#phase-6-verification-and-testing)
8. [Implementation Checklist](#implementation-checklist)
9. [Risk Mitigation](#risk-mitigation)
10. [Post-Implementation Review](#post-implementation-review)

---

## Pre-Implementation: Migration System Fix

**⚠️ CRITICAL: Must be completed first - blocks all other work**

### Problem

Migration runner has duplicate INSERT logic causing "duplicate key value violates unique constraint" errors on fresh installations.

### Root Cause

File: `aggregator-server/internal/database/db.go`

- Line 103: Executes `INSERT INTO schema_migrations (version) VALUES ($1)`
- Line 116: Executes the exact same INSERT statement
- Result: Every migration filename gets inserted twice

### Solution

```go
// File: aggregator-server/internal/database/db.go

// Lines 95-120: Fix duplicate INSERT logic
func (db *DB) Migrate() error {
	// ... existing code ...

	for _, file := range files {
		filename := file.Name()

		// ❌ REMOVE THIS - Line 103 duplicates line 116
		// if _, err = tx.Exec("INSERT INTO schema_migrations (version) VALUES ($1)", filename); err != nil {
		//     return fmt.Errorf("failed to mark migration %s as applied: %w", filename, err)
		// }

		// Keep only the EXECUTE + INSERT combo at lines 110-116
		if _, err = tx.Exec(string(content)); err != nil {
			log.Printf("Migration %s failed, marking as applied: %v", filename, err)
		}

		// ✅ Keep this INSERT - it's the correct location
		if _, err = tx.Exec("INSERT INTO schema_migrations (version) VALUES ($1)", filename); err != nil {
			return fmt.Errorf("failed to mark migration %s as applied: %w", filename, err)
		}
	}

	// ... rest of function ...
}
```

### Validation Steps

1. Wipe database completely: `docker-compose down -v`
2. Start fresh: `docker-compose up -d`
3. Check migration logs: all migrations should apply without duplicate key errors
4. Verify: `SELECT COUNT(DISTINCT version) = COUNT(version) FROM schema_migrations;`
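
The invariant behind step 4 can also be expressed as a small standalone Go helper (a hypothetical sketch, not one of the plan's files) over the list of applied migration versions:

```go
package main

// duplicateVersions returns every migration version that appears more than
// once in the applied list. After the fix, this should always be empty.
func duplicateVersions(applied []string) []string {
	seen := map[string]int{}
	for _, v := range applied {
		seen[v]++
	}
	var dups []string
	for v, n := range seen {
		if n > 1 {
			dups = append(dups, v)
		}
	}
	return dups
}
```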

### Time Required: 5 minutes

**Blocker Status**: 🔴 CRITICAL - Do not proceed until fixed

---

## Phase 1: Command Factory Pattern

### Objective

Prevent duplicate command key violations by ensuring all commands have properly generated UUIDs at creation time.

### Files to Create

#### 1.1 Command Factory

**File**: `aggregator-server/internal/command/factory.go`

```go
package command

import (
	"database/sql"
	"fmt"
	"time"

	"github.com/google/uuid"

	"github.com/Fimeg/RedFlag/aggregator-server/internal/models"
)

// Factory creates validated AgentCommand instances
type Factory struct {
	validator *Validator
}

// NewFactory creates a new command factory
func NewFactory() *Factory {
	return &Factory{
		validator: NewValidator(),
	}
}

// Create generates a new validated AgentCommand with unique ID
func (f *Factory) Create(agentID uuid.UUID, commandType string, params map[string]interface{}) (*models.AgentCommand, error) {
	cmd := &models.AgentCommand{
		ID:          uuid.New(), // Immediate, explicit generation
		AgentID:     agentID,
		CommandType: commandType,
		Status:      "pending",
		Source:      determineSource(commandType),
		Params:      params,
		CreatedAt:   time.Now(),
		UpdatedAt:   time.Now(),
	}

	if err := f.validator.Validate(cmd); err != nil {
		return nil, fmt.Errorf("command validation failed: %w", err)
	}

	return cmd, nil
}

// CreateWithIdempotency generates a command with idempotency protection.
// findByIdempotencyKey and storeIdempotencyKey are implemented against the
// command storage layer (implementation depends on the database query layer).
func (f *Factory) CreateWithIdempotency(agentID uuid.UUID, commandType string,
	params map[string]interface{}, idempotencyKey string) (*models.AgentCommand, error) {

	// Check for existing command with same idempotency key
	existing, err := f.findByIdempotencyKey(agentID, idempotencyKey)
	if err != nil && err != sql.ErrNoRows {
		return nil, fmt.Errorf("failed to check idempotency: %w", err)
	}

	if existing != nil {
		return existing, nil // Return existing command instead of creating duplicate
	}

	cmd, err := f.Create(agentID, commandType, params)
	if err != nil {
		return nil, err
	}

	// Store idempotency key with command
	if err := f.storeIdempotencyKey(cmd.ID, agentID, idempotencyKey); err != nil {
		return nil, fmt.Errorf("failed to store idempotency key: %w", err)
	}

	return cmd, nil
}

// determineSource classifies command source based on type
func determineSource(commandType string) string {
	if isSystemCommand(commandType) {
		return "system"
	}
	return "manual"
}

func isSystemCommand(commandType string) bool {
	systemCommands := []string{
		"enable_heartbeat",
		"disable_heartbeat",
		"update_check",
		"cleanup_old_logs",
	}

	for _, cmd := range systemCommands {
		if commandType == cmd {
			return true
		}
	}
	return false
}
```
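
A quick check of the source-classification helpers: the sketch below copies `isSystemCommand` and `determineSource` verbatim into `package main` so it compiles on its own, outside the `command` package.

```go
package main

// isSystemCommand and determineSource are verbatim copies of the
// helpers in factory.go, lifted out for a standalone check.
func isSystemCommand(commandType string) bool {
	systemCommands := []string{
		"enable_heartbeat",
		"disable_heartbeat",
		"update_check",
		"cleanup_old_logs",
	}
	for _, cmd := range systemCommands {
		if commandType == cmd {
			return true
		}
	}
	return false
}

func determineSource(commandType string) string {
	if isSystemCommand(commandType) {
		return "system"
	}
	return "manual"
}
```

Anything not in the fixed `systemCommands` list, including every `scan_*` command, classifies as `manual`.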

#### 1.2 Command Validator

**File**: `aggregator-server/internal/command/validator.go`

```go
package command

import (
	"errors"
	"fmt"

	"github.com/google/uuid"

	"github.com/Fimeg/RedFlag/aggregator-server/internal/models"
)

// Validator validates command parameters
type Validator struct {
	minCheckInSeconds int
	maxCheckInSeconds int
	minScannerMinutes int
	maxScannerMinutes int
}

// NewValidator creates a new command validator
func NewValidator() *Validator {
	return &Validator{
		minCheckInSeconds: 60,   // 1 minute minimum
		maxCheckInSeconds: 3600, // 1 hour maximum
		minScannerMinutes: 1,    // 1 minute minimum
		maxScannerMinutes: 1440, // 24 hours maximum
	}
}

// Validate performs comprehensive command validation
func (v *Validator) Validate(cmd *models.AgentCommand) error {
	if cmd == nil {
		return errors.New("command cannot be nil")
	}

	if cmd.ID == uuid.Nil {
		return errors.New("command ID cannot be zero UUID")
	}

	if cmd.AgentID == uuid.Nil {
		return errors.New("agent ID is required")
	}

	if cmd.CommandType == "" {
		return errors.New("command type is required")
	}

	if cmd.Status == "" {
		return errors.New("status is required")
	}

	validStatuses := []string{"pending", "running", "completed", "failed", "cancelled"}
	if !contains(validStatuses, cmd.Status) {
		return fmt.Errorf("invalid status: %s", cmd.Status)
	}

	if cmd.Source != "manual" && cmd.Source != "system" {
		return fmt.Errorf("source must be 'manual' or 'system', got: %s", cmd.Source)
	}

	// Validate command type format
	if err := v.validateCommandType(cmd.CommandType); err != nil {
		return err
	}

	return nil
}

// ValidateSubsystemAction validates subsystem-specific actions
func (v *Validator) ValidateSubsystemAction(subsystem string, action string) error {
	validActions := map[string][]string{
		"storage": {"trigger", "enable", "disable", "set_interval"},
		"system":  {"trigger", "enable", "disable", "set_interval"},
		"docker":  {"trigger", "enable", "disable", "set_interval"},
		"updates": {"trigger", "enable", "disable", "set_interval"},
	}

	actions, ok := validActions[subsystem]
	if !ok {
		return fmt.Errorf("unknown subsystem: %s", subsystem)
	}

	if !contains(actions, action) {
		return fmt.Errorf("invalid action '%s' for subsystem '%s'", action, subsystem)
	}

	return nil
}

// ValidateInterval ensures scanner intervals are within bounds
func (v *Validator) ValidateInterval(subsystem string, minutes int) error {
	if minutes < v.minScannerMinutes {
		return fmt.Errorf("interval %d minutes below minimum %d for subsystem %s",
			minutes, v.minScannerMinutes, subsystem)
	}

	if minutes > v.maxScannerMinutes {
		return fmt.Errorf("interval %d minutes above maximum %d for subsystem %s",
			minutes, v.maxScannerMinutes, subsystem)
	}

	return nil
}

func (v *Validator) validateCommandType(commandType string) error {
	validPrefixes := []string{"scan_", "install_", "update_", "enable_", "disable_", "reboot"}

	for _, prefix := range validPrefixes {
		if len(commandType) >= len(prefix) && commandType[:len(prefix)] == prefix {
			return nil
		}
	}

	return fmt.Errorf("invalid command type format: %s", commandType)
}

func contains(slice []string, item string) bool {
	for _, s := range slice {
		if s == item {
			return true
		}
	}
	return false
}
```
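
The prefix loop in `validateCommandType` is easy to pin down in isolation. The sketch below mirrors it as a boolean helper (note that bare `reboot` passes because the "prefix" equals the whole string):

```go
package main

// isValidCommandType mirrors the prefix loop in validateCommandType,
// returning a bool instead of an error for easy checking.
func isValidCommandType(commandType string) bool {
	validPrefixes := []string{"scan_", "install_", "update_", "enable_", "disable_", "reboot"}
	for _, prefix := range validPrefixes {
		if len(commandType) >= len(prefix) && commandType[:len(prefix)] == prefix {
			return true
		}
	}
	return false
}
```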

#### 1.3 Update AgentCommand Model

**File**: `aggregator-server/internal/models/command.go`

```go
package models

import (
	"database/sql"
	"errors"
	"time"

	"github.com/google/uuid"
	"github.com/lib/pq"
)

// AgentCommand represents a command sent to an agent
type AgentCommand struct {
	ID          uuid.UUID              `db:"id" json:"id"`
	AgentID     uuid.UUID              `db:"agent_id" json:"agent_id"`
	CommandType string                 `db:"command_type" json:"command_type"`
	Status      string                 `db:"status" json:"status"`
	Source      string                 `db:"source" json:"source"`
	Params      map[string]interface{} `db:"params" json:"params"` // marshalled to JSONB by the storage layer
	Result      sql.NullString         `db:"result" json:"result,omitempty"`
	Error       sql.NullString         `db:"error" json:"error,omitempty"`
	RetryCount  int                    `db:"retry_count" json:"retry_count"`
	CreatedAt   time.Time              `db:"created_at" json:"created_at"`
	UpdatedAt   time.Time              `db:"updated_at" json:"updated_at"`
	CompletedAt pq.NullTime            `db:"completed_at" json:"completed_at,omitempty"`

	// Idempotency support (VARCHAR(64) column added in migration 023a)
	IdempotencyKey sql.NullString `db:"idempotency_key" json:"-"`
}

// Validate checks if the command is valid
func (c *AgentCommand) Validate() error {
	if c.ID == uuid.Nil {
		return ErrCommandIDRequired
	}
	if c.AgentID == uuid.Nil {
		return ErrAgentIDRequired
	}
	if c.CommandType == "" {
		return ErrCommandTypeRequired
	}
	if c.Status == "" {
		return ErrStatusRequired
	}
	if c.Source != "manual" && c.Source != "system" {
		return ErrInvalidSource
	}

	return nil
}

// IsTerminal returns true if the command is in a terminal state
func (c *AgentCommand) IsTerminal() bool {
	return c.Status == "completed" || c.Status == "failed" || c.Status == "cancelled"
}

// CanRetry returns true if the command can be retried
func (c *AgentCommand) CanRetry() bool {
	return c.Status == "failed" && c.RetryCount < 3
}

// Predefined errors for validation
var (
	ErrCommandIDRequired   = errors.New("command ID cannot be zero UUID")
	ErrAgentIDRequired     = errors.New("agent ID is required")
	ErrCommandTypeRequired = errors.New("command type is required")
	ErrStatusRequired      = errors.New("status is required")
	ErrInvalidSource       = errors.New("source must be 'manual' or 'system'")
)
```

#### 1.4 Update Subsystem Handler

**File**: `aggregator-server/internal/api/handlers/subsystems.go`

```go
package handlers

import (
	"database/sql"
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/google/uuid"
	"github.com/jmoiron/sqlx"

	"github.com/Fimeg/RedFlag/aggregator-server/internal/command"
	"github.com/Fimeg/RedFlag/aggregator-server/internal/models"
)

type SubsystemHandler struct {
	db             *sqlx.DB
	commandFactory *command.Factory
}

func NewSubsystemHandler(db *sqlx.DB) *SubsystemHandler {
	return &SubsystemHandler{
		db:             db,
		commandFactory: command.NewFactory(),
	}
}

// TriggerSubsystem creates and enqueues a subsystem command
func (h *SubsystemHandler) TriggerSubsystem(c *gin.Context) {
	agentID, err := uuid.Parse(c.Param("id"))
	if err != nil {
		log.Printf("[ERROR] [server] [subsystem] invalid_agent_id error=%v", err)
		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid agent ID"})
		return
	}

	subsystem := c.Param("subsystem")
	if err := h.validateSubsystem(subsystem); err != nil {
		log.Printf("[ERROR] [server] [subsystem] invalid_subsystem subsystem=%s error=%v", subsystem, err)
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// DEDUPLICATION CHECK: Prevent multiple pending scans
	existingCmd, err := h.getPendingScanCommand(agentID, subsystem)
	if err != nil {
		log.Printf("[ERROR] [server] [subsystem] query_failed agent_id=%s subsystem=%s error=%v",
			agentID, subsystem, err)
		c.JSON(http.StatusInternalServerError, gin.H{"error": "internal error"})
		return
	}

	if existingCmd != nil {
		log.Printf("[INFO] [server] [subsystem] scan_already_pending agent_id=%s subsystem=%s command_id=%s",
			agentID, subsystem, existingCmd.ID)
		log.Printf("[HISTORY] [server] [scan_%s] duplicate_request_prevented agent_id=%s command_id=%s timestamp=%s",
			subsystem, agentID, existingCmd.ID, time.Now().Format(time.RFC3339))

		c.JSON(http.StatusConflict, gin.H{
			"error":      "Scan already in progress",
			"command_id": existingCmd.ID.String(),
			"subsystem":  subsystem,
			"status":     existingCmd.Status,
			"created_at": existingCmd.CreatedAt,
		})
		return
	}

	// Generate idempotency key from request context
	idempotencyKey := h.generateIdempotencyKey(c, agentID, subsystem)

	// Create command using factory
	cmd, err := h.commandFactory.CreateWithIdempotency(
		agentID,
		"scan_"+subsystem,
		map[string]interface{}{"subsystem": subsystem},
		idempotencyKey,
	)
	if err != nil {
		log.Printf("[ERROR] [server] [subsystem] command_creation_failed agent_id=%s subsystem=%s error=%v",
			agentID, subsystem, err)
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to create command"})
		return
	}

	// Store command in database
	if err := h.storeCommand(cmd); err != nil {
		log.Printf("[ERROR] [server] [subsystem] command_store_failed agent_id=%s command_id=%s error=%v",
			agentID, cmd.ID, err)
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to store command"})
		return
	}

	log.Printf("[INFO] [server] [subsystem] command_created agent_id=%s command_id=%s subsystem=%s",
		agentID, cmd.ID, subsystem)
	log.Printf("[HISTORY] [server] [scan_%s] command_created agent_id=%s command_id=%s source=manual timestamp=%s",
		subsystem, agentID, cmd.ID, time.Now().Format(time.RFC3339))

	c.JSON(http.StatusOK, gin.H{
		"message":    "Command created successfully",
		"command_id": cmd.ID.String(),
		"subsystem":  subsystem,
	})
}

// getPendingScanCommand checks for existing pending scan commands
func (h *SubsystemHandler) getPendingScanCommand(agentID uuid.UUID, subsystem string) (*models.AgentCommand, error) {
	var cmd models.AgentCommand
	query := `
		SELECT id, command_type, status, created_at
		FROM agent_commands
		WHERE agent_id = $1
		  AND command_type = $2
		  AND status = 'pending'
		LIMIT 1`

	commandType := "scan_" + subsystem
	err := h.db.Get(&cmd, query, agentID, commandType)
	if err != nil {
		if err == sql.ErrNoRows {
			return nil, nil // No pending command found
		}
		return nil, fmt.Errorf("query failed: %w", err)
	}

	return &cmd, nil
}

// validateSubsystem checks if subsystem is recognized
func (h *SubsystemHandler) validateSubsystem(subsystem string) error {
	validSubsystems := []string{"apt", "dnf", "windows", "winget", "storage", "system", "docker"}
	for _, valid := range validSubsystems {
		if subsystem == valid {
			return nil
		}
	}
	return fmt.Errorf("unknown subsystem: %s", subsystem)
}

// generateIdempotencyKey creates a key to prevent duplicate submissions
func (h *SubsystemHandler) generateIdempotencyKey(c *gin.Context, agentID uuid.UUID, subsystem string) string {
	// Use timestamp rounded to nearest minute for idempotency window
	// This allows retries within same minute but prevents duplicates across minutes
	timestampWindow := time.Now().Unix() / 60 // Round to minute
	return fmt.Sprintf("%s:%s:%d", agentID.String(), subsystem, timestampWindow)
}

// storeCommand persists command to database
func (h *SubsystemHandler) storeCommand(cmd *models.AgentCommand) error {
	// Implementation depends on your command storage layer
	// Use NamedExec or similar to insert command
	query := `
		INSERT INTO agent_commands
		(id, agent_id, command_type, status, source, params, created_at)
		VALUES (:id, :agent_id, :command_type, :status, :source, :params, NOW())`

	_, err := h.db.NamedExec(query, cmd)
	return err
}
```
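
The minute-window behaviour of `generateIdempotencyKey` is worth pinning down: requests inside the same wall-clock minute collapse to one key, while the next minute opens a new one. A standalone sketch with the agent ID as a plain string and the Unix timestamp injected for testability (the real handler uses `time.Now()` and a UUID):

```go
package main

import "fmt"

// idempotencyKey mirrors generateIdempotencyKey, parameterised on the
// timestamp so the windowing is easy to exercise.
func idempotencyKey(agentID, subsystem string, unixSeconds int64) string {
	window := unixSeconds / 60 // round down to the minute
	return fmt.Sprintf("%s:%s:%d", agentID, subsystem, window)
}
```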

### Time Required: 30 minutes

---
## Phase 2: Database Schema

### Migration 023a: Command Deduplication

**File**: `aggregator-server/internal/database/migrations/023a_command_deduplication.up.sql`

```sql
-- Command Deduplication Schema
-- Prevents multiple pending scan commands per subsystem per agent

-- Add unique constraint to enforce single pending command per subsystem
CREATE UNIQUE INDEX idx_agent_pending_subsystem
ON agent_commands(agent_id, command_type, status)
WHERE status = 'pending';

-- Add idempotency key support for retry scenarios
ALTER TABLE agent_commands ADD COLUMN idempotency_key VARCHAR(64) UNIQUE NULL;
CREATE INDEX idx_agent_commands_idempotency_key ON agent_commands(idempotency_key);

COMMENT ON COLUMN agent_commands.idempotency_key IS
    'Prevents duplicate command creation from retry logic. Based on (agent_id + subsystem + timestamp window).';
```

**File**: `aggregator-server/internal/database/migrations/023a_command_deduplication.down.sql`

```sql
DROP INDEX IF EXISTS idx_agent_pending_subsystem;
ALTER TABLE agent_commands DROP COLUMN IF EXISTS idempotency_key;
```

### Migration 023: Client Error Logging Table

**File**: `aggregator-server/internal/database/migrations/023_client_error_logging.up.sql`

```sql
-- Client Error Logging Schema
-- Implements ETHOS #1: Errors are History, Not /dev/null

CREATE TABLE client_errors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID REFERENCES agents(id) ON DELETE SET NULL,
    subsystem VARCHAR(50) NOT NULL,
    error_type VARCHAR(50) NOT NULL,
    message TEXT NOT NULL,
    stack_trace TEXT,
    metadata JSONB,
    url TEXT NOT NULL,
    user_agent TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for efficient querying
CREATE INDEX idx_client_errors_agent_time ON client_errors(agent_id, created_at DESC);
CREATE INDEX idx_client_errors_subsystem_time ON client_errors(subsystem, created_at DESC);
CREATE INDEX idx_client_errors_error_type_time ON client_errors(error_type, created_at DESC);
CREATE INDEX idx_client_errors_created_at ON client_errors(created_at DESC);

-- Comments for documentation
COMMENT ON TABLE client_errors IS 'Frontend error logs for debugging and auditing. Implements ETHOS #1.';
COMMENT ON COLUMN client_errors.agent_id IS 'Agent active when error occurred (NULL for pre-auth errors)';
COMMENT ON COLUMN client_errors.subsystem IS 'RedFlag subsystem being used (storage, system, docker, etc.)';
COMMENT ON COLUMN client_errors.error_type IS 'Error category: javascript_error, api_error, ui_error, validation_error';
COMMENT ON COLUMN client_errors.metadata IS 'Additional context (component, API response, user actions)';

-- NOTE: idempotency_key on agent_commands is added by migration 023a;
-- adding it here as well would fail with "column already exists".
```

**File**: `aggregator-server/internal/database/migrations/023_client_error_logging.down.sql`

```sql
DROP TABLE IF EXISTS client_errors;
-- idempotency_key is dropped by migration 023a's down migration
```

### Time Required: 5 minutes

---

## Phase 3: Backend Error Handler

### Files to Create

#### 3.1 Error Handler

**File**: `aggregator-server/internal/api/handlers/client_errors.go`

```go
package handlers

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/google/uuid"
	"github.com/jmoiron/sqlx"
)

// ClientErrorHandler handles frontend error logging per ETHOS #1
type ClientErrorHandler struct {
	db *sqlx.DB
}

// NewClientErrorHandler creates a new error handler
func NewClientErrorHandler(db *sqlx.DB) *ClientErrorHandler {
	return &ClientErrorHandler{db: db}
}

// LogErrorRequest represents a client error log entry
type LogErrorRequest struct {
	Subsystem  string                 `json:"subsystem" binding:"required"`
	ErrorType  string                 `json:"error_type" binding:"required,oneof=javascript_error api_error ui_error validation_error"`
	Message    string                 `json:"message" binding:"required,max=10000"`
	StackTrace string                 `json:"stack_trace,omitempty"`
	Metadata   map[string]interface{} `json:"metadata,omitempty"`
	URL        string                 `json:"url" binding:"required"`
}

// LogError processes and stores frontend errors
func (h *ClientErrorHandler) LogError(c *gin.Context) {
	var req LogErrorRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		log.Printf("[ERROR] [server] [client_error] validation_failed error=\"%v\"", err)
		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request data"})
		return
	}

	// Extract agent ID from auth middleware if available
	var agentID interface{}
	if agentIDValue, exists := c.Get("agentID"); exists {
		if id, ok := agentIDValue.(uuid.UUID); ok {
			agentID = id
		}
	}

	// Log to console with HISTORY prefix
	log.Printf("[ERROR] [server] [client] [%s] agent_id=%v subsystem=%s message=\"%s\"",
		req.ErrorType, agentID, req.Subsystem, truncate(req.Message, 200))
	log.Printf("[HISTORY] [server] [client_error] agent_id=%v subsystem=%s type=%s url=\"%s\" message=\"%s\" timestamp=%s",
		agentID, req.Subsystem, req.ErrorType, req.URL, req.Message, time.Now().Format(time.RFC3339))

	// Store in database with retry logic
	if err := h.storeError(agentID, c.GetHeader("User-Agent"), req); err != nil {
		log.Printf("[ERROR] [server] [client_error] store_failed error=\"%v\"", err)
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to store error"})
		return
	}

	c.JSON(http.StatusOK, gin.H{"logged": true})
}

// storeError persists error to database with retry
func (h *ClientErrorHandler) storeError(agentID interface{}, userAgent string, req LogErrorRequest) error {
	const maxRetries = 3
	var lastErr error

	// The JSONB column expects serialized JSON, not a Go map
	metadataJSON, err := json.Marshal(req.Metadata)
	if err != nil {
		return fmt.Errorf("failed to marshal metadata: %w", err)
	}

	for attempt := 1; attempt <= maxRetries; attempt++ {
		query := `INSERT INTO client_errors (agent_id, subsystem, error_type, message, stack_trace, metadata, url, user_agent)
		          VALUES (:agent_id, :subsystem, :error_type, :message, :stack_trace, :metadata, :url, :user_agent)`

		_, err := h.db.NamedExec(query, map[string]interface{}{
			"agent_id":    agentID,
			"subsystem":   req.Subsystem,
			"error_type":  req.ErrorType,
			"message":     req.Message,
			"stack_trace": req.StackTrace,
			"metadata":    metadataJSON,
			"url":         req.URL,
			"user_agent":  userAgent,
		})

		if err == nil {
			return nil
		}

		lastErr = err
		if attempt < maxRetries {
			time.Sleep(time.Duration(attempt) * time.Second)
		}
	}

	return fmt.Errorf("failed after %d attempts: %w", maxRetries, lastErr)
}

func truncate(s string, maxLen int) string {
	if len(s) <= maxLen {
		return s
	}
	return s[:maxLen] + "..."
}

// hash is kept for planned message-deduplication detection
func hash(s string) string {
	h := sha256.Sum256([]byte(s))
	return fmt.Sprintf("%x", h)[:16]
}
```

#### 3.2 Query Client Errors

**File**: `aggregator-server/internal/database/queries/client_errors.sql`

```sql
-- name: GetClientErrorsByAgent :many
SELECT * FROM client_errors
WHERE agent_id = $1
ORDER BY created_at DESC
LIMIT $2;

-- name: GetClientErrorsBySubsystem :many
SELECT * FROM client_errors
WHERE subsystem = $1
ORDER BY created_at DESC
LIMIT $2;

-- name: GetClientErrorStats :many
SELECT
    subsystem,
    error_type,
    COUNT(*) as count,
    MIN(created_at) as first_occurrence,
    MAX(created_at) as last_occurrence
FROM client_errors
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY subsystem, error_type
ORDER BY count DESC;
```

#### 3.3 Update Router

**File**: `aggregator-server/internal/api/router.go`

```go
// Add to router setup function
func SetupRouter(db *sqlx.DB, cfg *config.Config) *gin.Engine {
	// ... existing setup ...

	// Error logging endpoint (authenticated)
	errorHandler := handlers.NewClientErrorHandler(db)
	apiV1.POST("/logs/client-error",
		middleware.AuthMiddleware(),
		errorHandler.LogError,
	)

	// Admin endpoint for viewing errors
	// (GetErrors listing handler lives alongside LogError in client_errors.go;
	// its implementation is not shown in this plan)
	apiV1.GET("/logs/client-errors",
		middleware.AuthMiddleware(),
		middleware.AdminMiddleware(),
		errorHandler.GetErrors,
	)

	// ... rest of setup ...
}
```

### Time Required: 20 minutes

---

## Phase 4: Frontend Error Logger

### Files to Create

#### 4.1 Client Error Logger

**File**: `aggregator-web/src/lib/client-error-logger.ts`

```typescript
import { api } from './api';

export interface ClientErrorLog {
  subsystem: string;
  error_type: 'javascript_error' | 'api_error' | 'ui_error' | 'validation_error';
  message: string;
  stack_trace?: string;
  metadata?: Record<string, any>;
  url: string;
  timestamp: string;
}

/**
 * ClientErrorLogger provides reliable frontend error logging with retry logic
 * Implements ETHOS #3: Assume Failure; Build for Resilience
 */
export class ClientErrorLogger {
  private maxRetries = 3;
  private baseDelayMs = 1000;
  private localStorageKey = 'redflag-error-queue';
  private offlineBuffer: ClientErrorLog[] = [];
  private isOnline = navigator.onLine;

  constructor() {
    // Listen for online/offline events
    window.addEventListener('online', () => {
      this.isOnline = true;
      this.flushOfflineBuffer();
    });
    window.addEventListener('offline', () => { this.isOnline = false; });
  }

  /**
   * Log an error with automatic retry and offline queuing
   */
  async logError(errorData: Omit<ClientErrorLog, 'url' | 'timestamp'>): Promise<void> {
    const fullError: ClientErrorLog = {
      ...errorData,
      url: window.location.href,
      timestamp: new Date().toISOString(),
    };

    // Try to send immediately
    try {
      await this.sendWithRetry(fullError);
      return;
    } catch {
      // If failed after retries, queue for later
      this.queueForRetry(fullError);
    }
  }

  /**
   * Send error to backend with exponential backoff retry
   */
  private async sendWithRetry(error: ClientErrorLog): Promise<void> {
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        await api.post('/logs/client-error', error, {
          headers: { 'X-Error-Logger-Request': 'true' },
        });

        // Success, remove from queue if it was there
        this.removeFromQueue(error);
        return;
      } catch (err) {
        if (attempt === this.maxRetries) {
          throw err; // Rethrow after final attempt
        }

        // Exponential backoff
        await this.sleep(this.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }

  /**
   * Queue error for retry when network is available
   */
  private queueForRetry(error: ClientErrorLog): void {
    try {
      const queue = this.getQueue();
      queue.push({
        ...error,
        retryCount: (error as any).retryCount || 0,
        queuedAt: new Date().toISOString(),
      });

      // Save to localStorage for persistence
      localStorage.setItem(this.localStorageKey, JSON.stringify(queue));

      // Also keep in memory buffer
      this.offlineBuffer.push(error);
    } catch (storageError) {
      // localStorage might be full or unavailable
      console.warn('Failed to queue error for retry:', storageError);
    }
  }

  /**
   * Flush offline buffer when coming back online
   */
  private async flushOfflineBuffer(): Promise<void> {
    if (!this.isOnline) return;

    const queue = this.getQueue();
    if (queue.length === 0) return;

    const failed: typeof queue = [];

    for (const queuedError of queue) {
      try {
        await this.sendWithRetry(queuedError);
      } catch {
        failed.push(queuedError);
      }
    }

    // Update queue with remaining failed items
    if (failed.length < queue.length) {
      localStorage.setItem(this.localStorageKey, JSON.stringify(failed));
    }
  }

  /**
   * Get current error queue from localStorage
   */
  private getQueue(): any[] {
    try {
      const stored = localStorage.getItem(this.localStorageKey);
      return stored ? JSON.parse(stored) : [];
    } catch {
      return [];
    }
  }

  /**
   * Remove successfully sent error from queue
   */
  private removeFromQueue(sentError: ClientErrorLog): void {
    try {
      const queue = this.getQueue();
      const filtered = queue.filter(queued =>
        queued.timestamp !== sentError.timestamp ||
        queued.message !== sentError.message
      );

      if (filtered.length < queue.length) {
        localStorage.setItem(this.localStorageKey, JSON.stringify(filtered));
      }
    } catch {
      // Best effort cleanup
    }
  }

  /**
   * Capture unhandled JavaScript errors
   */
  captureUnhandledErrors(): void {
    // Global error handler
    window.addEventListener('error', (event) => {
      this.logError({
        subsystem: 'global',
        error_type: 'javascript_error',
        message: event.message,
        stack_trace: event.error?.stack,
        metadata: {
          filename: event.filename,
          lineno: event.lineno,
          colno: event.colno,
        },
      }).catch(() => {
        // Silently ignore logging failures
      });
    });

    // Unhandled promise rejections
    window.addEventListener('unhandledrejection', (event) => {
      this.logError({
        subsystem: 'global',
        error_type: 'javascript_error',
        message: event.reason?.message || String(event.reason),
        stack_trace: event.reason?.stack,
      }).catch(() => {});
    });
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Singleton instance
export const clientErrorLogger = new ClientErrorLogger();

// Auto-capture unhandled errors
if (typeof window !== 'undefined') {
  clientErrorLogger.captureUnhandledErrors();
}
```
|
|
|
|
#### 4.2 Toast Wrapper with Logging
**File**: `aggregator-web/src/lib/toast-with-logging.ts`

```typescript
import toast, { ToastOptions } from 'react-hot-toast';
import { clientErrorLogger } from './client-error-logger';
import { useLocation } from 'react-router-dom';

/**
 * Extract subsystem from current route
 */
function getCurrentSubsystem(): string {
  if (typeof window === 'undefined') return 'unknown';

  const path = window.location.pathname;

  // Map routes to subsystems
  if (path.includes('/storage')) return 'storage';
  if (path.includes('/system')) return 'system';
  if (path.includes('/docker')) return 'docker';
  if (path.includes('/updates')) return 'updates';
  if (path.includes('/agent/')) return 'agent';

  return 'unknown';
}

/**
 * Wrap toast.error to automatically log errors to backend
 * Implements ETHOS #1: Errors are History
 */
export const toastWithLogging = {
  error: (message: string, options?: ToastOptions & { subsystem?: string }) => {
    const subsystem = options?.subsystem || getCurrentSubsystem();

    // Log to backend asynchronously - don't block UI
    clientErrorLogger.logError({
      subsystem,
      error_type: 'ui_error',
      message: message.substring(0, 5000), // Prevent excessively long messages
      metadata: {
        component: options?.id,
        duration: options?.duration,
        position: options?.position,
        timestamp: new Date().toISOString(),
      },
    }).catch(() => {
      // Silently ignore logging failures - don't crash the UI
    });

    // Show toast to user
    return toast.error(message, options);
  },

  // Passthrough methods. Note: react-hot-toast exposes no info/warning
  // variants, so those map to the neutral toast().
  success: toast.success,
  info: toast,
  warning: toast,
  loading: toast.loading,
  dismiss: toast.dismiss,
  remove: toast.remove,
  promise: toast.promise,
};

/**
 * React hook for toast with automatic subsystem detection
 */
export function useToastWithLogging() {
  const location = useLocation();

  return {
    error: (message: string, options?: ToastOptions) => {
      return toastWithLogging.error(message, {
        ...options,
        subsystem: getSubsystemFromPath(location.pathname),
      });
    },
    success: toast.success,
    info: toast,
    warning: toast,
    loading: toast.loading,
    dismiss: toast.dismiss,
  };
}

function getSubsystemFromPath(pathname: string): string {
  const matches = pathname.match(/\/(storage|system|docker|updates|agent)/);
  return matches ? matches[1] : 'unknown';
}
```

#### 4.3 API Integration
**Update**: `aggregator-web/src/lib/api.ts`

```typescript
// Add error logging to the axios response interceptor
import { clientErrorLogger } from './client-error-logger';

api.interceptors.response.use(
  (response) => response,
  async (error) => {
    // Don't log errors from the error logger itself
    if (error.config?.headers?.['X-Error-Logger-Request']) {
      return Promise.reject(error);
    }

    // Extract subsystem from URL
    const subsystem = extractSubsystem(error.config?.url);

    // Log API errors
    clientErrorLogger.logError({
      subsystem,
      error_type: 'api_error',
      message: error.message,
      metadata: {
        status_code: error.response?.status,
        endpoint: error.config?.url,
        method: error.config?.method,
        response_data: error.response?.data,
      },
    }).catch(() => {
      // Don't let logging errors hide the original error
    });

    return Promise.reject(error);
  }
);

function extractSubsystem(url: string = ''): string {
  const matches = url.match(/\/(storage|system|docker|updates|agent)/);
  return matches ? matches[1] : 'unknown';
}
```

### Time Required: 20 minutes

---

## Phase 5: Toast Integration

### Update Existing Error Calls

**Pattern**: Update error toast calls to use the new logger

**Before**:
```typescript
import toast from 'react-hot-toast';

toast.error(`Failed to trigger scan: ${error.message}`);
```

**After**:
```typescript
import { toastWithLogging } from '@/lib/toast-with-logging';

toastWithLogging.error(`Failed to trigger scan: ${error.message}`, {
  subsystem: 'storage', // Specify subsystem
  id: 'trigger-scan-error', // Optional component ID
});
```

#### 5.1 React State Management for Scan Buttons
**File**: Create `aggregator-web/src/hooks/useScanState.ts`

```typescript
import { useState, useCallback } from 'react';
import { api } from '@/lib/api';
import { toastWithLogging } from '@/lib/toast-with-logging';

interface ScanState {
  isScanning: boolean;
  commandId?: string;
  error?: string;
}

/**
 * Hook for managing scan button state and preventing duplicate scans
 */
export function useScanState(agentId: string, subsystem: string) {
  const [state, setState] = useState<ScanState>({
    isScanning: false,
  });

  const triggerScan = useCallback(async () => {
    if (state.isScanning) {
      toastWithLogging.info('Scan already in progress', { subsystem });
      return;
    }

    setState({ isScanning: true, commandId: undefined, error: undefined });

    try {
      const result = await api.post(`/agents/${agentId}/subsystems/${subsystem}/trigger`);

      setState(prev => ({
        ...prev,
        commandId: result.data.command_id,
      }));

      // Poll for completion or wait for subscription update
      await waitForScanComplete(agentId, result.data.command_id);

      setState({ isScanning: false, commandId: result.data.command_id });

      toastWithLogging.success(`${subsystem} scan completed`, { subsystem });
    } catch (error: any) {
      const isAlreadyRunning = error.response?.status === 409;

      if (isAlreadyRunning) {
        const existingCommandId = error.response?.data?.command_id;
        setState({
          isScanning: false,
          commandId: existingCommandId,
          error: 'Scan already in progress',
        });

        toastWithLogging.info(`Scan already running (command: ${existingCommandId})`, { subsystem });
      } else {
        const errorMessage = error.response?.data?.error || error.message;
        setState({
          isScanning: false,
          error: errorMessage,
        });

        toastWithLogging.error(`Failed to trigger scan: ${errorMessage}`, { subsystem });
      }
    }
  }, [agentId, subsystem, state.isScanning]);

  const reset = useCallback(() => {
    setState({ isScanning: false, commandId: undefined, error: undefined });
  }, []);

  return {
    isScanning: state.isScanning,
    commandId: state.commandId,
    error: state.error,
    triggerScan,
    reset,
  };
}

/**
 * Wait for scan to complete by polling command status
 */
async function waitForScanComplete(agentId: string, commandId: string): Promise<void> {
  const maxWaitMs = 300000; // 5 minutes max
  const startTime = Date.now();
  const pollInterval = 2000; // Poll every 2 seconds

  return new Promise((resolve, reject) => {
    const interval = setInterval(async () => {
      try {
        const result = await api.get(`/agents/${agentId}/commands/${commandId}`);

        if (result.data.status === 'completed' || result.data.status === 'failed') {
          clearInterval(interval);
          resolve();
        }
      } catch (error) {
        clearInterval(interval);
        reject(error);
      }

      if (Date.now() - startTime > maxWaitMs) {
        clearInterval(interval);
        reject(new Error('Scan timeout'));
      }
    }, pollInterval);
  });
}
```

**Usage Example in Component**:
```typescript
import { useScanState } from '@/hooks/useScanState';

function ScanButton({ agentId, subsystem }: { agentId: string; subsystem: string }) {
  const { isScanning, triggerScan } = useScanState(agentId, subsystem);

  return (
    <button
      onClick={triggerScan}
      disabled={isScanning}
      className={isScanning ? 'btn-disabled' : 'btn-primary'}
    >
      {isScanning ? (
        <>
          <Spinner className="animate-spin" />
          Scanning...
        </>
      ) : (
        `Scan ${subsystem}`
      )}
    </button>
  );
}
```

#### 5.2 Update Existing Error Calls
**Priority Files to Update**

1. **Agent Subsystem Actions** - `/src/components/AgentSubsystems.tsx`
2. **Command Retry Logic** - `/src/hooks/useCommands.ts`
3. **Authentication Errors** - `/src/lib/auth.ts`
4. **API Error Boundaries** - `/src/components/ErrorBoundary.tsx`

### Example Complete Integration

**File**: `aggregator-web/src/components/AgentSubsystems.tsx` (example update)
```typescript
import { toastWithLogging } from '@/lib/toast-with-logging';

const handleTrigger = async (subsystem: string) => {
  try {
    await triggerSubsystem(agentId, subsystem);
  } catch (error) {
    toastWithLogging.error(
      `Failed to trigger ${subsystem} scan: ${error.message}`,
      {
        subsystem,
        id: `trigger-${subsystem}`,
      }
    );
  }
};
```

### Time Required: 15 minutes

#### 5.3 Deduplication Testing
**Test Cases**:
```typescript
// Test 1: Rapid clicking prevention
test('clicking scan button 10 times creates only 1 command', async () => {
  const button = screen.getByText('Scan APT');

  // Click 10 times rapidly
  for (let i = 0; i < 10; i++) {
    fireEvent.click(button);
  }

  // Should only create 1 command
  expect(api.post).toHaveBeenCalledTimes(1);
  expect(api.post).toHaveBeenCalledWith('/agents/123/subsystems/apt/trigger');
});

// Test 2: Button disabled while scanning
test('button disabled during scan', async () => {
  const button = screen.getByText('Scan APT');

  fireEvent.click(button);

  // Button should be disabled immediately
  expect(button).toBeDisabled();
  expect(screen.getByText('Scanning...')).toBeInTheDocument();

  await waitFor(() => {
    expect(button).not.toBeDisabled();
  });
});

// Test 3: 409 Conflict returns existing command
test('409 response surfaces the existing command', async () => {
  mock.onPost().reply(409, {
    error: 'Scan already in progress',
    command_id: 'existing-id',
  });

  expect(await triggerScan()).toEqual({ command_id: 'existing-id' });
  expect(toast).toHaveBeenCalledWith('Scan already running');
});
```

---

## Phase 6: Verification & Testing

### Manual Testing Checklist

#### 6.1 Migration Testing
- [ ] Run migration 023 successfully
- [ ] Verify `client_errors` table exists
- [ ] Verify `idempotency_key` column added to `agent_commands`
- [ ] Test on fresh database (no duplicate key errors)

#### 6.2 Command Factory Testing
- [ ] Rapid-fire scan button clicks (10+ clicks in 2 seconds)
- [ ] Verify all commands created with unique IDs
- [ ] Check no duplicate key violations in logs
- [ ] Verify commands appear in database correctly

#### 6.3 Error Logging Testing
- [ ] Trigger UI error (e.g., invalid input)
- [ ] Verify error appears in toast
- [ ] Check database - error should be stored in `client_errors`
- [ ] Trigger API error (e.g., network timeout)
- [ ] Verify exponential backoff retry works
- [ ] Disconnect network, trigger error, reconnect
- [ ] Verify error is queued and sent when back online

#### 6.4 Integration Testing
- [ ] Full user workflow: login → trigger scan → view results
- [ ] Verify all errors logged with [HISTORY] prefix
- [ ] Check logs are queryable by subsystem
- [ ] Verify error logging doesn't block UI

### Automated Test Cases

#### 6.5 Backend Tests
**File**: `aggregator-server/internal/command/factory_test.go`
```go
package command

import (
	"testing"

	"github.com/google/uuid"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	// Adjust this import to the repository's actual module path
	"aggregator-server/internal/models"
)

func TestFactory_Create(t *testing.T) {
	factory := NewFactory()
	agentID := uuid.New()

	cmd, err := factory.Create(agentID, "scan_storage", map[string]interface{}{"path": "/"})

	require.NoError(t, err)
	assert.NotEqual(t, uuid.Nil, cmd.ID, "ID must be generated")
	assert.Equal(t, agentID, cmd.AgentID)
	assert.Equal(t, "scan_storage", cmd.CommandType)
	assert.Equal(t, "pending", cmd.Status)
	assert.Equal(t, "manual", cmd.Source)
}

func TestFactory_CreateWithIdempotency(t *testing.T) {
	factory := NewFactory()
	agentID := uuid.New()
	idempotencyKey := "test-key-123"

	// Create first command
	cmd1, err := factory.CreateWithIdempotency(agentID, "scan_system", nil, idempotencyKey)
	require.NoError(t, err)

	// Create "duplicate" command with same idempotency key
	cmd2, err := factory.CreateWithIdempotency(agentID, "scan_system", nil, idempotencyKey)
	require.NoError(t, err)

	// Should return same command
	assert.Equal(t, cmd1.ID, cmd2.ID, "Idempotency key should return same command")
}

func TestFactory_Validate(t *testing.T) {
	tests := []struct {
		name    string
		cmd     *models.AgentCommand
		wantErr bool
	}{
		{
			name: "valid command",
			cmd: &models.AgentCommand{
				ID:          uuid.New(),
				AgentID:     uuid.New(),
				CommandType: "scan_storage",
				Status:      "pending",
				Source:      "manual",
			},
			wantErr: false,
		},
		{
			name: "missing ID",
			cmd: &models.AgentCommand{
				ID:          uuid.Nil,
				AgentID:     uuid.New(),
				CommandType: "scan_storage",
				Status:      "pending",
				Source:      "manual",
			},
			wantErr: true,
		},
	}

	factory := NewFactory()

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// Validate the prebuilt command directly so the "missing ID"
			// case actually exercises validation (Create generates a fresh ID)
			err := factory.Validate(tt.cmd)
			if tt.wantErr {
				assert.Error(t, err)
			} else {
				assert.NoError(t, err)
			}
		})
	}
}
```

#### 6.6 Frontend Tests
**File**: `aggregator-web/src/lib/client-error-logger.test.ts`
```typescript
import { clientErrorLogger } from './client-error-logger';
import { api } from './api';

jest.mock('./api');

describe('ClientErrorLogger', () => {
  beforeEach(() => {
    localStorage.clear();
    jest.clearAllMocks();
  });

  test('logs error successfully on first attempt', async () => {
    (api.post as jest.Mock).mockResolvedValue({});

    await clientErrorLogger.logError({
      subsystem: 'storage',
      error_type: 'api_error',
      message: 'Test error',
    });

    expect(api.post).toHaveBeenCalledTimes(1);
    expect(api.post).toHaveBeenCalledWith(
      '/logs/client-error',
      expect.objectContaining({
        subsystem: 'storage',
        error_type: 'api_error',
        message: 'Test error',
      }),
      expect.any(Object)
    );
  });

  test('retries on failure then saves to localStorage', async () => {
    (api.post as jest.Mock)
      .mockRejectedValueOnce(new Error('Network error'))
      .mockRejectedValueOnce(new Error('Network error'))
      .mockRejectedValueOnce(new Error('Network error'));

    await clientErrorLogger.logError({
      subsystem: 'storage',
      error_type: 'api_error',
      message: 'Test error',
    });

    expect(api.post).toHaveBeenCalledTimes(3);

    // Should be saved to localStorage
    const queue = localStorage.getItem('redflag-error-queue');
    expect(queue).toBeTruthy();
    expect(JSON.parse(queue!).length).toBe(1);
  });

  test('flushes queue when coming back online', async () => {
    // Pre-populate queue
    const queuedError = {
      subsystem: 'storage',
      error_type: 'api_error',
      message: 'Queued error',
      timestamp: new Date().toISOString(),
    };
    localStorage.setItem('redflag-error-queue', JSON.stringify([queuedError]));

    (api.post as jest.Mock).mockResolvedValue({});

    // Trigger online event
    window.dispatchEvent(new Event('online'));

    // Wait for flush
    await new Promise(resolve => setTimeout(resolve, 100));

    expect(api.post).toHaveBeenCalled();
    expect(localStorage.getItem('redflag-error-queue')).toBe('[]');
  });
});
```

### Time Required: 30 minutes

---

## Implementation Checklist

### Pre-Implementation
- [ ] ✅ Migration system bug fixed (lines 103 & 116 in db.go)
- [ ] ✅ Database wiped and fresh instance ready
- [ ] ✅ Test agents available for rapid scan testing
- [ ] ✅ Development environment ready (all 3 components)

### Phase 1: Command Factory (25 min)
- [ ] Create `aggregator-server/internal/command/factory.go`
- [ ] Create `aggregator-server/internal/command/validator.go`
- [ ] Update `aggregator-server/internal/models/command.go`
- [ ] Update `aggregator-server/internal/api/handlers/subsystems.go`
- [ ] Test: Verify rapid scan clicks work

### Phase 2: Database Schema (5 min)
- [ ] Create migration `023_client_error_logging.up.sql`
- [ ] Create migration `023_client_error_logging.down.sql`
- [ ] Run migration and verify table creation
- [ ] Verify indexes created

### Phase 3: Backend Handler (20 min)
- [ ] Create `aggregator-server/internal/api/handlers/client_errors.go`
- [ ] Create `aggregator-server/internal/database/queries/client_errors.sql`
- [ ] Update `aggregator-server/internal/api/router.go`
- [ ] Test API endpoint with curl

### Phase 4: Frontend Logger (20 min)
- [ ] Create `aggregator-web/src/lib/client-error-logger.ts`
- [ ] Create `aggregator-web/src/lib/toast-with-logging.ts`
- [ ] Update `aggregator-web/src/lib/api.ts`
- [ ] Test offline/online queue behavior

### Phase 5: Toast Integration (15 min)
- [ ] Create `useScanState` hook for button state management
- [ ] Update scan buttons to use `useScanState`
- [ ] Test button disabling during scan
- [ ] Update 3-5 critical error locations to use `toastWithLogging`
- [ ] Verify errors appear in both toast and database
- [ ] Test in multiple subsystems
- [ ] **Test deduplication**: Rapid clicking creates only 1 command
- [ ] **Test 409 response**: Returns existing command when scan running

### Phase 6: Verification (30 min)
- [ ] Run all test cases
- [ ] Verify ETHOS compliance checklist
- [ ] Test rapid scan clicking (no duplicates)
- [ ] Test error persistence across page reloads
- [ ] Verify [HISTORY] logs in server output

### Documentation
- [ ] Update session documentation
- [ ] Create testing summary
- [ ] Document any issues encountered
- [ ] Update architecture documentation

---

## Risk Mitigation

### Risk 1: Migration Failures
**Probability**: Medium | **Impact**: High | **Severity**: 🔴 Critical

**Mitigation**:
- Fix migration runner bug FIRST (before this implementation)
- Test migration on fresh database
- Keep database backups
- Have rollback script ready

**Contingency**: If migration fails, manually apply SQL and continue

---

### Risk 2: Performance Impact
**Probability**: Low | **Impact**: Medium | **Severity**: 🟡 Medium

**Mitigation**:
- Async error logging (non-blocking)
- LocalStorage queue with size limit (max 50 errors)
- Database indexes for fast queries
- Batch insert if needed in future

**Contingency**: If performance degrades, add sampling (log 1 in 10 errors)

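If sampling ever becomes necessary, a deterministic counter-based sampler is easier to reason about than random sampling. A minimal sketch (the `ErrorSampler` class and its name are hypothetical, not part of the current logger):

```typescript
// Counter-based error sampler: passes every Nth error per subsystem.
// Deterministic (unlike Math.random()), so rates are predictable under load.
class ErrorSampler {
  private counts = new Map<string, number>();

  constructor(private readonly rate: number = 10) {}

  // Returns true when this error should be logged (1 in `rate`).
  shouldLog(subsystem: string): boolean {
    const n = (this.counts.get(subsystem) ?? 0) + 1;
    this.counts.set(subsystem, n);
    return n % this.rate === 1; // log the 1st, 11th, 21st, ...
  }
}

const sampler = new ErrorSampler(10);
const logged = Array.from({ length: 20 }, () => sampler.shouldLog('storage'));
const loggedCount = logged.filter(Boolean).length;
console.log(loggedCount); // 2
```

The guard would sit at the top of `logError`, so the first error in each burst is always captured and the rest are dropped before any network work happens.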
---

### Risk 3: Infinite Error Loops
**Probability**: Low | **Impact**: High | **Severity**: 🟡 Medium

**Mitigation**:
- `X-Error-Logger-Request` header prevents recursive logging
- Max retry count (3 attempts)
- Exponential backoff prevents thundering herd

**Contingency**: If loop detected, check for missing header and fix

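The retry cap and exponential backoff above can be expressed as a pure helper, which also makes the schedule trivial to unit-test. A sketch with illustrative constants (`backoffDelay` is a hypothetical name; the logger's actual base delay may differ):

```typescript
// Exponential backoff with a hard attempt cap: delay = base * 2^attempt,
// clamped to maxDelayMs. Returns null once maxAttempts is exhausted,
// signaling the caller to stop retrying and queue to localStorage instead.
function backoffDelay(
  attempt: number, // 0-based retry attempt
  maxAttempts = 3,
  baseMs = 1000,
  maxDelayMs = 30000,
): number | null {
  if (attempt >= maxAttempts) return null; // give up; queue for later
  return Math.min(baseMs * 2 ** attempt, maxDelayMs);
}

// First three delays double each time; the fourth attempt is refused.
console.log([0, 1, 2, 3].map(a => backoffDelay(a)));
```

Keeping the schedule pure means the retry loop in `sendWithRetry` only has to `await sleep(delay)` and bail out on `null`.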
---

### Risk 4: Privacy Concerns
**Probability**: Low | **Impact**: High | **Severity**: 🟡 Medium

**Mitigation**:
- No PII in error messages (validate during logging)
- User agent stored but can be anonymized
- Stack traces only from our code (not user code)

**Contingency**: Add privacy filter to scrub sensitive data

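A minimal scrubber for this contingency might redact obvious PII patterns before a message leaves the browser. This is a sketch only; the patterns are illustrative and far from exhaustive:

```typescript
// Scrub common PII patterns from an error message before logging.
const REDACTIONS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email]'],      // email addresses
  [/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, '[ip]'],     // IPv4 addresses
  [/(Bearer\s+)[A-Za-z0-9._-]+/g, '$1[token]'], // bearer tokens
];

function scrubMessage(message: string): string {
  return REDACTIONS.reduce((msg, [pattern, repl]) => msg.replace(pattern, repl), message);
}

console.log(scrubMessage('auth failed for bob@example.com from 10.0.0.5'));
// → auth failed for [email] from [ip]
```

The natural hook is the top of `logError`, so every caller (toast wrapper, axios interceptor, global handlers) is covered by a single filter.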
---

## Post-Implementation Review

### Success Criteria
- [ ] No duplicate key violations during rapid clicking
- [ ] All errors persist in database
- [ ] Error logs queryable and useful for debugging
- [ ] No performance degradation observed
- [ ] System handles offline/online transitions gracefully
- [ ] All tests pass

### Performance Benchmarks
- Command creation: < 10ms per command
- Error logging: < 50ms per error (async)
- Database queries: < 100ms for common queries
- Bundle size increase: < 5KB gzipped

### Known Limitations
- Error logs don't include full request payloads (privacy)
- localStorage queue limited by browser storage (~5MB)
- Retries happen in foreground (could be moved to background)

### Future Enhancements (Post v0.1.27)
- Error aggregation and deduplication
- Error rate alerting
- Error analytics dashboard
- Automatic error categorization
- Integration with notification system

---

## Rollback Plan

If critical issues arise:

1. **Revert Code Changes**:
   ```bash
   git revert HEAD~6..HEAD  # Revert last 6 commits
   ```

2. **Rollback Database**:
   ```bash
   cd aggregator-server
   # Run down migration
   go run cmd/migrate/main.go -migrate-down 1
   ```

3. **Rebuild and Deploy**:
   ```bash
   docker-compose build --no-cache
   docker-compose up -d
   ```

---

## Additional Notes

**Team Coordination**:
- Coordinate with frontend team if they're working on error handling
- Notify QA about new error logging features for testing
- Update documentation team about database schema changes

**Monitoring**:
- Monitor `client_errors` table growth
- Set up alerts for error rate spikes
- Track failed error logging attempts

**Documentation Updates**:
- Update API documentation for `/logs/client-error` endpoint
- Document error log query patterns for support team
- Add troubleshooting guide for common errors

---

**Plan Created By**: Ani (AI Assistant)
**Reviewed By**: Casey Tunturi
**Status**: 🟢 APPROVED FOR IMPLEMENTATION
**Next Step**: Begin Phase 1 (Command Factory)

**Estimated Timeline**:
- Start: Immediately
- Complete: ~2-3 hours
- Test: 30 minutes
- Deploy: After verification

This is a complete, production-ready implementation plan. Each phase builds on the previous one, with full error handling, testing, and rollback procedures included.

Let's build this right. 💪