# Clean Architecture: Command ID & Frontend Error Logging

**Date**: 2025-12-19
**Status**: CLEAN ARCHITECTURE DESIGN (ETHOS Compliant)

---

## Problem Statement

RedFlag has two critical issues violating ETHOS principles:

1. **Command ID Generation Failure**: The server fails to generate unique IDs for commands, causing `pq: duplicate key value violates unique constraint "agent_commands_pkey"` when users trigger multiple scans rapidly
2. **Frontend Errors Lost**: UI failures show toasts but are never persisted, violating ETHOS #1: "Errors are History, Not /dev/null"

---

## ETHOS Compliance Requirements

- **ETHOS #1**: All errors must be captured, logged with context, and stored in the history table - NEVER sent to /dev/null
- **ETHOS #2**: No unauthenticated endpoints - all routes protected by the established security stack
- **ETHOS #3**: Assume failure - implement retry logic with exponential backoff for network operations
- **ETHOS #4**: Idempotency - the system must handle duplicate operations gracefully
- **ETHOS #5**: No marketing fluff - clear, honest naming using technical terms

---

## Clean Architecture Design

### Phase 1: Command ID Generation (Server-Side)

#### Problem

Commands are created without IDs, so PostgreSQL receives zero UUIDs (`00000000-0000-0000-0000-000000000000`), resulting in primary key violations on every insert after the first.
#### Solution: Command Factory Pattern

```go
// File: aggregator-server/internal/command/factory.go
package command

import (
	"fmt"

	"github.com/Fimeg/RedFlag/aggregator-server/internal/models"
	"github.com/google/uuid"
)

// Factory creates validated AgentCommand instances
type Factory struct{}

// NewFactory creates a new command factory
func NewFactory() *Factory {
	return &Factory{}
}

// Create generates a new validated AgentCommand
func (f *Factory) Create(agentID uuid.UUID, commandType string, params map[string]interface{}) (*models.AgentCommand, error) {
	cmd := &models.AgentCommand{
		ID:          uuid.New(), // Generation happens immediately and explicitly
		AgentID:     agentID,
		CommandType: commandType,
		Status:      "pending",
		Source:      "manual",
		Params:      params,
	}

	if err := cmd.Validate(); err != nil {
		return nil, fmt.Errorf("command validation failed: %w", err)
	}

	return cmd, nil
}
```

Add validation to the AgentCommand model:

```go
// File: aggregator-server/internal/models/command.go
// (requires the "errors" and "github.com/google/uuid" imports)

// Validate checks if the command is valid
func (c *AgentCommand) Validate() error {
	if c.ID == uuid.Nil {
		return errors.New("command ID cannot be zero UUID")
	}
	if c.AgentID == uuid.Nil {
		return errors.New("agent ID required")
	}
	if c.CommandType == "" {
		return errors.New("command type required")
	}
	if c.Status == "" {
		return errors.New("status required")
	}
	if c.Source != "manual" && c.Source != "system" {
		return errors.New("source must be 'manual' or 'system'")
	}
	return nil
}
```

**Rationale**: The factory pattern ensures IDs are always generated at creation time, making it impossible to create a command without one. Fail-fast validation catches issues immediately.

**Impact**: Fixes the immediate duplicate key error and prevents similar bugs at every future command creation point.

---

### Phase 2: Frontend Error Logging (UI to Server)

#### Problem

The frontend shows errors via toast notifications but never persists them. When users report "the button didn't work," we have no record of what failed, when, or why.
**ETHOS #1 Violation**: Errors that exist only in browser memory are equivalent to /dev/null.

#### Solution: Client Error Logging System

##### Step 2.1: Database Schema

```sql
-- File: aggregator-server/internal/database/migrations/023_client_error_logging.up.sql
-- Purpose: Store frontend errors for debugging and auditing

CREATE TABLE client_errors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID REFERENCES agents(id) ON DELETE SET NULL,
    subsystem VARCHAR(50) NOT NULL,
    error_type VARCHAR(50) NOT NULL, -- 'javascript_error', 'api_error', 'ui_error', 'validation_error'
    message TEXT NOT NULL,
    stack_trace TEXT,
    metadata JSONB,
    url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for common query patterns
CREATE INDEX idx_client_errors_agent_time ON client_errors(agent_id, created_at DESC);
CREATE INDEX idx_client_errors_subsystem_time ON client_errors(subsystem, created_at DESC);
CREATE INDEX idx_client_errors_type_time ON client_errors(error_type, created_at DESC);

-- Comments for documentation
COMMENT ON TABLE client_errors IS 'Frontend error logs for debugging and auditing';
COMMENT ON COLUMN client_errors.agent_id IS 'Agent that was active when error occurred (NULL for pre-auth errors)';
COMMENT ON COLUMN client_errors.subsystem IS 'Which RedFlag subsystem was being used';
COMMENT ON COLUMN client_errors.error_type IS 'Category of error for filtering';
COMMENT ON COLUMN client_errors.metadata IS 'Additional context (component name, API response, user actions)';
```

**Rationale**: A proper schema with indexes allows efficient querying. The foreign key to `agents` correlates errors with specific agents, and the JSONB column stores rich context for debugging.
---

##### Step 2.2: Backend Handler

```go
// File: aggregator-server/internal/api/handlers/client_errors.go
package handlers

import (
	"encoding/json"
	"log"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/jmoiron/sqlx"
)

// ClientErrorHandler handles frontend error logging
type ClientErrorHandler struct {
	db *sqlx.DB
}

// NewClientErrorHandler creates a new error handler
func NewClientErrorHandler(db *sqlx.DB) *ClientErrorHandler {
	return &ClientErrorHandler{db: db}
}

// LogError processes and stores frontend errors
func (h *ClientErrorHandler) LogError(c *gin.Context) {
	// Extract agent ID from auth middleware if available
	var agentID interface{}
	if agentIDValue, exists := c.Get("agentID"); exists {
		agentID = agentIDValue
	}

	var req struct {
		Subsystem  string                 `json:"subsystem" binding:"required"`
		ErrorType  string                 `json:"error_type" binding:"required,oneof=javascript_error api_error ui_error validation_error"`
		Message    string                 `json:"message" binding:"required"`
		StackTrace string                 `json:"stack_trace,omitempty"`
		Metadata   map[string]interface{} `json:"metadata,omitempty"`
		URL        string                 `json:"url" binding:"required"`
	}

	if err := c.ShouldBindJSON(&req); err != nil {
		log.Printf("[ERROR] [server] [client_error] validation_failed error=\"%v\"", err)
		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request data"})
		return
	}

	// Log to console with HISTORY prefix for unified logging
	log.Printf("[ERROR] [server] [client] [%s] agent_id=%v subsystem=%s message=\"%s\"",
		req.ErrorType, agentID, req.Subsystem, req.Message)
	log.Printf("[HISTORY] [server] [client_error] agent_id=%v subsystem=%s type=%s url=\"%s\" message=\"%s\" timestamp=%s",
		agentID, req.Subsystem, req.ErrorType, req.URL, req.Message, time.Now().Format(time.RFC3339))

	// Serialize metadata for the JSONB column
	metadataJSON, err := json.Marshal(req.Metadata)
	if err != nil {
		metadataJSON = []byte("{}")
	}

	// Attempt to store in database with retry logic
	const maxRetries = 3
	var lastErr error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		query := `INSERT INTO client_errors (agent_id, subsystem, error_type, message, stack_trace, metadata, url)
		          VALUES (:agent_id, :subsystem, :error_type, :message, :stack_trace, :metadata, :url)`
		_, err := h.db.NamedExec(query, map[string]interface{}{
			"agent_id":    agentID,
			"subsystem":   req.Subsystem,
			"error_type":  req.ErrorType,
			"message":     req.Message,
			"stack_trace": req.StackTrace,
			"metadata":    metadataJSON,
			"url":         req.URL,
		})
		if err == nil {
			c.JSON(http.StatusOK, gin.H{"logged": true})
			return
		}
		lastErr = err
		if attempt < maxRetries {
			time.Sleep(time.Duration(attempt) * time.Second)
		}
	}

	log.Printf("[ERROR] [server] [client_error] persistent_failure error=\"%v\"", lastErr)
	c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist error after retries"})
}
```

**Rationale**:

- Validates input before processing
- Logs with the [HISTORY] prefix for unified log aggregation
- Implements retry logic per ETHOS #3 (Assume Failure)
- Returns appropriate HTTP status codes
- Handles database connection failures gracefully

---

##### Step 2.3: Frontend Error Logger

```typescript
// File: aggregator-web/src/lib/client-error-logger.ts
import { api } from './api';

export interface ClientErrorLog {
  subsystem: string;
  error_type: 'javascript_error' | 'api_error' | 'ui_error' | 'validation_error';
  message: string;
  stack_trace?: string;
  metadata?: Record<string, unknown>;
  url: string;
}

/**
 * ClientErrorLogger provides reliable frontend error logging to the backend.
 * Implements retry logic per ETHOS #3 (Assume Failure).
 */
export class ClientErrorLogger {
  private maxRetries = 3;
  private baseDelayMs = 1000;
  private localStorageKey = 'redflag-failed-error-logs';

  /**
   * Log an error to the backend with automatic retry
   */
  async logError(errorData: Omit<ClientErrorLog, 'url'>): Promise<void> {
    const fullError: ClientErrorLog = {
      ...errorData,
      url: window.location.href,
    };

    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        await api.post('/logs/client-error', fullError, {
          // Add header to prevent infinite loop if error logger fails
          headers: { 'X-Error-Logger-Request': 'true' },
        });
        return; // Success
      } catch {
        if (attempt === this.maxRetries) {
          // Save to localStorage for later retry
          this.saveFailedLog({ ...fullError, attempt });
        } else {
          // Backoff before the next attempt
          await this.sleep(this.baseDelayMs * attempt);
        }
      }
    }
  }

  /**
   * Attempt to resend failed error logs from localStorage
   */
  async retryFailedLogs(): Promise<void> {
    const failedLogs = this.getFailedLogs();
    if (failedLogs.length === 0) return;

    // Clear first: logError re-saves any entry that still fails,
    // so leaving the old list in place would duplicate entries.
    localStorage.removeItem(this.localStorageKey);

    for (const log of failedLogs) {
      await this.logError(log);
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  private saveFailedLog(log: ClientErrorLog & { attempt: number }): void {
    try {
      const existing = this.getFailedLogs();
      existing.push(log);
      localStorage.setItem(this.localStorageKey, JSON.stringify(existing));
    } catch {
      // localStorage might be full or unavailable
    }
  }

  private getFailedLogs(): Array<ClientErrorLog & { attempt: number }> {
    try {
      const stored = localStorage.getItem(this.localStorageKey);
      return stored ? JSON.parse(stored) : [];
    } catch {
      return [];
    }
  }
}

// Singleton instance
export const clientErrorLogger = new ClientErrorLogger();

// Auto-retry failed logs on app load
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    clientErrorLogger.retryFailedLogs().catch(() => {});
  });
}
```

**Rationale**:

- Implements ETHOS #3 (Assume Failure) with backoff between attempts
- Saves failed logs to localStorage for retry when the network recovers
- Auto-retry on app load captures errors from previous sessions
- No infinite loops (X-Error-Logger-Request header)

---

##### Step 2.4: Toast Integration

```typescript
// File: aggregator-web/src/lib/toast-with-logging.ts
import toast, { ToastOptions } from 'react-hot-toast';
import { clientErrorLogger } from './client-error-logger';

// Store references to the original methods
const toastError = toast.error;
const toastSuccess = toast.success;

/**
 * Wraps toast.error to automatically log errors to the backend.
 * Implements ETHOS #1 (Errors are History).
 */
export const toastWithLogging = {
  error: (message: string, subsystem: string, options?: ToastOptions) => {
    // Log to backend asynchronously - don't block the UI
    clientErrorLogger.logError({
      subsystem,
      error_type: 'ui_error',
      message: message.substring(0, 1000), // Prevent excessively long messages
      metadata: {
        timestamp: new Date().toISOString(),
        user_agent: navigator.userAgent,
      },
    }).catch(() => {
      // Silently ignore logging failures - don't crash the UI
    });

    // Show toast to user
    return toastError(message, options);
  },
  success: toastSuccess,
  // react-hot-toast has no info/warning variants; fall back to the base toast
  info: (message: string, options?: ToastOptions) => toast(message, options),
  warning: (message: string, options?: ToastOptions) => toast(message, options),
  loading: toast.loading,
  dismiss: toast.dismiss,
};
```

**Rationale**: A transparent wrapper that maintains the toast API while adding error logging. The user experience is unchanged, but errors now persist to the history table.

---

## Implementation Evaluation: Retry Logic Necessity

**Question**: Does every client error log need exponential backoff retry?

**Analysis**:

### Errors That SHOULD Have Retry:

1. **API Errors**: Network failures, server 502s, connection timeouts
   - High value: These indicate real problems
   - Retry needed: Network glitches are common
2. **Critical UI Failures**: Command creation failures, permission errors
   - High value: Affect user workflow
   - Retry needed: Server might be temporarily overloaded

### Errors That Could Skip Retry:

1. **Validation Errors**: User entered invalid data
   - Low value: Expected behavior, not a system issue
   - No retry: Will immediately fail again
2. **Browser Compatibility Issues**: Old browser, missing features
   - Low value: Persistent problem until the user upgrades
   - No retry: Won't fix itself

### Recommendation: Use Retry for API and Critical Errors Only

```typescript
// Simplified version for validation errors (no retry)
export const logValidationError = async (subsystem: string, message: string) => {
  try {
    await api.post('/logs/client-error', {
      subsystem,
      error_type: 'validation_error',
      message,
    });
  } catch {
    // Best effort only - validation errors aren't critical
  }
};

// Full retry version for API errors
export const logApiError = async (subsystem: string, message: string) => {
  await clientErrorLogger.logError({
    subsystem,
    error_type: 'api_error',
    message,
  });
};
```

**Decision**: Keep retry logic in the general logger (most errors are API/critical), and add specific no-retry helpers for validation cases.
---

## Testing Strategy

### Test Command ID Generation

```go
func TestCommandFactory_Create(t *testing.T) {
	factory := command.NewFactory()
	agentID := uuid.New()

	cmd, err := factory.Create(agentID, "scan_storage", nil)

	require.NoError(t, err)
	assert.NotEqual(t, uuid.Nil, cmd.ID, "ID should be generated")
	assert.Equal(t, agentID, cmd.AgentID)
	assert.Equal(t, "scan_storage", cmd.CommandType)
}

func TestCommandFactory_CreateValidatesInput(t *testing.T) {
	factory := command.NewFactory()

	_, err := factory.Create(uuid.Nil, "", nil)

	assert.Error(t, err)
	assert.Contains(t, err.Error(), "validation failed")
}
```

### Test Error Logger Retry

```typescript
test('logError retries on failure then succeeds', async () => {
  // Mock API to fail twice, then succeed on the third attempt
  const mockPost = jest.fn()
    .mockRejectedValueOnce(new Error('Network error'))
    .mockRejectedValueOnce(new Error('Network error'))
    .mockResolvedValueOnce({});
  api.post = mockPost;

  await clientErrorLogger.logError({
    subsystem: 'storage',
    error_type: 'api_error',
    message: 'Failed to scan',
  });

  expect(mockPost).toHaveBeenCalledTimes(3);
  // Succeeded within the retry budget, so nothing was saved for later
  expect(localStorage.getItem('redflag-failed-error-logs')).toBeNull();
});
```

### Integration Test

```typescript
test('rapid scan button clicks work correctly', async () => {
  // Click multiple scan buttons
  await Promise.all([
    triggerStorageScan(),
    triggerSystemScan(),
    triggerDockerScan(),
  ]);

  // All should succeed with unique command IDs
  const commands = await getAgentCommands(agent.id);
  const uniqueIDs = new Set(commands.map(c => c.id));
  assert.equal(uniqueIDs.size, 3);
});
```

---

## Implementation Plan

### Step 1: Command Factory (15 minutes)

1. Create `aggregator-server/internal/command/factory.go`
2. Add `Validate()` method to `models.AgentCommand`
3. Update `TriggerSubsystem` and other command creation points to use the factory
4. Test: Verify rapid button clicks work

### Step 2: Database Migration (5 minutes)

1. Create `023_client_error_logging.up.sql`
2. Test that the migration runs successfully
3. Verify the table and indexes are created

### Step 3: Backend Handler (20 minutes)

1. Create `aggregator-server/internal/api/handlers/client_errors.go`
2. Add route registration in the router setup
3. Test the API endpoint with curl

### Step 4: Frontend Logger (15 minutes)

1. Create `aggregator-web/src/lib/client-error-logger.ts`
2. Add the toast wrapper in `aggregator-web/src/lib/toast-with-logging.ts`
3. Update 2-3 critical error locations to use the new logger
4. Test: Verify errors appear in the database

### Step 5: Verification (10 minutes)

1. Test the full workflow: trigger a scan, verify the command ID is unique
2. Test the error scenario: disconnect the network, verify retry works
3. Check the database: confirm errors are stored with context

**Total Time**: ~1 hour 5 minutes

---

## Files to Create

1. `aggregator-server/internal/command/factory.go`
2. `aggregator-server/internal/database/migrations/023_client_error_logging.up.sql`
3. `aggregator-server/internal/api/handlers/client_errors.go`
4. `aggregator-web/src/lib/client-error-logger.ts`
5. `aggregator-web/src/lib/toast-with-logging.ts`

## Files to Modify

1. `aggregator-server/internal/models/command.go` - Add `Validate()` method
2. `aggregator-server/internal/api/handlers/subsystems.go` - Use the command factory
3. `aggregator-server/internal/api/router.go` - Register the error logging route
4. 2-3 frontend files with critical error paths

---

## ETHOS Compliance Verification

- [x] **ETHOS #1**: All errors logged with context to the history table
- [x] **ETHOS #2**: Endpoint protected by auth middleware
- [x] **ETHOS #3**: Retry logic with backoff implemented
- [x] **ETHOS #4**: Database constraints handle duplicate logging gracefully
- [x] **ETHOS #5**: No marketing fluff; technical, honest naming used

---

**Status**: Ready for Implementation

**Recommendation**: Implement all steps in order for a clean, maintainable solution