Clean Architecture: Command ID & Frontend Error Logging
Date: 2025-12-19 Status: CLEAN ARCHITECTURE DESIGN (ETHOS Compliant)
Problem Statement
RedFlag has two critical issues violating ETHOS principles:
1. Command ID Generation Failure: the server fails to generate unique IDs for commands, causing `pq: duplicate key value violates unique constraint "agent_commands_pkey"` when users trigger multiple scans rapidly.
2. Frontend Errors Lost: UI failures show toasts but are never persisted, violating ETHOS #1: "Errors are History, Not /dev/null".
ETHOS Compliance Requirements
ETHOS #1: All errors must be captured, logged with context, stored in history table - NEVER to /dev/null
ETHOS #2: No unauthenticated endpoints - all routes protected by established security stack
ETHOS #3: Assume failure - implement retry logic with exponential backoff for network operations
ETHOS #4: Idempotency - system must handle duplicate operations gracefully
ETHOS #5: No marketing fluff - clear, honest naming using technical terms
Clean Architecture Design
Phase 1: Command ID Generation (Server-Side)
Problem
Commands are created without IDs, causing PostgreSQL to receive zero UUIDs (00000000-0000-0000-0000-000000000000), resulting in primary key violations on subsequent inserts.
Solution: Command Factory Pattern
```go
// File: aggregator-server/internal/command/factory.go
package command

import (
	"fmt"

	"github.com/Fimeg/RedFlag/aggregator-server/internal/models"
	"github.com/google/uuid"
)

// Factory creates validated AgentCommand instances.
type Factory struct{}

// NewFactory creates a new command factory.
func NewFactory() *Factory {
	return &Factory{}
}

// Create generates a new validated AgentCommand.
func (f *Factory) Create(agentID uuid.UUID, commandType string, params map[string]interface{}) (*models.AgentCommand, error) {
	cmd := &models.AgentCommand{
		ID:          uuid.New(), // ID generated immediately and explicitly
		AgentID:     agentID,
		CommandType: commandType,
		Status:      "pending",
		Source:      "manual",
		Params:      params,
	}
	if err := cmd.Validate(); err != nil {
		return nil, fmt.Errorf("command validation failed: %w", err)
	}
	return cmd, nil
}
```
Add validation to the `AgentCommand` model (the file's import block needs `errors` and `github.com/google/uuid`):

```go
// File: aggregator-server/internal/models/command.go

// Validate checks if the command is valid.
func (c *AgentCommand) Validate() error {
	if c.ID == uuid.Nil {
		return errors.New("command ID cannot be zero UUID")
	}
	if c.AgentID == uuid.Nil {
		return errors.New("agent ID required")
	}
	if c.CommandType == "" {
		return errors.New("command type required")
	}
	if c.Status == "" {
		return errors.New("status required")
	}
	if c.Source != "manual" && c.Source != "system" {
		return errors.New("source must be 'manual' or 'system'")
	}
	return nil
}
```
Rationale: Factory pattern ensures IDs are always generated at creation time, making it impossible to create invalid commands. Fail-fast validation catches issues immediately.
Impact: Fixes the immediate duplicate key error and prevents similar bugs in all future command creation.
Phase 2: Frontend Error Logging (UI to Server)
Problem
Frontend shows errors via toast notifications but never persists them. When users report "the button didn't work," we have no record of what failed, when, or why.
ETHOS #1 Violation: Errors that exist only in browser memory are equivalent to /dev/null
Solution: Client Error Logging System
Step 2.1: Database Schema
```sql
-- File: aggregator-server/internal/database/migrations/023_client_error_logging.up.sql
-- Purpose: Store frontend errors for debugging and auditing

CREATE TABLE client_errors (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id    UUID REFERENCES agents(id) ON DELETE SET NULL,
    subsystem   VARCHAR(50) NOT NULL,
    error_type  VARCHAR(50) NOT NULL, -- 'javascript_error', 'api_error', 'ui_error', 'validation_error'
    message     TEXT NOT NULL,
    stack_trace TEXT,
    metadata    JSONB,
    url         TEXT NOT NULL,
    created_at  TIMESTAMP DEFAULT NOW()
);

-- Indexes for common query patterns
CREATE INDEX idx_client_errors_agent_time     ON client_errors(agent_id, created_at DESC);
CREATE INDEX idx_client_errors_subsystem_time ON client_errors(subsystem, created_at DESC);
CREATE INDEX idx_client_errors_type_time      ON client_errors(error_type, created_at DESC);

-- Comments for documentation
COMMENT ON TABLE client_errors IS 'Frontend error logs for debugging and auditing';
COMMENT ON COLUMN client_errors.agent_id IS 'Agent that was active when error occurred (NULL for pre-auth errors)';
COMMENT ON COLUMN client_errors.subsystem IS 'Which RedFlag subsystem was being used';
COMMENT ON COLUMN client_errors.error_type IS 'Category of error for filtering';
COMMENT ON COLUMN client_errors.metadata IS 'Additional context (component name, API response, user actions)';
```
Rationale: Proper schema with indexes allows efficient querying. References agents table to correlate errors with specific agents. Stores rich context for debugging.
Step 2.2: Backend Handler
```go
// File: aggregator-server/internal/api/handlers/client_errors.go
package handlers

import (
	"encoding/json"
	"log"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/jmoiron/sqlx"
)

// ClientErrorHandler handles frontend error logging.
type ClientErrorHandler struct {
	db *sqlx.DB
}

// NewClientErrorHandler creates a new error handler.
func NewClientErrorHandler(db *sqlx.DB) *ClientErrorHandler {
	return &ClientErrorHandler{db: db}
}

// LogError processes and stores frontend errors.
func (h *ClientErrorHandler) LogError(c *gin.Context) {
	// Extract agent ID from auth middleware if available.
	var agentID interface{}
	if agentIDValue, exists := c.Get("agentID"); exists {
		agentID = agentIDValue
	}

	var req struct {
		Subsystem  string                 `json:"subsystem" binding:"required"`
		ErrorType  string                 `json:"error_type" binding:"required,oneof=javascript_error api_error ui_error validation_error"`
		Message    string                 `json:"message" binding:"required"`
		StackTrace string                 `json:"stack_trace,omitempty"`
		Metadata   map[string]interface{} `json:"metadata,omitempty"`
		URL        string                 `json:"url" binding:"required"`
	}
	if err := c.ShouldBindJSON(&req); err != nil {
		log.Printf("[ERROR] [server] [client_error] validation_failed error=%q", err)
		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request data"})
		return
	}

	// Log to console with HISTORY prefix for unified logging.
	log.Printf("[ERROR] [server] [client] [%s] agent_id=%v subsystem=%s message=%q",
		req.ErrorType, agentID, req.Subsystem, req.Message)
	log.Printf("[HISTORY] [server] [client_error] agent_id=%v subsystem=%s type=%s url=%q message=%q timestamp=%s",
		agentID, req.Subsystem, req.ErrorType, req.URL, req.Message, time.Now().Format(time.RFC3339))

	// JSONB columns need raw JSON bytes, not a Go map.
	metadataJSON, err := json.Marshal(req.Metadata)
	if err != nil {
		metadataJSON = []byte("{}")
	}

	// Attempt to store in database with exponential-backoff retry (ETHOS #3).
	const maxRetries = 3
	var lastErr error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		query := `INSERT INTO client_errors (agent_id, subsystem, error_type, message, stack_trace, metadata, url)
			VALUES (:agent_id, :subsystem, :error_type, :message, :stack_trace, :metadata, :url)`
		_, err := h.db.NamedExec(query, map[string]interface{}{
			"agent_id":    agentID,
			"subsystem":   req.Subsystem,
			"error_type":  req.ErrorType,
			"message":     req.Message,
			"stack_trace": req.StackTrace,
			"metadata":    metadataJSON,
			"url":         req.URL,
		})
		if err == nil {
			c.JSON(http.StatusOK, gin.H{"logged": true})
			return
		}
		lastErr = err
		if attempt < maxRetries {
			time.Sleep(time.Duration(1<<(attempt-1)) * time.Second) // 1s, 2s, ...
		}
	}

	log.Printf("[ERROR] [server] [client_error] persistent_failure error=%q", lastErr)
	c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist error after retries"})
}
```
Rationale:
- Validates input before processing
- Logs with [HISTORY] prefix for unified log aggregation
- Implements retry logic per ETHOS #3 (Assume Failure)
- Returns appropriate HTTP status codes
- Handles database connection failures gracefully
Step 2.3: Frontend Error Logger
```typescript
// File: aggregator-web/src/lib/client-error-logger.ts
import { api } from './api';

export interface ClientErrorLog {
  subsystem: string;
  error_type: 'javascript_error' | 'api_error' | 'ui_error' | 'validation_error';
  message: string;
  stack_trace?: string;
  metadata?: Record<string, any>;
  url: string;
}

/**
 * ClientErrorLogger provides reliable frontend error logging to the backend.
 * Implements retry logic per ETHOS #3 (Assume Failure).
 */
export class ClientErrorLogger {
  private maxRetries = 3;
  private baseDelayMs = 1000;
  private localStorageKey = 'redflag-failed-error-logs';

  /**
   * Log an error to the backend with automatic retry.
   */
  async logError(errorData: Omit<ClientErrorLog, 'url'>): Promise<void> {
    const fullError: ClientErrorLog = {
      ...errorData,
      url: window.location.href,
    };
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        await api.post('/logs/client-error', fullError, {
          // Header lets the API layer skip re-logging failures of the
          // logger itself, preventing an infinite loop.
          headers: { 'X-Error-Logger-Request': 'true' },
        });
        return; // Success
      } catch (error) {
        if (attempt === this.maxRetries) {
          // Save to localStorage for retry on the next page load
          this.saveFailedLog({ ...fullError, attempt });
        } else {
          // Exponential backoff: 1s, 2s, ...
          await this.sleep(this.baseDelayMs * 2 ** (attempt - 1));
        }
      }
    }
  }

  /**
   * Attempt to resend failed error logs from localStorage.
   */
  async retryFailedLogs(): Promise<void> {
    const failedLogs = this.getFailedLogs();
    if (failedLogs.length === 0) return;
    // Clear the queue first: logError() re-saves anything that still
    // fails, so entries are never lost or double-counted.
    localStorage.removeItem(this.localStorageKey);
    for (const log of failedLogs) {
      const { url, attempt, ...rest } = log;
      await this.logError(rest);
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  private saveFailedLog(log: any): void {
    try {
      const existing = this.getFailedLogs();
      existing.push(log);
      localStorage.setItem(this.localStorageKey, JSON.stringify(existing));
    } catch {
      // localStorage might be full or unavailable
    }
  }

  private getFailedLogs(): any[] {
    try {
      const stored = localStorage.getItem(this.localStorageKey);
      return stored ? JSON.parse(stored) : [];
    } catch {
      return [];
    }
  }
}

// Singleton instance
export const clientErrorLogger = new ClientErrorLogger();

// Auto-retry failed logs on app load
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    clientErrorLogger.retryFailedLogs().catch(() => {});
  });
}
```
Rationale:
- Implements ETHOS #3 (Assume Failure) with exponential backoff
- Saves failed logs to localStorage for retry when network recovers
- Auto-retry on app load captures errors from previous sessions
- No infinite loops (X-Error-Logger-Request header)
Step 2.4: Toast Integration
```typescript
// File: aggregator-web/src/lib/toast-with-logging.ts
import toast, { ToastOptions } from 'react-hot-toast';
import { clientErrorLogger } from './client-error-logger';

// Store references to the original methods
const toastError = toast.error;
const toastSuccess = toast.success;

/**
 * Wraps toast.error to automatically log errors to the backend.
 * Implements ETHOS #1 (Errors are History).
 */
export const toastWithLogging = {
  error: (message: string, subsystem: string, options?: ToastOptions) => {
    // Log to backend asynchronously - don't block the UI
    clientErrorLogger.logError({
      subsystem,
      error_type: 'ui_error',
      message: message.substring(0, 1000), // Cap excessively long messages
      metadata: {
        timestamp: new Date().toISOString(),
        user_agent: navigator.userAgent,
      },
    }).catch(() => {
      // Silently ignore logging failures - don't crash the UI
    });
    // Show toast to user
    return toastError(message, options);
  },
  success: toastSuccess,
  // react-hot-toast has no info/warning variants; fall back to the base toast
  info: (message: string, options?: ToastOptions) => toast(message, options),
  warning: (message: string, options?: ToastOptions) => toast(message, options),
  loading: toast.loading,
  dismiss: toast.dismiss,
};
```
Rationale: Transparent wrapper that maintains toast API while adding error logging. User experience unchanged but errors now persist to history table.
Implementation Evaluation: Retry Logic Necessity
Question: Does every client error log need exponential backoff retry?
Analysis:
Errors That SHOULD Have Retry:
1. API errors: network failures, server 502s, connection timeouts
   - High value: these indicate real problems
   - Retry needed: network glitches are common
2. Critical UI failures: command creation failures, permission errors
   - High value: affect user workflow
   - Retry needed: the server might be temporarily overloaded

Errors That Could Skip Retry:
1. Validation errors: user entered invalid data
   - Low value: expected behavior, not a system issue
   - No retry: will immediately fail again
2. Browser compatibility issues: old browser, missing features
   - Low value: persistent problem until the user upgrades
   - No retry: won't fix itself
Recommendation: Use Retry for API and Critical Errors Only
```typescript
// Simplified version for validation errors (no retry)
export const logValidationError = async (subsystem: string, message: string) => {
  try {
    await api.post('/logs/client-error', {
      subsystem,
      error_type: 'validation_error',
      message,
    });
  } catch {
    // Best effort only - validation errors aren't critical
  }
};

// Full retry version for API errors
export const logApiError = async (subsystem: string, message: string) => {
  await clientErrorLogger.logError({
    subsystem,
    error_type: 'api_error',
    message,
  });
};
```
Decision: Keep retry logic in the general logger (most errors are API/critical), create specific no-retry helpers for validation cases.
Testing Strategy
Test Command ID Generation
```go
func TestCommandFactory_Create(t *testing.T) {
	factory := command.NewFactory()
	agentID := uuid.New()

	cmd, err := factory.Create(agentID, "scan_storage", nil)

	require.NoError(t, err)
	assert.NotEqual(t, uuid.Nil, cmd.ID, "ID should be generated")
	assert.Equal(t, agentID, cmd.AgentID)
	assert.Equal(t, "scan_storage", cmd.CommandType)
}

func TestCommandFactory_CreateValidatesInput(t *testing.T) {
	factory := command.NewFactory()

	_, err := factory.Create(uuid.Nil, "", nil)

	assert.Error(t, err)
	assert.Contains(t, err.Error(), "validation failed")
}
```
Test Error Logger Retry
```typescript
test('logError retries on failure then succeeds without queueing', async () => {
  // Mock API to fail twice, then succeed on the third attempt
  const mockPost = jest.fn()
    .mockRejectedValueOnce(new Error('Network error'))
    .mockRejectedValueOnce(new Error('Network error'))
    .mockResolvedValueOnce({});
  api.post = mockPost;

  await clientErrorLogger.logError({
    subsystem: 'storage',
    error_type: 'api_error',
    message: 'Failed to scan',
  });

  expect(mockPost).toHaveBeenCalledTimes(3);
  // Succeeded within the retry budget, so nothing was saved to the queue
  // (assumes localStorage.setItem is replaced with a jest spy in test setup)
  expect(localStorage.setItem).not.toHaveBeenCalled();
});
```
Integration Test
```typescript
test('rapid scan button clicks work correctly', async () => {
  // Trigger multiple scans concurrently
  await Promise.all([
    triggerStorageScan(),
    triggerSystemScan(),
    triggerDockerScan(),
  ]);

  // All should succeed with unique command IDs
  const commands = await getAgentCommands(agent.id);
  const uniqueIDs = new Set(commands.map(c => c.id));
  assert.equal(uniqueIDs.size, 3);
});
```
Implementation Plan
Step 1: Command Factory (15 minutes)
- Create `aggregator-server/internal/command/factory.go`
- Add `Validate()` method to `models.AgentCommand`
- Update `TriggerSubsystem` and other command creation points to use the factory
- Test: verify rapid button clicks work

Step 2: Database Migration (5 minutes)
- Create `023_client_error_logging.up.sql`
- Test that the migration runs successfully
- Verify the table and indexes are created

Step 3: Backend Handler (20 minutes)
- Create `aggregator-server/internal/api/handlers/client_errors.go`
- Add route registration in the router setup
- Test the API endpoint with curl

Step 4: Frontend Logger (15 minutes)
- Create `aggregator-web/src/lib/client-error-logger.ts`
- Add the toast wrapper in `aggregator-web/src/lib/toast-with-logging.ts`
- Update 2-3 critical error locations to use the new logger
- Test: verify errors appear in the database

Step 5: Verification (10 minutes)
- Test the full workflow: trigger a scan, verify the command ID is unique
- Test the error scenario: disconnect the network, verify retry works
- Check the database: confirm errors are stored with context

Total Time: ~1 hour 5 minutes
Files to Create
- `aggregator-server/internal/command/factory.go`
- `aggregator-server/internal/database/migrations/023_client_error_logging.up.sql`
- `aggregator-server/internal/api/handlers/client_errors.go`
- `aggregator-web/src/lib/client-error-logger.ts`
- `aggregator-web/src/lib/toast-with-logging.ts`

Files to Modify
- `aggregator-server/internal/models/command.go` - add `Validate()` method
- `aggregator-server/internal/api/handlers/subsystems.go` - use the command factory
- `aggregator-server/internal/api/router.go` - register the error logging route
- 2-3 frontend files with critical error paths
ETHOS Compliance Verification
- ETHOS #1: All errors logged with context to history table ✓
- ETHOS #2: Endpoint protected by auth middleware ✓
- ETHOS #3: Retry logic with exponential backoff implemented ✓
- ETHOS #4: Database constraints handle duplicate logging gracefully ✓
- ETHOS #5: No marketing fluff; technical, honest naming used ✓
Status: Ready for Implementation
Recommendation: Implement all steps in order for a clean, maintainable solution.