Add docs and project files - force for Culurien

2026-03-28 20:46:24 -04:00
parent dc61797423
commit 484a7f77ce
343 changed files with 119530 additions and 0 deletions
--- a/docs/3_BACKLOG/P4-005_Testing-Infrastructure-Gaps.md
+++ b/docs/3_BACKLOG/P4-005_Testing-Infrastructure-Gaps.md
@@ -0,0 +1,567 @@
+# P4-005: Testing Infrastructure Gaps
+
+**Priority:** P4 (Technical Debt)
+**Source Reference:** From analysis of codebase testing coverage and existing test files
+**Date Identified:** 2025-11-12
+
+## Problem Description
+
+RedFlag has minimal testing infrastructure with only 5 test files covering basic functionality. Critical components like agent communication, authentication, scanner integration, and database operations lack comprehensive test coverage. This creates high risk for regressions and makes confident deployment difficult.
+
+## Current Test Coverage Analysis
+
+### Existing Tests (5 files)
+1. `aggregator-agent/internal/circuitbreaker/circuitbreaker_test.go` - Basic circuit breaker
+2. `aggregator-agent/test_disk.go` - Disk detection testing (development)
+3. `test_disk_detection.go` - Disk detection integration test
+4. `aggregator-server/internal/scheduler/queue_test.go` - Queue operations (21 tests passing)
+5. `aggregator-server/internal/scheduler/scheduler_test.go` - Scheduler logic (21 tests passing)
+
+### Critical Missing Test Areas
+
+#### Agent Components (0% coverage)
+- Agent registration and authentication
+- Scanner implementations (APT, DNF, Docker, Windows, Winget)
+- Command execution and acknowledgment
+- File management and state persistence
+- Error handling and resilience
+- Cross-platform compatibility
+
+#### Server Components (Minimal coverage)
+- API endpoints and handlers
+- Database operations and queries
+- Authentication and authorization
+- Rate limiting and security middleware
+- Agent lifecycle management
+- Update package distribution
+
+#### Integration Testing (0% coverage)
+- End-to-end agent-server communication
+- Multi-agent scenarios
+- Error recovery and failover
+- Performance under load
+- Security validation (Ed25519, nonces, machine binding)
+
+#### Security Testing (0% coverage)
+- Cryptographic operations validation
+- Authentication bypass attempts
+- Input validation and sanitization
+- Rate limiting effectiveness
+- Machine binding enforcement
+
+## Impact
+
+- **Regression Risk:** No safety net for code changes
+- **Deployment Confidence:** Cannot verify system reliability
+- **Quality Assurance:** Manual testing is time-consuming and error-prone
+- **Security Validation:** No automated security testing
+- **Performance Testing:** No way to detect performance regressions
+- **Documentation Gaps:** No tests exist to serve as living documentation of expected behavior
+
+## Proposed Solution
+
+Implement comprehensive testing infrastructure across all components:
+
+### 1. Unit Testing Framework
+```go
+// Test configuration and utilities
+// aggregator/internal/testutil/testutil.go
+package testutil
+
+import (
+    "database/sql"
+    "net/http/httptest"
+
+    "github.com/stretchr/testify/mock"
+    "github.com/stretchr/testify/suite"
+)
+
+type TestSuite struct {
+    suite.Suite
+    DB     *sql.DB
+    Config *Config
+    Server *httptest.Server
+}
+
+func (s *TestSuite) SetupSuite() {
+    // Initialize test database; s.T() is the *testing.T attached by suite.Run or SetT
+    s.DB = setupTestDB(s.T())
+
+    // Initialize test configuration
+    s.Config = &Config{
+        DatabaseURL: "postgres://test:test@localhost/redflag_test",
+        ServerPort:  0, // Random port for testing
+    }
+}
+
+func (s *TestSuite) TearDownSuite() {
+    if s.DB != nil {
+        s.DB.Close()
+    }
+    cleanupTestDB() // helper not shown in this document
+}
+
+func (s *TestSuite) SetupTest() {
+    // Reset database state before each test and stop if the reset fails
+    if err := resetTestDB(s.DB); err != nil {
+        s.T().Fatalf("Failed to reset test database: %v", err)
+    }
+}
+
+// Mock implementations
+type MockScanner struct {
+    mock.Mock
+}
+
+func (m *MockScanner) ScanForUpdates() ([]UpdateReportItem, error) {
+    args := m.Called()
+    // Comma-ok assertion so a mocked Return(nil, err) does not panic
+    items, _ := args.Get(0).([]UpdateReportItem)
+    return items, args.Error(1)
+}
+```
+
+### 2. Agent Component Tests
+```go
+// aggregator-agent/cmd/agent/main_test.go
+func TestAgentRegistration(t *testing.T) {
+    tests := []struct {
+        name           string
+        token          string
+        expectedStatus int
+        expectedError  string
+    }{
+        {
+            name:           "Valid registration",
+            token:          "valid-token-123",
+            expectedStatus: http.StatusCreated,
+        },
+        {
+            name:           "Invalid token",
+            token:          "invalid-token",
+            expectedStatus: http.StatusUnauthorized,
+            expectedError:  "invalid registration token",
+        },
+    }
+
+    for _, tt := range tests {
+        t.Run(tt.name, func(t *testing.T) {
+            server := setupTestServer(t)
+            defer server.Close()
+
+            agent := &Agent{
+                ServerURL: server.URL,
+                Token:     tt.token,
+            }
+
+            err := agent.Register()
+            if tt.expectedError != "" {
+                // ErrorContains fails cleanly on a nil error instead of panicking in err.Error()
+                assert.ErrorContains(t, err, tt.expectedError)
+            } else {
+                assert.NoError(t, err)
+            }
+        })
+    }
+}
+
+// aggregator-agent/internal/scanner/dnf_test.go
+func TestDNFScanner(t *testing.T) {
+    // Test with mock dnf command
+    scanner := &DNFScanner{}
+
+    t.Run("Successful scan", func(t *testing.T) {
+        // Mock successful dnf check-update output
+        withMockCommand("dnf", "check-update", successfulDNFOutput, func() {
+            updates, err := scanner.ScanForUpdates()
+            assert.NoError(t, err)
+            assert.NotEmpty(t, updates)
+
+            // Verify update parsing
+            nginx := findUpdate(updates, "nginx")
+            // Only dereference nginx when the lookup succeeded
+            if assert.NotNil(t, nginx) {
+                assert.Equal(t, "1.20.1", nginx.CurrentVersion)
+                assert.Equal(t, "1.21.0", nginx.AvailableVersion)
+            }
+        })
+    })
+
+    t.Run("DNF not available", func(t *testing.T) {
+        scanner.executable = "nonexistent-dnf"
+        _, err := scanner.ScanForUpdates()
+        // ErrorContains checks both that err is non-nil and that it mentions the cause
+        assert.ErrorContains(t, err, "dnf not found")
+    })
+}
+```
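+
+The `withMockCommand` helper used above is not defined in this document. One way to get a similar effect without patching globals is to inject the command runner into the scanner. The sketch below assumes hypothetical names (`CommandRunner`, `Scanner`, `Update`, `FakeRunner`) and a simplified three-column `dnf check-update` output; it is not taken from the RedFlag codebase.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "strings"
+)
+
+// CommandRunner abstracts process execution so tests can supply canned output.
+// Hypothetical type for illustration only.
+type CommandRunner func(name string, args ...string) ([]byte, error)
+
+// Update is a simplified stand-in for the document's UpdateReportItem.
+type Update struct {
+    Name             string
+    AvailableVersion string
+}
+
+// Scanner runs "dnf check-update" through the injected runner.
+type Scanner struct {
+    Run CommandRunner
+}
+
+// ErrNotFound simulates a missing dnf binary in tests.
+var ErrNotFound = errors.New("dnf not found")
+
+// ScanForUpdates parses lines of the form "<name>.<arch> <version> <repo>".
+func (s *Scanner) ScanForUpdates() ([]Update, error) {
+    out, err := s.Run("dnf", "check-update")
+    if err != nil {
+        return nil, fmt.Errorf("dnf check-update: %w", err)
+    }
+    var updates []Update
+    for _, line := range strings.Split(string(out), "\n") {
+        fields := strings.Fields(line)
+        if len(fields) != 3 {
+            continue // skip blank lines and anything that is not a package row
+        }
+        // Split on the last dot so names such as "python3.11" stay intact.
+        i := strings.LastIndex(fields[0], ".")
+        if i <= 0 {
+            continue
+        }
+        updates = append(updates, Update{Name: fields[0][:i], AvailableVersion: fields[1]})
+    }
+    return updates, nil
+}
+
+// FakeRunner returns a CommandRunner that always yields the given output and error.
+func FakeRunner(output string, err error) CommandRunner {
+    return func(string, ...string) ([]byte, error) {
+        return []byte(output), err
+    }
+}
+```
+
+A production runner built on `os/exec` would also need to treat exit code 100 from `dnf check-update` as "updates available" rather than as a failure.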
+
+### 3. Server Component Tests
+```go
+// aggregator-server/internal/api/handlers/agents_test.go
+func TestAgentsHandler_RegisterAgent(t *testing.T) {
+    suite := &TestSuite{}
+    suite.SetT(t) // attach t because the suite is driven manually instead of via suite.Run
+    suite.SetupSuite()
+    defer suite.TearDownSuite()
+
+    tests := []struct {
+        name           string
+        requestBody    string
+        expectedStatus int
+        setupToken     bool
+    }{
+        {
+            name:        "Valid registration",
+            requestBody: `{"hostname":"test-host","os_type":"linux","agent_version":"0.1.23"}`,
+            setupToken:  true,
+            expectedStatus: http.StatusCreated,
+        },
+        {
+            name:           "Invalid JSON",
+            requestBody:    `{"hostname":}`,
+            expectedStatus: http.StatusBadRequest,
+        },
+        {
+            name:           "Missing token",
+            requestBody:    `{"hostname":"test-host"}`,
+            expectedStatus: http.StatusUnauthorized,
+        },
+    }
+
+    for _, tt := range tests {
+        t.Run(tt.name, func(t *testing.T) {
+            suite.SetupTest()
+
+            // Assumes createTestToken returns the token string to present in the header
+            var token string
+            if tt.setupToken {
+                token = createTestToken(suite.DB, 5)
+                suite.Config.JWTSecret = "test-secret"
+            }
+
+            req := httptest.NewRequest("POST", "/api/v1/agents/register",
+                strings.NewReader(tt.requestBody))
+            req.Header.Set("Content-Type", "application/json")
+            if tt.setupToken {
+                req.Header.Set("Authorization", "Bearer "+token)
+            }
+
+            w := httptest.NewRecorder()
+            handler := NewAgentsHandler(suite.DB, suite.Config)
+            handler.RegisterAgent(w, req)
+
+            assert.Equal(t, tt.expectedStatus, w.Code)
+        })
+    }
+}
+
+// aggregator-server/internal/database/queries/agents_test.go
+func TestAgentQueries(t *testing.T) {
+    db := setupTestDB(t)
+    defer db.Close()
+    queries := NewAgentQueries(db)
+
+    t.Run("Create and retrieve agent", func(t *testing.T) {
+        agent := &models.Agent{
+            ID:        uuid.New(),
+            Hostname:  "test-host",
+            OSType:    "linux",
+            Version:   "0.1.23",
+            CreatedAt: time.Now(),
+        }
+
+        // Create agent
+        err := queries.CreateAgent(agent)
+        assert.NoError(t, err)
+
+        // Retrieve agent
+        retrieved, err := queries.GetAgent(agent.ID)
+        assert.NoError(t, err)
+        assert.Equal(t, agent.Hostname, retrieved.Hostname)
+        assert.Equal(t, agent.OSType, retrieved.OSType)
+    })
+}
+```
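+
+The handler table above implies a validation order: malformed JSON yields 400 even without a token, while well-formed JSON without a token yields 401. A minimal standard-library sketch of that order, exercised in memory with `httptest.NewRecorder`, might look like the following. `registerHandler` and `statusFor` are hypothetical stand-ins, not the real `RegisterAgent` handler.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "net/http"
+    "net/http/httptest"
+    "strings"
+)
+
+// registerRequest models only the field this sketch needs.
+type registerRequest struct {
+    Hostname string `json:"hostname"`
+}
+
+// registerHandler is a hypothetical handler: it rejects bad JSON first,
+// then checks the bearer token, then reports success.
+func registerHandler(validToken string) http.HandlerFunc {
+    return func(w http.ResponseWriter, r *http.Request) {
+        var req registerRequest
+        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+            http.Error(w, "invalid JSON", http.StatusBadRequest)
+            return
+        }
+        if r.Header.Get("Authorization") != "Bearer "+validToken {
+            http.Error(w, "invalid registration token", http.StatusUnauthorized)
+            return
+        }
+        w.WriteHeader(http.StatusCreated)
+    }
+}
+
+// statusFor sends one POST through the handler in memory and returns the status code.
+// An empty token means no Authorization header is set.
+func statusFor(h http.Handler, body, token string) int {
+    req := httptest.NewRequest(http.MethodPost, "/api/v1/agents/register", strings.NewReader(body))
+    req.Header.Set("Content-Type", "application/json")
+    if token != "" {
+        req.Header.Set("Authorization", "Bearer "+token)
+    }
+    rec := httptest.NewRecorder()
+    h.ServeHTTP(rec, req)
+    return rec.Code
+}
+```
+
+Because no listener or database is involved, tests of this shape stay fast enough to run on every commit.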
+
+### 4. Integration Tests
+```go
+// integration/agent_server_test.go
+func TestAgentServerIntegration(t *testing.T) {
+    if testing.Short() {
+        t.Skip("Skipping integration test in short mode")
+    }
+
+    // Setup test environment
+    server := setupIntegrationServer(t)
+    defer server.Cleanup()
+
+    agent := setupIntegrationAgent(t, server.URL)
+    defer agent.Cleanup()
+
+    t.Run("Complete agent lifecycle", func(t *testing.T) {
+        // Registration
+        err := agent.Register()
+        assert.NoError(t, err)
+
+        // First check-in (no commands)
+        commands, err := agent.CheckIn()
+        assert.NoError(t, err)
+        assert.Empty(t, commands)
+
+        // Send scan command
+        scanCmd := &Command{
+            Type: "scan_updates",
+            ID:   uuid.New(),
+        }
+        err = server.SendCommand(agent.ID, scanCmd)
+        assert.NoError(t, err)
+
+        // Second check-in (should receive command)
+        commands, err = agent.CheckIn()
+        assert.NoError(t, err)
+        assert.Len(t, commands, 1)
+        assert.Equal(t, "scan_updates", commands[0].Type)
+
+        // Execute command and report results
+        result := agent.ExecuteCommand(commands[0])
+        err = agent.ReportResult(result)
+        assert.NoError(t, err)
+
+        // Verify command completion
+        cmdStatus, err := server.GetCommandStatus(scanCmd.ID)
+        assert.NoError(t, err)
+        assert.Equal(t, "completed", cmdStatus.Status)
+    })
+}
+
+// integration/security_test.go
+func TestSecurityFeatures(t *testing.T) {
+    server := setupIntegrationServer(t)
+    defer server.Cleanup()
+
+    t.Run("Machine binding enforcement", func(t *testing.T) {
+        agent1 := setupIntegrationAgent(t, server.URL)
+        agent2 := setupIntegrationAgentWithMachineID(t, server.URL, agent1.MachineID)
+
+        // Register first agent
+        err := agent1.Register()
+        assert.NoError(t, err)
+
+        // Attempt to register second agent with same machine ID
+        err = agent2.Register()
+        // ErrorContains fails cleanly if registration unexpectedly succeeds
+        assert.ErrorContains(t, err, "machine ID already registered")
+    })
+
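+    t.Run("Ed25519 signature validation", func(t *testing.T) {
+        // Each subtest needs its own agent; the agents above are scoped to the previous subtest
+        agent := setupIntegrationAgent(t, server.URL)
+        defer agent.Cleanup()
+
+        // Test with valid signature
+        validPackage := createSignedPackage(t, server.PrivateKey)
+        err := agent.VerifyPackageSignature(validPackage)
+        assert.NoError(t, err)
+
+        // Test with a signature from an unrelated key
+        // (assumes server.PrivateKey is an ed25519.PrivateKey from crypto/ed25519)
+        _, wrongKey, err := ed25519.GenerateKey(rand.Reader)
+        assert.NoError(t, err)
+        invalidPackage := createSignedPackage(t, wrongKey)
+        err = agent.VerifyPackageSignature(invalidPackage)
+        assert.ErrorContains(t, err, "invalid signature")
+    })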
+}
+```
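+
+The signature checks above rely on helpers that are not shown here. As a rough illustration of what they would exercise, the sketch below signs and verifies a payload with the standard library's `crypto/ed25519`. `SignedPackage`, `SignPackage`, `VerifyPackage`, and `NewKeyPair` are hypothetical names, and the real package format may differ.
+
+```go
+package main
+
+import (
+    "crypto/ed25519"
+    "crypto/rand"
+    "errors"
+)
+
+// SignedPackage is a hypothetical container for an update payload and its signature.
+type SignedPackage struct {
+    Payload   []byte
+    Signature []byte
+}
+
+// ErrInvalidSignature is returned when verification fails.
+var ErrInvalidSignature = errors.New("invalid signature")
+
+// NewKeyPair generates a fresh Ed25519 key pair.
+func NewKeyPair() (ed25519.PublicKey, ed25519.PrivateKey, error) {
+    return ed25519.GenerateKey(rand.Reader)
+}
+
+// SignPackage signs the payload with the given private key.
+func SignPackage(priv ed25519.PrivateKey, payload []byte) SignedPackage {
+    return SignedPackage{Payload: payload, Signature: ed25519.Sign(priv, payload)}
+}
+
+// VerifyPackage checks the signature against the public key.
+// The length check matters because ed25519.Verify panics on a malformed key.
+func VerifyPackage(pub ed25519.PublicKey, p SignedPackage) error {
+    if len(pub) != ed25519.PublicKeySize || !ed25519.Verify(pub, p.Payload, p.Signature) {
+        return ErrInvalidSignature
+    }
+    return nil
+}
+```
+
+A security test would typically cover three cases: a valid signature, a signature produced by a different key, and a payload altered after signing.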
+
+### 5. Performance Tests
+```go
+// performance/load_test.go
+func BenchmarkAgentCheckIn(b *testing.B) {
+    server := setupBenchmarkServer(b)
+    defer server.Cleanup()
+
+    agent := setupBenchmarkAgent(b, server.URL)
+
+    b.ResetTimer()
+    b.RunParallel(func(pb *testing.PB) {
+        for pb.Next() {
+            _, err := agent.CheckIn()
+            if err != nil {
+                // b.Fatal must not be called from RunParallel worker goroutines
+                b.Error(err)
+                return
+            }
+        }
+    })
+}
+
+func TestConcurrentAgents(t *testing.T) {
+    server := setupIntegrationServer(t)
+    defer server.Cleanup()
+
+    numAgents := 100
+    var wg sync.WaitGroup
+    errors := make(chan error, numAgents)
+
+    for i := 0; i < numAgents; i++ {
+        wg.Add(1)
+        go func(id int) {
+            defer wg.Done()
+
+            agent := setupIntegrationAgentWithID(t, server.URL, fmt.Sprintf("agent-%d", id))
+            err := agent.Register()
+            if err != nil {
+                errors <- fmt.Errorf("agent %d registration failed: %w", id, err)
+                return
+            }
+
+            // Perform several check-ins
+            for j := 0; j < 5; j++ {
+                _, err := agent.CheckIn()
+                if err != nil {
+                    errors <- fmt.Errorf("agent %d check-in %d failed: %w", id, j, err)
+                    return
+                }
+            }
+        }(i)
+    }
+
+    wg.Wait()
+    close(errors)
+
+    // Check for any errors
+    for err := range errors {
+        t.Error(err)
+    }
+}
+```
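+
+The concurrency test above depends on integration helpers that are not defined in this document. The same fan-out and error-collection pattern can be tried against an in-process `httptest.Server` using only the standard library. In this sketch `RunConcurrentCheckIns` and `NewCountingServer` are hypothetical names, and a plain GET stands in for an agent check-in.
+
+```go
+package main
+
+import (
+    "fmt"
+    "net/http"
+    "net/http/httptest"
+    "sync"
+    "sync/atomic"
+)
+
+// NewCountingServer starts an in-process server and returns it together with
+// a function that reports how many requests it has handled.
+func NewCountingServer() (*httptest.Server, func() int64) {
+    var count int64
+    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        atomic.AddInt64(&count, 1)
+        w.WriteHeader(http.StatusOK)
+    }))
+    return srv, func() int64 { return atomic.LoadInt64(&count) }
+}
+
+// RunConcurrentCheckIns starts numAgents goroutines that each issue perAgent
+// GET requests, and returns every error encountered.
+func RunConcurrentCheckIns(url string, numAgents, perAgent int) []error {
+    var wg sync.WaitGroup
+    errCh := make(chan error, numAgents) // each goroutine sends at most one error
+    for i := 0; i < numAgents; i++ {
+        wg.Add(1)
+        go func(id int) {
+            defer wg.Done()
+            for j := 0; j < perAgent; j++ {
+                resp, err := http.Get(url)
+                if err != nil {
+                    errCh <- fmt.Errorf("agent %d check-in %d failed: %w", id, j, err)
+                    return
+                }
+                resp.Body.Close()
+                if resp.StatusCode != http.StatusOK {
+                    errCh <- fmt.Errorf("agent %d check-in %d: status %d", id, j, resp.StatusCode)
+                    return
+                }
+            }
+        }(i)
+    }
+    wg.Wait()
+    close(errCh)
+    var errs []error
+    for err := range errCh {
+        errs = append(errs, err)
+    }
+    return errs
+}
+```
+
+Returning the errors to the caller keeps all `testing.T` calls on the test's own goroutine, which avoids calling failure methods from workers.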
+
+### 6. Test Database Setup
+```go
+// internal/testutil/db.go
+package testutil
+
+import (
+    "database/sql"
+    "fmt"
+    "testing"
+
+    _ "github.com/lib/pq" // assumed driver; registers the "postgres" driver name
+)
+
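+func setupTestDB(t *testing.T) *sql.DB {
+    db, err := sql.Open("postgres", "postgres://test:test@localhost/redflag_test?sslmode=disable")
+    if err != nil {
+        t.Fatalf("Failed to open test database: %v", err)
+    }
+
+    // sql.Open does not dial the server, so ping to surface connection errors early
+    if err := db.Ping(); err != nil {
+        t.Fatalf("Failed to connect to test database: %v", err)
+    }
+
+    // Run migrations
+    if err := runMigrations(db); err != nil {
+        t.Fatalf("Failed to run migrations: %v", err)
+    }
+
+    return db
+}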
+
+func resetTestDB(db *sql.DB) error {
+    tables := []string{
+        "agent_commands", "update_logs", "registration_token_usage",
+        "registration_tokens", "refresh_tokens", "agents",
+    }
+
+    tx, err := db.Begin()
+    if err != nil {
+        return err
+    }
+    defer tx.Rollback()
+
+    for _, table := range tables {
+        _, err := tx.Exec(fmt.Sprintf("DELETE FROM %s", table))
+        if err != nil {
+            return err
+        }
+    }
+
+    return tx.Commit()
+}
+```
+
+## Definition of Done
+
+- [ ] Unit test coverage >80% for all critical components
+- [ ] Integration test coverage for all major workflows
+- [ ] Performance tests for scalability validation
+- [ ] Security tests for authentication and cryptographic features
+- [ ] CI/CD pipeline with automated testing
+- [ ] Test database setup and migration testing
+- [ ] Mock implementations for external dependencies
+- [ ] Test documentation and examples
+
+## Implementation Plan
+
+### Phase 1: Foundation (Week 1)
+- Set up testing framework and utilities
+- Create test database setup
+- Implement mock objects for external dependencies
+- Add basic unit tests for core components
+
+### Phase 2: Agent Testing (Week 2)
+- Scanner implementation tests
+- Agent lifecycle tests
+- Error handling and resilience tests
+- Cross-platform compatibility tests
+
+### Phase 3: Server Testing (Week 3)
+- API endpoint tests
+- Database operation tests
+- Authentication and security tests
+- Rate limiting and middleware tests
+
+### Phase 4: Integration & Performance (Week 4)
+- End-to-end integration tests
+- Multi-agent scenarios
+- Performance and load tests
+- Security validation tests
+
+## Testing Strategy
+
+### Unit Tests
+- Focus on individual component behavior
+- Mock external dependencies
+- Fast execution (<1 second per test)
+- Cover edge cases and error conditions
+
+### Integration Tests
+- Test component interactions
+- Use real database and filesystem
+- Slower execution but comprehensive coverage
+- Validate complete workflows
+
+### Performance Tests
+- Measure response times and throughput
+- Test under realistic load conditions
+- Identify performance bottlenecks
+- Validate scalability claims
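+
+To make "response times and throughput" concrete, a load test can record one duration per request and summarize the samples afterwards. The sketch below uses the nearest-rank method for percentiles. `Percentile`, `Throughput`, and `Millis` are hypothetical helpers, and nearest-rank is one reasonable choice rather than a project requirement.
+
+```go
+package main
+
+import (
+    "math"
+    "sort"
+    "time"
+)
+
+// Percentile returns the p-th percentile (0 < p <= 100) of samples using the
+// nearest-rank method. It returns 0 for an empty input and leaves samples unmodified.
+func Percentile(samples []time.Duration, p float64) time.Duration {
+    if len(samples) == 0 {
+        return 0
+    }
+    sorted := append([]time.Duration(nil), samples...)
+    sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
+    rank := int(math.Ceil(p / 100 * float64(len(sorted))))
+    if rank < 1 {
+        rank = 1
+    }
+    if rank > len(sorted) {
+        rank = len(sorted)
+    }
+    return sorted[rank-1]
+}
+
+// Throughput returns operations per second, or 0 if elapsed is not positive.
+func Throughput(ops int, elapsed time.Duration) float64 {
+    if elapsed <= 0 {
+        return 0
+    }
+    return float64(ops) / elapsed.Seconds()
+}
+
+// Millis converts integer millisecond values into durations.
+func Millis(ms ...int) []time.Duration {
+    out := make([]time.Duration, len(ms))
+    for i, m := range ms {
+        out[i] = time.Duration(m) * time.Millisecond
+    }
+    return out
+}
+```
+
+Tracking a high percentile such as p95 alongside the median tends to expose tail latency that an average would hide.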
+
+### Security Tests
+- Validate authentication mechanisms
+- Test cryptographic operations
+- Verify input validation
+- Check for common vulnerabilities
+
+## Prerequisites
+
+- Test database instance (PostgreSQL)
+- CI/CD pipeline infrastructure
+- Mock implementations for external services
+- Performance testing environment
+- Security testing tools and knowledge
+
+## Effort Estimate
+
+**Complexity:** High
+**Effort:** 4 weeks (1 developer)
+- Week 1: Testing framework and foundation
+- Week 2: Agent component tests
+- Week 3: Server component tests
+- Week 4: Integration and performance tests
+
+## Success Metrics
+
+- Code coverage >80% for critical components
+- All major workflows covered by integration tests
+- Performance tests validate 10,000+ agent support
+- Security tests verify authentication and cryptography
+- CI/CD pipeline runs tests automatically
+- Regression detection for new features
+- Documentation includes testing guidelines
+
+## Monitoring
+
+Track these metrics after implementation:
+- Test execution time trends
+- Code coverage percentage
+- Test failure rates
+- Performance benchmark results
+- Security test findings
+- Developer satisfaction with testing tools