Redflag/docs/3_BACKLOG/P4-005_Testing-Infrastructure-Gaps.md

# P4-005: Testing Infrastructure Gaps

**Priority:** P4 (Technical Debt)
**Source Reference:** From analysis of codebase testing coverage and existing test files
**Date Identified:** 2025-11-12

## Problem Description

RedFlag has minimal testing infrastructure with only 5 test files covering basic functionality. Critical components like agent communication, authentication, scanner integration, and database operations lack comprehensive test coverage. This creates high risk for regressions and makes confident deployment difficult.

## Current Test Coverage Analysis

### Existing Tests (5 files)
1. `aggregator-agent/internal/circuitbreaker/circuitbreaker_test.go` - Basic circuit breaker
2. `aggregator-agent/test_disk.go` - Disk detection testing (development)
3. `test_disk_detection.go` - Disk detection integration test
4. `aggregator-server/internal/scheduler/queue_test.go` - Queue operations (21 tests passing)
5. `aggregator-server/internal/scheduler/scheduler_test.go` - Scheduler logic (21 tests passing)

### Critical Missing Test Areas

#### Agent Components (0% coverage)
- Agent registration and authentication
- Scanner implementations (APT, DNF, Docker, Windows, Winget)
- Command execution and acknowledgment
- File management and state persistence
- Error handling and resilience
- Cross-platform compatibility

#### Server Components (Minimal coverage)
- API endpoints and handlers
- Database operations and queries
- Authentication and authorization
- Rate limiting and security middleware
- Agent lifecycle management
- Update package distribution

#### Integration Testing (0% coverage)
- End-to-end agent-server communication
- Multi-agent scenarios
- Error recovery and failover
- Performance under load
- Security validation (Ed25519, nonces, machine binding)

#### Security Testing (0% coverage)
- Cryptographic operations validation
- Authentication bypass attempts
- Input validation and sanitization
- Rate limiting effectiveness
- Machine binding enforcement

## Impact

- **Regression Risk:** No safety net for code changes
- **Deployment Confidence:** Cannot verify system reliability
- **Quality Assurance:** Manual testing is time-consuming and error-prone
- **Security Validation:** No automated security testing
- **Performance Testing:** No way to detect performance regressions
- **Documentation Gaps:** Tests serve as living documentation

## Proposed Solution

Implement comprehensive testing infrastructure across all components:

### 1. Unit Testing Framework
```go
// Test configuration and utilities
// aggregator/internal/testutil/testutil.go
package testutil

import (
    "testing"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/mock"
    "github.com/stretchr/testify/suite"
)

type TestSuite struct {
    suite.Suite
    DB     *sql.DB
    Config *Config
    Server *httptest.Server
}

func (s *TestSuite) SetupSuite() {
    // Initialize test database
    s.DB = setupTestDB()

    // Initialize test configuration
    s.Config = &Config{
        DatabaseURL: "postgres://test:test@localhost/redflag_test",
        ServerPort:  0, // Random port for testing
    }
}

func (s *TestSuite) TearDownSuite() {
    if s.DB != nil {
        s.DB.Close()
    }
    cleanupTestDB()
}

func (s *TestSuite) SetupTest() {
    // Reset database state before each test
    resetTestDB(s.DB)
}

// Mock implementations
type MockScanner struct {
    mock.Mock
}

func (m *MockScanner) ScanForUpdates() ([]UpdateReportItem, error) {
    args := m.Called()
    return args.Get(0).([]UpdateReportItem), args.Error(1)
}
```

### 2. Agent Component Tests
```go
// aggregator-agent/cmd/agent/main_test.go
func TestAgentRegistration(t *testing.T) {
    tests := []struct {
        name           string
        token          string
        expectedStatus int
        expectedError  string
    }{
        {
            name:           "Valid registration",
            token:          "valid-token-123",
            expectedStatus: http.StatusCreated,
        },
        {
            name:           "Invalid token",
            token:          "invalid-token",
            expectedStatus: http.StatusUnauthorized,
            expectedError:  "invalid registration token",
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            server := setupTestServer(t)
            defer server.Close()

            agent := &Agent{
                ServerURL: server.URL,
                Token:     tt.token,
            }

            err := agent.Register()
            if tt.expectedError != "" {
                assert.Contains(t, err.Error(), tt.expectedError)
            } else {
                assert.NoError(t, err)
            }
        })
    }
}

// aggregator-agent/internal/scanner/dnf_test.go
func TestDNFScanner(t *testing.T) {
    // Test with mock dnf command
    scanner := &DNFScanner{}

    t.Run("Successful scan", func(t *testing.T) {
        // Mock successful dnf check-update output
        withMockCommand("dnf", "check-update", successfulDNFOutput, func() {
            updates, err := scanner.ScanForUpdates()
            assert.NoError(t, err)
            assert.NotEmpty(t, updates)

            // Verify update parsing
            nginx := findUpdate(updates, "nginx")
            assert.NotNil(t, nginx)
            assert.Equal(t, "1.20.1", nginx.CurrentVersion)
            assert.Equal(t, "1.21.0", nginx.AvailableVersion)
        })
    })

    t.Run("DNF not available", func(t *testing.T) {
        scanner.executable = "nonexistent-dnf"
        _, err := scanner.ScanForUpdates()
        assert.Error(t, err)
        assert.Contains(t, err.Error(), "dnf not found")
    })
}
```

### 3. Server Component Tests
```go
// aggregator-server/internal/api/handlers/agents_test.go
func TestAgentsHandler_RegisterAgent(t *testing.T) {
    suite := &TestSuite{}
    suite.SetupSuite()
    defer suite.TearDownSuite()

    tests := []struct {
        name           string
        requestBody    string
        expectedStatus int
        setupToken     bool
    }{
        {
            name:        "Valid registration",
            requestBody: `{"hostname":"test-host","os_type":"linux","agent_version":"0.1.23"}`,
            setupToken:  true,
            expectedStatus: http.StatusCreated,
        },
        {
            name:           "Invalid JSON",
            requestBody:    `{"hostname":}`,
            expectedStatus: http.StatusBadRequest,
        },
        {
            name:           "Missing token",
            requestBody:    `{"hostname":"test-host"}`,
            expectedStatus: http.StatusUnauthorized,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            suite.SetupTest()

            if tt.setupToken {
                token := createTestToken(suite.DB, 5)
                suite.Config.JWTSecret = "test-secret"
            }

            req := httptest.NewRequest("POST", "/api/v1/agents/register",
                strings.NewReader(tt.requestBody))
            req.Header.Set("Content-Type", "application/json")
            if tt.setupToken {
                req.Header.Set("Authorization", "Bearer test-token")
            }

            w := httptest.NewRecorder()
            handler := NewAgentsHandler(suite.DB, suite.Config)
            handler.RegisterAgent(w, req)

            assert.Equal(t, tt.expectedStatus, w.Code)
        })
    }
}

// aggregator-server/internal/database/queries/agents_test.go
func TestAgentQueries(t *testing.T) {
    db := setupTestDB(t)
    queries := NewAgentQueries(db)

    t.Run("Create and retrieve agent", func(t *testing.T) {
        agent := &models.Agent{
            ID:        uuid.New(),
            Hostname:  "test-host",
            OSType:    "linux",
            Version:   "0.1.23",
            CreatedAt: time.Now(),
        }

        // Create agent
        err := queries.CreateAgent(agent)
        assert.NoError(t, err)

        // Retrieve agent
        retrieved, err := queries.GetAgent(agent.ID)
        assert.NoError(t, err)
        assert.Equal(t, agent.Hostname, retrieved.Hostname)
        assert.Equal(t, agent.OSType, retrieved.OSType)
    })
}
```

### 4. Integration Tests
```go
// integration/agent_server_test.go
func TestAgentServerIntegration(t *testing.T) {
    if testing.Short() {
        t.Skip("Skipping integration test in short mode")
    }

    // Setup test environment
    server := setupIntegrationServer(t)
    defer server.Cleanup()

    agent := setupIntegrationAgent(t, server.URL)
    defer agent.Cleanup()

    t.Run("Complete agent lifecycle", func(t *testing.T) {
        // Registration
        err := agent.Register()
        assert.NoError(t, err)

        // First check-in (no commands)
        commands, err := agent.CheckIn()
        assert.NoError(t, err)
        assert.Empty(t, commands)

        // Send scan command
        scanCmd := &Command{
            Type: "scan_updates",
            ID:   uuid.New(),
        }
        err = server.SendCommand(agent.ID, scanCmd)
        assert.NoError(t, err)

        // Second check-in (should receive command)
        commands, err = agent.CheckIn()
        assert.NoError(t, err)
        assert.Len(t, commands, 1)
        assert.Equal(t, "scan_updates", commands[0].Type)

        // Execute command and report results
        result := agent.ExecuteCommand(commands[0])
        err = agent.ReportResult(result)
        assert.NoError(t, err)

        // Verify command completion
        cmdStatus, err := server.GetCommandStatus(scanCmd.ID)
        assert.NoError(t, err)
        assert.Equal(t, "completed", cmdStatus.Status)
    })
}

// integration/security_test.go
func TestSecurityFeatures(t *testing.T) {
    server := setupIntegrationServer(t)
    defer server.Cleanup()

    t.Run("Machine binding enforcement", func(t *testing.T) {
        agent1 := setupIntegrationAgent(t, server.URL)
        agent2 := setupIntegrationAgentWithMachineID(t, server.URL, agent1.MachineID)

        // Register first agent
        err := agent1.Register()
        assert.NoError(t, err)

        // Attempt to register second agent with same machine ID
        err = agent2.Register()
        assert.Error(t, err)
        assert.Contains(t, err.Error(), "machine ID already registered")
    })

    t.Run("Ed25519 signature validation", func(t *testing.T) {
        // Test with valid signature
        validPackage := createSignedPackage(t, server.PrivateKey)
        err := agent.VerifyPackageSignature(validPackage)
        assert.NoError(t, err)

        // Test with invalid signature
        invalidPackage := createSignedPackage(t, "wrong-key")
        err = agent.VerifyPackageSignature(invalidPackage)
        assert.Error(t, err)
        assert.Contains(t, err.Error(), "invalid signature")
    })
}
```

### 5. Performance Tests
```go
// performance/load_test.go
func BenchmarkAgentCheckIn(b *testing.B) {
    server := setupBenchmarkServer(b)
    defer server.Cleanup()

    agent := setupBenchmarkAgent(b, server.URL)

    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            _, err := agent.CheckIn()
            if err != nil {
                b.Fatal(err)
            }
        }
    })
}

func TestConcurrentAgents(t *testing.T) {
    server := setupIntegrationServer(t)
    defer server.Cleanup()

    numAgents := 100
    var wg sync.WaitGroup
    errors := make(chan error, numAgents)

    for i := 0; i < numAgents; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            agent := setupIntegrationAgentWithID(t, server.URL, fmt.Sprintf("agent-%d", id))
            err := agent.Register()
            if err != nil {
                errors <- fmt.Errorf("agent %d registration failed: %w", id, err)
                return
            }

            // Perform several check-ins
            for j := 0; j < 5; j++ {
                _, err := agent.CheckIn()
                if err != nil {
                    errors <- fmt.Errorf("agent %d check-in %d failed: %w", id, j, err)
                    return
                }
            }
        }(i)
    }

    wg.Wait()
    close(errors)

    // Check for any errors
    for err := range errors {
        t.Error(err)
    }
}
```

### 6. Test Database Setup
```go
// internal/testutil/db.go
package testutil

import (
    "database/sql"
    "fmt"
    "os"
)

func setupTestDB(t *testing.T) *sql.DB {
    db, err := sql.Open("postgres", "postgres://test:test@localhost/redflag_test?sslmode=disable")
    if err != nil {
        t.Fatalf("Failed to connect to test database: %v", err)
    }

    // Run migrations
    if err := runMigrations(db); err != nil {
        t.Fatalf("Failed to run migrations: %v", err)
    }

    return db
}

func resetTestDB(db *sql.DB) error {
    tables := []string{
        "agent_commands", "update_logs", "registration_token_usage",
        "registration_tokens", "refresh_tokens", "agents",
    }

    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback()

    for _, table := range tables {
        _, err := tx.Exec(fmt.Sprintf("DELETE FROM %s", table))
        if err != nil {
            return err
        }
    }

    return tx.Commit()
}
```

## Definition of Done

- [ ] Unit test coverage >80% for all critical components
- [ ] Integration test coverage for all major workflows
- [ ] Performance tests for scalability validation
- [ ] Security tests for authentication and cryptographic features
- [ ] CI/CD pipeline with automated testing
- [ ] Test database setup and migration testing
- [ ] Mock implementations for external dependencies
- [ ] Test documentation and examples

## Implementation Plan

### Phase 1: Foundation (Week 1)
- Set up testing framework and utilities
- Create test database setup
- Implement mock objects for external dependencies
- Add basic unit tests for core components

### Phase 2: Agent Testing (Week 2)
- Scanner implementation tests
- Agent lifecycle tests
- Error handling and resilience tests
- Cross-platform compatibility tests

### Phase 3: Server Testing (Week 3)
- API endpoint tests
- Database operation tests
- Authentication and security tests
- Rate limiting and middleware tests

### Phase 4: Integration & Performance (Week 4)
- End-to-end integration tests
- Multi-agent scenarios
- Performance and load tests
- Security validation tests

## Testing Strategy

### Unit Tests
- Focus on individual component behavior
- Mock external dependencies
- Fast execution (<1 second per test)
- Cover edge cases and error conditions

### Integration Tests
- Test component interactions
- Use real database and filesystem
- Slower execution but comprehensive coverage
- Validate complete workflows

### Performance Tests
- Measure response times and throughput
- Test under realistic load conditions
- Identify performance bottlenecks
- Validate scalability claims

### Security Tests
- Validate authentication mechanisms
- Test cryptographic operations
- Verify input validation
- Check for common vulnerabilities

## Prerequisites

- Test database instance (PostgreSQL)
- CI/CD pipeline infrastructure
- Mock implementations for external services
- Performance testing environment
- Security testing tools and knowledge

## Effort Estimate

**Complexity:** High
**Effort:** 4 weeks (1 developer)
- Week 1: Testing framework and foundation
- Week 2: Agent component tests
- Week 3: Server component tests
- Week 4: Integration and performance tests

## Success Metrics

- Code coverage >80% for critical components
- All major workflows covered by integration tests
- Performance tests validate 10,000+ agent support
- Security tests verify authentication and cryptography
- CI/CD pipeline runs tests automatically
- Regression detection for new features
- Documentation includes testing guidelines

## Monitoring

Track these metrics after implementation:
- Test execution time trends
- Code coverage percentage
- Test failure rates
- Performance benchmark results
- Security test findings
- Developer satisfaction with testing tools