# P4-005: Testing Infrastructure Gaps
**Priority:** P4 (Technical Debt)
**Source Reference:** From analysis of codebase testing coverage and existing test files
**Date Identified:** 2025-11-12
## Problem Description
RedFlag has minimal testing infrastructure with only 5 test files covering basic functionality. Critical components like agent communication, authentication, scanner integration, and database operations lack comprehensive test coverage. This creates high risk for regressions and makes confident deployment difficult.
## Current Test Coverage Analysis
### Existing Tests (5 files)
1. `aggregator-agent/internal/circuitbreaker/circuitbreaker_test.go` - Basic circuit breaker
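2. `aggregator-agent/test_disk.go` - Disk detection testing (development); not a `_test.go` file, so `go test` does not run it
3. `test_disk_detection.go` - Disk detection integration test; also not a `_test.go` file, so it must be run manually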
4. `aggregator-server/internal/scheduler/queue_test.go` - Queue operations (21 tests passing)
5. `aggregator-server/internal/scheduler/scheduler_test.go` - Scheduler logic (21 tests passing)
### Critical Missing Test Areas
#### Agent Components (0% coverage)
- Agent registration and authentication
- Scanner implementations (APT, DNF, Docker, Windows, Winget)
- Command execution and acknowledgment
- File management and state persistence
- Error handling and resilience
- Cross-platform compatibility
#### Server Components (Minimal coverage)
- API endpoints and handlers
- Database operations and queries
- Authentication and authorization
- Rate limiting and security middleware
- Agent lifecycle management
- Update package distribution
#### Integration Testing (0% coverage)
- End-to-end agent-server communication
- Multi-agent scenarios
- Error recovery and failover
- Performance under load
- Security validation (Ed25519, nonces, machine binding)
#### Security Testing (0% coverage)
- Cryptographic operations validation
- Authentication bypass attempts
- Input validation and sanitization
- Rate limiting effectiveness
- Machine binding enforcement
## Impact
- **Regression Risk:** No safety net for code changes
- **Deployment Confidence:** Cannot verify system reliability
- **Quality Assurance:** Manual testing is time-consuming and error-prone
- **Security Validation:** No automated security testing
- **Performance Testing:** No way to detect performance regressions
- **Documentation Gaps:** No tests to serve as living documentation of expected behavior
## Proposed Solution
Implement comprehensive testing infrastructure across all components:
### 1. Unit Testing Framework
```go
// Test configuration and utilities
// aggregator/internal/testutil/testutil.go
package testutil

import (
	"database/sql"
	"net/http/httptest"

	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/suite"
)

// TestSuite bundles the fixtures shared by server-side tests.
// Config and UpdateReportItem are assumed to be defined elsewhere in this package.
type TestSuite struct {
	suite.Suite
	DB     *sql.DB
	Config *Config
	Server *httptest.Server
}

func (s *TestSuite) SetupSuite() {
	// Initialize test database
	s.DB = setupTestDB(s.T())
	// Initialize test configuration
	s.Config = &Config{
		DatabaseURL: "postgres://test:test@localhost/redflag_test",
		ServerPort:  0, // Random port for testing
	}
}

func (s *TestSuite) TearDownSuite() {
	if s.DB != nil {
		s.DB.Close()
	}
	cleanupTestDB()
}

func (s *TestSuite) SetupTest() {
	// Reset database state before each test; fail fast if the reset itself fails
	s.Require().NoError(resetTestDB(s.DB))
}

// Mock implementations
type MockScanner struct {
	mock.Mock
}

func (m *MockScanner) ScanForUpdates() ([]UpdateReportItem, error) {
	args := m.Called()
	// The checked assertion tolerates Return(nil, err) when mocking failures.
	items, _ := args.Get(0).([]UpdateReportItem)
	return items, args.Error(1)
}
```
### 2. Agent Component Tests
```go
// aggregator-agent/cmd/agent/main_test.go
// Imports needed: "net/http", "testing", and testify's assert and require packages.
func TestAgentRegistration(t *testing.T) {
	tests := []struct {
		name           string
		token          string
		expectedStatus int // server-side status; not asserted here because Register returns only an error
		expectedError  string
	}{
		{
			name:           "Valid registration",
			token:          "valid-token-123",
			expectedStatus: http.StatusCreated,
		},
		{
			name:           "Invalid token",
			token:          "invalid-token",
			expectedStatus: http.StatusUnauthorized,
			expectedError:  "invalid registration token",
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			server := setupTestServer(t)
			defer server.Close()
			agent := &Agent{
				ServerURL: server.URL,
				Token:     tt.token,
			}
			err := agent.Register()
			if tt.expectedError != "" {
				// require stops the subtest, so err.Error() never runs on a nil error
				require.Error(t, err)
				assert.Contains(t, err.Error(), tt.expectedError)
			} else {
				assert.NoError(t, err)
			}
		})
	}
}

// aggregator-agent/internal/scanner/dnf_test.go
func TestDNFScanner(t *testing.T) {
	t.Run("Successful scan", func(t *testing.T) {
		scanner := &DNFScanner{}
		// Mock successful dnf check-update output
		withMockCommand("dnf", "check-update", successfulDNFOutput, func() {
			updates, err := scanner.ScanForUpdates()
			require.NoError(t, err)
			require.NotEmpty(t, updates)
			// Verify update parsing
			nginx := findUpdate(updates, "nginx")
			require.NotNil(t, nginx) // avoid a nil dereference below
			assert.Equal(t, "1.20.1", nginx.CurrentVersion)
			assert.Equal(t, "1.21.0", nginx.AvailableVersion)
		})
	})
	t.Run("DNF not available", func(t *testing.T) {
		// Separate scanner so the override cannot leak into other subtests
		scanner := &DNFScanner{executable: "nonexistent-dnf"}
		_, err := scanner.ScanForUpdates()
		require.Error(t, err)
		assert.Contains(t, err.Error(), "dnf not found")
	})
}
```
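The `withMockCommand` helper used above is not defined in this document. One way such a seam could work is to have the scanner call an injected function instead of `exec.Command` directly. The sketch below is hypothetical: `CommandRunner`, `Update`, `scanDNF`, and `fakeRunner` are illustrative names rather than existing RedFlag code, and the parser assumes the three-column `name.arch version repo` layout that `dnf check-update` normally prints.

```go
package main

import (
	"errors"
	"strings"
)

// CommandRunner abstracts process execution so tests can supply canned output.
// Hypothetical seam; production code would wrap os/exec behind it.
type CommandRunner func(name string, args ...string) (string, error)

// Update is an illustrative stand-in for the project's update item type.
type Update struct {
	Name             string
	Arch             string
	AvailableVersion string
	Repo             string
}

// errDNFNotFound is a sample error a runner might return when dnf is missing.
var errDNFNotFound = errors.New("dnf not found")

// scanDNF runs `dnf check-update` through the runner and parses every
// three-column line. Blank lines and status lines are skipped.
func scanDNF(run CommandRunner) ([]Update, error) {
	out, err := run("dnf", "check-update")
	if err != nil {
		return nil, err
	}
	var updates []Update
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 3 {
			continue
		}
		// Split on the last dot: package names may contain dots themselves.
		dot := strings.LastIndex(fields[0], ".")
		if dot <= 0 {
			continue
		}
		updates = append(updates, Update{
			Name:             fields[0][:dot],
			Arch:             fields[0][dot+1:],
			AvailableVersion: fields[1],
			Repo:             fields[2],
		})
	}
	return updates, nil
}

// fakeRunner returns a CommandRunner that ignores its arguments and replies
// with fixed output, which is all a unit test needs.
func fakeRunner(output string, err error) CommandRunner {
	return func(string, ...string) (string, error) { return output, err }
}
```

A test would call `scanDNF(fakeRunner(sampleOutput, nil))` for the success path and `scanDNF(fakeRunner("", errDNFNotFound))` for the failure path. A real runner would also have to treat exit status 100 as success, since `dnf check-update` uses it to signal that updates are available.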
### 3. Server Component Tests
```go
// aggregator-server/internal/api/handlers/agents_test.go
func TestAgentsHandler_RegisterAgent(t *testing.T) {
	// Named s rather than suite to avoid shadowing the testify suite package
	s := &TestSuite{}
	s.SetT(t) // the suite methods need a *testing.T when called outside suite.Run
	s.SetupSuite()
	defer s.TearDownSuite()
	tests := []struct {
		name           string
		requestBody    string
		expectedStatus int
		setupToken     bool
	}{
		{
			name:           "Valid registration",
			requestBody:    `{"hostname":"test-host","os_type":"linux","agent_version":"0.1.23"}`,
			setupToken:     true,
			expectedStatus: http.StatusCreated,
		},
		{
			name:           "Invalid JSON",
			requestBody:    `{"hostname":}`,
			expectedStatus: http.StatusBadRequest,
		},
		{
			name:           "Missing token",
			requestBody:    `{"hostname":"test-host"}`,
			expectedStatus: http.StatusUnauthorized,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			s.SetupTest()
			req := httptest.NewRequest("POST", "/api/v1/agents/register",
				strings.NewReader(tt.requestBody))
			req.Header.Set("Content-Type", "application/json")
			if tt.setupToken {
				// createTestToken is assumed to return the token value as a string
				token := createTestToken(s.DB, 5)
				s.Config.JWTSecret = "test-secret"
				req.Header.Set("Authorization", "Bearer "+token)
			}
			w := httptest.NewRecorder()
			handler := NewAgentsHandler(s.DB, s.Config)
			handler.RegisterAgent(w, req)
			assert.Equal(t, tt.expectedStatus, w.Code)
		})
	}
}

// aggregator-server/internal/database/queries/agents_test.go
func TestAgentQueries(t *testing.T) {
	db := setupTestDB(t)
	defer db.Close()
	queries := NewAgentQueries(db)
	t.Run("Create and retrieve agent", func(t *testing.T) {
		agent := &models.Agent{
			ID:        uuid.New(),
			Hostname:  "test-host",
			OSType:    "linux",
			Version:   "0.1.23",
			CreatedAt: time.Now(),
		}
		// Create agent
		err := queries.CreateAgent(agent)
		require.NoError(t, err)
		// Retrieve agent
		retrieved, err := queries.GetAgent(agent.ID)
		require.NoError(t, err) // retrieved may be nil on error
		assert.Equal(t, agent.Hostname, retrieved.Hostname)
		assert.Equal(t, agent.OSType, retrieved.OSType)
	})
}
```
### 4. Integration Tests
```go
// integration/agent_server_test.go
func TestAgentServerIntegration(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping integration test in short mode")
	}
	// Setup test environment
	server := setupIntegrationServer(t)
	defer server.Cleanup()
	agent := setupIntegrationAgent(t, server.URL)
	defer agent.Cleanup()
	t.Run("Complete agent lifecycle", func(t *testing.T) {
		// Registration
		err := agent.Register()
		require.NoError(t, err)
		// First check-in (no commands)
		commands, err := agent.CheckIn()
		assert.NoError(t, err)
		assert.Empty(t, commands)
		// Send scan command
		scanCmd := &Command{
			Type: "scan_updates",
			ID:   uuid.New(),
		}
		err = server.SendCommand(agent.ID, scanCmd)
		assert.NoError(t, err)
		// Second check-in (should receive command)
		commands, err = agent.CheckIn()
		require.NoError(t, err)
		require.Len(t, commands, 1) // stop here rather than index an empty slice
		assert.Equal(t, "scan_updates", commands[0].Type)
		// Execute command and report results
		result := agent.ExecuteCommand(commands[0])
		err = agent.ReportResult(result)
		assert.NoError(t, err)
		// Verify command completion
		cmdStatus, err := server.GetCommandStatus(scanCmd.ID)
		require.NoError(t, err)
		assert.Equal(t, "completed", cmdStatus.Status)
	})
}

// integration/security_test.go
func TestSecurityFeatures(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping integration test in short mode")
	}
	server := setupIntegrationServer(t)
	defer server.Cleanup()
	t.Run("Machine binding enforcement", func(t *testing.T) {
		agent1 := setupIntegrationAgent(t, server.URL)
		agent2 := setupIntegrationAgentWithMachineID(t, server.URL, agent1.MachineID)
		// Register first agent
		err := agent1.Register()
		require.NoError(t, err)
		// Attempt to register second agent with same machine ID
		err = agent2.Register()
		require.Error(t, err)
		assert.Contains(t, err.Error(), "machine ID already registered")
	})
	t.Run("Ed25519 signature validation", func(t *testing.T) {
		// Each subtest needs its own agent; agent1 above is out of scope here
		agent := setupIntegrationAgent(t, server.URL)
		defer agent.Cleanup()
		// Test with valid signature
		validPackage := createSignedPackage(t, server.PrivateKey)
		err := agent.VerifyPackageSignature(validPackage)
		assert.NoError(t, err)
		// Test with a signature made by an unrelated key.
		// Assumes server.PrivateKey is an ed25519.PrivateKey (crypto/ed25519).
		_, wrongKey, err := ed25519.GenerateKey(nil) // nil selects crypto/rand.Reader
		require.NoError(t, err)
		invalidPackage := createSignedPackage(t, wrongKey)
		err = agent.VerifyPackageSignature(invalidPackage)
		require.Error(t, err)
		assert.Contains(t, err.Error(), "invalid signature")
	})
}
```
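The signature test above relies on undefined helpers (`createSignedPackage`, `VerifyPackageSignature`). As a rough illustration of what those checks reduce to, here is a minimal sketch using only the standard library's `crypto/ed25519`. The function names and the sample payload are assumptions made for this example and say nothing about how RedFlag packages are actually signed.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
)

// signPackage returns a detached Ed25519 signature over payload.
func signPackage(priv ed25519.PrivateKey, payload []byte) []byte {
	return ed25519.Sign(priv, payload)
}

// verifyPackage reports whether sig is a valid signature of payload under pub.
func verifyPackage(pub ed25519.PublicKey, payload, sig []byte) bool {
	return ed25519.Verify(pub, payload, sig)
}

// signatureCases runs the three checks a signature test would usually cover:
// a valid signature, a signature from a different key, and a tampered payload.
func signatureCases() (valid, wrongKey, tampered bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, false, err
	}
	_, otherPriv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, false, err
	}
	payload := []byte("update-package-bytes") // placeholder content
	sig := signPackage(priv, payload)

	valid = verifyPackage(pub, payload, sig)
	wrongKey = verifyPackage(pub, payload, signPackage(otherPriv, payload))
	tampered = verifyPackage(pub, append([]byte("x"), payload...), sig)
	return valid, wrongKey, tampered, nil
}
```

In a real test each of the three results would be its own assertion, so a failure names the exact case that regressed.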
### 5. Performance Tests
```go
// performance/load_test.go
func BenchmarkAgentCheckIn(b *testing.B) {
	server := setupBenchmarkServer(b)
	defer server.Cleanup()
	agent := setupBenchmarkAgent(b, server.URL)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_, err := agent.CheckIn()
			if err != nil {
				// b.Fatal may only be called from the benchmark's own goroutine,
				// so report with b.Error and stop this worker instead.
				b.Error(err)
				return
			}
		}
	})
}

func TestConcurrentAgents(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping load test in short mode")
	}
	server := setupIntegrationServer(t)
	defer server.Cleanup()
	numAgents := 100
	var wg sync.WaitGroup
	// Buffered so that no goroutine blocks: each one sends at most one error.
	errCh := make(chan error, numAgents)
	for i := 0; i < numAgents; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Note: this helper must not call t.Fatal, since it runs off the test goroutine
			agent := setupIntegrationAgentWithID(t, server.URL, fmt.Sprintf("agent-%d", id))
			err := agent.Register()
			if err != nil {
				errCh <- fmt.Errorf("agent %d registration failed: %w", id, err)
				return
			}
			// Perform several check-ins
			for j := 0; j < 5; j++ {
				_, err := agent.CheckIn()
				if err != nil {
					errCh <- fmt.Errorf("agent %d check-in %d failed: %w", id, j, err)
					return
				}
			}
		}(i)
	}
	wg.Wait()
	close(errCh)
	// Check for any errors
	for err := range errCh {
		t.Error(err)
	}
}
```
### 6. Test Database Setup
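```go
// internal/testutil/db.go
package testutil

import (
	"database/sql"
	"fmt"
	"testing"

	_ "github.com/lib/pq" // registers the "postgres" driver; assumes lib/pq is the driver in use
)

func setupTestDB(t *testing.T) *sql.DB {
	t.Helper()
	db, err := sql.Open("postgres", "postgres://test:test@localhost/redflag_test?sslmode=disable")
	if err != nil {
		t.Fatalf("Failed to open test database: %v", err)
	}
	// sql.Open only validates its arguments; Ping forces a real connection
	if err := db.Ping(); err != nil {
		t.Fatalf("Failed to connect to test database: %v", err)
	}
	// Run migrations
	if err := runMigrations(db); err != nil {
		t.Fatalf("Failed to run migrations: %v", err)
	}
	return db
}

// resetTestDB empties the tables in dependency order (children before parents).
func resetTestDB(db *sql.DB) error {
	tables := []string{
		"agent_commands", "update_logs", "registration_token_usage",
		"registration_tokens", "refresh_tokens", "agents",
	}
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded
	for _, table := range tables {
		// Table names come from the fixed list above, never from user input
		_, err := tx.Exec(fmt.Sprintf("DELETE FROM %s", table))
		if err != nil {
			return err
		}
	}
	return tx.Commit()
}
```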
## Definition of Done
- [ ] Unit test coverage >80% for all critical components
- [ ] Integration test coverage for all major workflows
- [ ] Performance tests for scalability validation
- [ ] Security tests for authentication and cryptographic features
- [ ] CI/CD pipeline with automated testing
- [ ] Test database setup and migration testing
- [ ] Mock implementations for external dependencies
- [ ] Test documentation and examples
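The coverage item in this checklist can be enforced mechanically rather than by inspection. The sketch below assumes the pipeline has already produced the text output of `go tool cover -func=coverage.out`, whose final line has the form `total: (statements) 81.2%`. The function names are illustrative, and the 80% figure is the target stated above.

```go
package main

import (
	"bufio"
	"errors"
	"strconv"
	"strings"
)

// totalCoverage extracts the percentage from the "total:" line of a
// `go tool cover -func` report.
func totalCoverage(report string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 3 && fields[0] == "total:" {
			return strconv.ParseFloat(strings.TrimSuffix(fields[2], "%"), 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("no total line found in coverage report")
}

// exceedsThreshold reports whether total coverage is strictly above min percent,
// matching the ">80%" wording of the checklist.
func exceedsThreshold(report string, min float64) (bool, error) {
	total, err := totalCoverage(report)
	if err != nil {
		return false, err
	}
	return total > min, nil
}
```

A CI step could pipe the report into a small program built on `exceedsThreshold` and exit non-zero when it returns false.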
## Implementation Plan
### Phase 1: Foundation (Week 1)
- Set up testing framework and utilities
- Create test database setup
- Implement mock objects for external dependencies
- Add basic unit tests for core components
### Phase 2: Agent Testing (Week 2)
- Scanner implementation tests
- Agent lifecycle tests
- Error handling and resilience tests
- Cross-platform compatibility tests
### Phase 3: Server Testing (Week 3)
- API endpoint tests
- Database operation tests
- Authentication and security tests
- Rate limiting and middleware tests
### Phase 4: Integration & Performance (Week 4)
- End-to-end integration tests
- Multi-agent scenarios
- Performance and load tests
- Security validation tests
## Testing Strategy
### Unit Tests
- Focus on individual component behavior
- Mock external dependencies
- Fast execution (<1 second per test)
- Cover edge cases and error conditions
### Integration Tests
- Test component interactions
- Use real database and filesystem
- Slower execution but comprehensive coverage
- Validate complete workflows
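Because integration tests need a real database and run slowly, it helps to make them opt-in so that a plain `go test -short ./...` stays fast. A minimal sketch of that gate follows. The environment variable name `REDFLAG_TEST_DATABASE_URL` and both function names are hypothetical, chosen only for this example.

```go
package main

import "os"

// dsnEnvVar is a hypothetical variable name; use whatever the CI pipeline exports.
const dsnEnvVar = "REDFLAG_TEST_DATABASE_URL"

// integrationDSN returns the database DSN and whether integration tests
// should run. lookup is injected so the decision itself can be unit tested.
func integrationDSN(short bool, lookup func(string) (string, bool)) (string, bool) {
	if short {
		return "", false // -short always disables integration tests
	}
	dsn, ok := lookup(dsnEnvVar)
	if !ok || dsn == "" {
		return "", false // no database configured
	}
	return dsn, true
}

// currentDSN applies the same rule to the real process environment.
func currentDSN(short bool) (string, bool) {
	return integrationDSN(short, os.LookupEnv)
}
```

An integration test would start with `dsn, ok := currentDSN(testing.Short())` and call `t.Skip` when `ok` is false, so a developer without PostgreSQL sees skipped tests instead of connection failures.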
### Performance Tests
- Measure response times and throughput
- Test under realistic load conditions
- Identify performance bottlenecks
- Validate scalability claims
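Go benchmarks report averages, but response-time targets are usually stated as percentiles. A small helper along the following lines could collect per-call latencies and compute them. This is a sketch: `measure`, `percentile`, and `millis` are illustrative names, and the nearest-rank method is one of several common percentile definitions.

```go
package main

import (
	"math"
	"sort"
	"time"
)

// measure calls fn n times and records how long each call took.
// It stops at the first error and returns the samples gathered so far.
func measure(n int, fn func() error) ([]time.Duration, error) {
	samples := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		if err := fn(); err != nil {
			return samples, err
		}
		samples = append(samples, time.Since(start))
	}
	return samples, nil
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
// It returns 0 for an empty slice and does not modify its input.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// millis builds sample data from millisecond values, for examples and tests.
func millis(vals ...int) []time.Duration {
	out := make([]time.Duration, len(vals))
	for i, v := range vals {
		out[i] = time.Duration(v) * time.Millisecond
	}
	return out
}
```

A load test could wrap a check-in call in `measure` and fail when `percentile(samples, 95)` exceeds an agreed budget. The budget itself is a project decision and is not specified here.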
### Security Tests
- Validate authentication mechanisms
- Test cryptographic operations
- Verify input validation
- Check for common vulnerabilities
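Rate limiting is one security control that can be checked entirely in-process with `net/http/httptest`. The sketch below uses a deliberately simple counting middleware as a stand-in. It is not RedFlag's actual rate limiter, and all names in it are hypothetical. The point is the shape of the test: send more requests than the limit allows and assert where the responses switch to 429.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync"
)

// limitRequests is a stand-in for real rate-limiting middleware: it lets
// max requests through in total, then answers 429 Too Many Requests.
func limitRequests(max int, next http.Handler) http.Handler {
	var mu sync.Mutex
	seen := 0
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		seen++
		over := seen > max
		mu.Unlock()
		if over {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler always answers 200; it plays the role of the protected endpoint.
func okHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
}

// statusCodes sends n POST requests to h and returns the response codes in order.
func statusCodes(h http.Handler, n int) []int {
	codes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodPost, "/api/v1/agents/register", nil)
		h.ServeHTTP(rec, req)
		codes = append(codes, rec.Code)
	}
	return codes
}
```

With a limit of 3 and 5 requests, `statusCodes` returns three 200s followed by two 429s. A test against the real middleware would make the same assertion and then check that requests succeed again once the limiter's window has passed.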
## Prerequisites
- Test database instance (PostgreSQL)
- CI/CD pipeline infrastructure
- Mock implementations for external services
- Performance testing environment
- Security testing tools and knowledge
## Effort Estimate
**Complexity:** High
**Effort:** 4 weeks (1 developer)
- Week 1: Testing framework and foundation
- Week 2: Agent component tests
- Week 3: Server component tests
- Week 4: Integration and performance tests
## Success Metrics
- Code coverage >80% for critical components
- All major workflows covered by integration tests
- Performance tests validate 10,000+ agent support
- Security tests verify authentication and cryptography
- CI/CD pipeline runs tests automatically
- Regression detection for new features
- Documentation includes testing guidelines
## Monitoring
Track these metrics after implementation:
- Test execution time trends
- Code coverage percentage
- Test failure rates
- Performance benchmark results
- Security test findings
- Developer satisfaction with testing tools