Duplicate Command Prevention System

Priority: P3 (Enhancement)
Source Reference: From quick-todos.md, line 21
Status: Analyzed, ready for implementation

Problem Statement

The current command scheduling system has no duplicate detection mechanism. Multiple instances of the same command can be queued for an agent (e.g., multiple scan_apt commands), causing unnecessary work, potential conflicts, and wasted system resources.

Feature Description

Implement duplicate command prevention logic that checks for existing pending/sent commands of the same type before creating new ones, while preserving legitimate retry and interval scheduling behavior.

Acceptance Criteria

  1. The system checks for recent duplicate commands before creating a new one
  2. A command counts as a duplicate when AgentID and CommandType match and Status IN ('pending', 'sent')
  3. A time window (e.g., 5 minutes) allows legitimate repeats
  4. Duplicates are skipped only if they are recent; the window is configurable
  5. Legitimate scheduling and retry logic is preserved
  6. Duplicate prevention events are logged for monitoring
  7. Manual commands can override duplicate prevention

Technical Approach

1. Database Query Layer

New Query Function (aggregator-server/internal/database/queries/):

-- Check for recent duplicate commands
SELECT COUNT(*) FROM commands
WHERE agent_id = $1
  AND command_type = $2
  AND status IN ('pending', 'sent')
  AND created_at > NOW() - INTERVAL '5 minutes';

Go Implementation:

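func (q *Queries) CheckRecentDuplicate(agentID uuid.UUID, commandType string, timeWindow time.Duration) (bool, error) {
    var exists bool
    // Depending on the driver, a raw time.Duration may be sent as a bare int64
    // nanosecond count, which PostgreSQL would not read as the intended interval.
    // Passing the window in seconds and building the interval in SQL avoids that.
    err := q.db.QueryRow(`
        SELECT EXISTS (
            SELECT 1 FROM commands
            WHERE agent_id = $1
              AND command_type = $2
              AND status IN ('pending', 'sent')
              AND created_at > NOW() - $3::float8 * INTERVAL '1 second'
        )
    `, agentID, commandType, timeWindow.Seconds()).Scan(&exists)
    if err != nil {
        return false, err
    }
    return exists, nil
}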

2. Scheduler Integration

Enhanced Command Creation (aggregator-server/internal/services/scheduler.go):

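// duplicateWindow is the default look-back period (intended to be configurable).
// Using one constant keeps the query and the log message in sync.
const duplicateWindow = 5 * time.Minute

func (s *Scheduler) CreateCommandWithDuplicateCheck(agentID uuid.UUID, commandType string, payload interface{}, force bool) error {
    // Forced (manual) commands skip the duplicate check entirely
    if !force {
        isDuplicate, err := s.queries.CheckRecentDuplicate(agentID, commandType, duplicateWindow)
        if err != nil {
            return fmt.Errorf("failed to check for duplicates: %w", err)
        }
        if isDuplicate {
            // Not an error: an active command of this type already covers the request
            log.Printf("Skipping duplicate %s command for agent %s (active command created within %v)", commandType, agentID, duplicateWindow)
            return nil
        }
    }

    // No active duplicate: create the command as before
    return s.queries.CreateCommand(agentID, commandType, payload)
}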

3. Configuration

Duplicate Prevention Settings:

  • Time window: 5 minutes (configurable via environment)
  • Command types to check: scan_apt, scan_dnf, scan_updates, etc.
  • Manual command override: Force flag to bypass duplicate check
  • Logging level: Debug vs Info for duplicate skips

4. Monitoring and Logging

Duplicate Prevention Metrics:

  • Counter for duplicates prevented per command type
  • Logging of duplicate prevention with agent and command details
  • Dashboard metrics showing duplicate prevention effectiveness

Definition of Done

  • Database query for duplicate detection implemented
  • Scheduler integrates duplicate checking before command creation
  • Configurable time window for duplicate detection
  • Manual commands can bypass duplicate prevention
  • Proper logging and monitoring of duplicate prevention
  • Unit tests for various duplicate scenarios
  • Integration testing with scheduler behavior
  • Performance impact assessment (minimal overhead)

Test Plan

  1. Unit Tests

    • Test duplicate detection with various time windows
    • Test command type filtering
    • Test agent-specific duplicate checking
    • Test force override functionality
  2. Integration Tests

    • Test scheduler behavior with duplicate prevention
    • Test legitimate retry scenarios still work
    • Test manual command override
    • Test performance impact under load
  3. Scenario Tests

    • Multiple rapid scan_apt commands for same agent
    • Different command types for the same agent (should not be treated as duplicates)
    • Same command type for different agents (should not be treated as duplicates)
    • Commands older than time window (should create new command)

Files to Modify

  • aggregator-server/internal/database/queries/commands.go - Add duplicate check query
  • aggregator-server/internal/services/scheduler.go - Integrate duplicate checking
  • aggregator-server/cmd/server/main.go - Configuration for time window
  • aggregator-server/internal/services/metrics.go - Add duplicate prevention metrics

Duplicate Detection Logic

Criteria for Duplicate

  1. Same Agent ID: Commands for different agents are not duplicates
  2. Same Command Type: scan_apt vs scan_dnf are different commands
  3. Recent Creation: Within configured time window (default 5 minutes)
  4. Active Status: Only 'pending' or 'sent' commands count as duplicates

Time Window Considerations

  • 5 minutes: Prevents rapid-fire duplicate scheduling
  • Configurable: Can be adjusted per deployment needs
  • Per Command Type: Different windows for different command types

Override Mechanisms

  1. Manual Commands: Admin-initiated commands can force execution
  2. Critical Commands: Security or emergency updates bypass duplicate prevention
  3. Different Payloads: Commands with different parameters may not be duplicates
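These override mechanisms can be sketched as two small helpers: a bypass check for forced and critical commands, and a payload key so that commands with different parameters are not treated as duplicates. The critical-type set and both function names are hypothetical, and using the key in the SQL check would require storing it with each command (for example in a hypothetical payload_hash column).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// bypassDuplicateCheck covers manual (forced) commands and a hypothetical
// set of critical command types that are never deduplicated.
func bypassDuplicateCheck(force bool, commandType string, critical map[string]bool) bool {
	return force || critical[commandType]
}

// payloadKey hashes the JSON encoding of a payload so that commands with
// different parameters get different keys. encoding/json sorts map keys,
// so maps with equal contents always produce the same key.
func payloadKey(payload interface{}) (string, error) {
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}
```

Payloads built from structs are also deterministic, because encoding/json emits struct fields in declaration order.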

Estimated Effort

  • Development: 6-8 hours
  • Testing: 4-6 hours
  • Review: 2-3 hours

Dependencies

  • Existing command queue system
  • Scheduler service architecture
  • Database query layer

Risk Assessment

Low Risk: The enhancement adds a check before command creation and leaves the rest of the command flow unchanged; its only behavioral change is that recent duplicates are skipped. The force override provides a safety valve for edge cases, and the configurable time window allows tuning based on operational needs.

Performance Impact

  • Database Overhead: One additional query per command creation (minimal)
  • Memory Impact: Negligible
  • Network Impact: None
  • CPU Impact: Minimal, provided the queried columns (agent_id, command_type, status, created_at) are indexed

Monitoring Metrics

  • Duplicates prevented per hour/day
  • Command creation success rate
  • Average time between duplicate attempts
  • Most frequent duplicate command types
  • Agent-specific duplicate patterns