# Duplicate Command Prevention System
**Priority**: P3 (Enhancement)
**Source Reference**: From quick-todos.md line 21
**Status**: Analyzed, Ready for Implementation
## Problem Statement
The current command scheduling system has no duplicate detection mechanism. Multiple instances of the same command can be queued for an agent (e.g., multiple `scan_apt` commands), causing unnecessary work, potential conflicts, and wasted system resources.
## Feature Description
Implement duplicate command prevention logic that checks for existing pending/sent commands of the same type before creating new ones, while preserving legitimate retry and interval scheduling behavior.
## Acceptance Criteria
1. System checks for recent duplicate commands before creating new ones
2. Uses `AgentID` + `CommandType` + `Status IN ('pending', 'sent')` as duplicate criteria
3. Time-based window to allow legitimate repeats (e.g., 5 minutes)
4. Skip duplicates only if recent (configurable timeframe)
5. Preserve legitimate scheduling and retry logic
6. Logging of duplicate prevention for monitoring
7. Manual commands can override duplicate prevention
## Technical Approach
### 1. Database Query Layer
**New Query Function** (`aggregator-server/internal/database/queries/`):
```sql
-- Check for recent duplicate commands
SELECT COUNT(*) FROM commands
WHERE agent_id = $1
AND command_type = $2
AND status IN ('pending', 'sent')
AND created_at > NOW() - INTERVAL '5 minutes';
```
**Go Implementation**:
```go
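// CheckRecentDuplicate reports whether the agent already has a pending or sent
// command of the given type created within timeWindow.
func (q *Queries) CheckRecentDuplicate(agentID uuid.UUID, commandType string, timeWindow time.Duration) (bool, error) {
	var count int
	// Pass the window as seconds: a time.Duration is an int64 nanosecond
	// count and does not cast to a PostgreSQL INTERVAL as intended.
	err := q.db.QueryRow(`
		SELECT COUNT(*) FROM commands
		WHERE agent_id = $1
		  AND command_type = $2
		  AND status IN ('pending', 'sent')
		  AND created_at > NOW() - make_interval(secs => $3)
	`, agentID, commandType, timeWindow.Seconds()).Scan(&count)
	if err != nil {
		return false, err
	}
	return count > 0, nil
}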
```
### 2. Scheduler Integration
**Enhanced Command Creation** (`aggregator-server/internal/services/scheduler.go`):
```go
// CreateCommandWithDuplicateCheck creates a command unless a recent active
// command of the same type already exists for the agent. Setting force
// bypasses the check.
func (s *Scheduler) CreateCommandWithDuplicateCheck(agentID uuid.UUID, commandType string, payload interface{}, force bool) error {
	// Default window; intended to become configurable (see Configuration).
	const window = 5 * time.Minute

	// Skip duplicate check for forced commands
	if !force {
		isDuplicate, err := s.queries.CheckRecentDuplicate(agentID, commandType, window)
		if err != nil {
			return fmt.Errorf("failed to check for duplicates: %w", err)
		}
		if isDuplicate {
			log.Printf("Skipping duplicate %s command for agent %s (created within %s)", commandType, agentID, window)
			return nil
		}
	}

	// Create command normally
	return s.queries.CreateCommand(agentID, commandType, payload)
}
```
### 3. Configuration
**Duplicate Prevention Settings**:
- Time window: 5 minutes (configurable via environment)
- Command types to check: `scan_apt`, `scan_dnf`, `scan_updates`, etc.
- Manual command override: Force flag to bypass duplicate check
- Logging level: Debug vs Info for duplicate skips
### 4. Monitoring and Logging
**Duplicate Prevention Metrics**:
- Counter for duplicates prevented per command type
- Logging of duplicate prevention with agent and command details
- Dashboard metrics showing duplicate prevention effectiveness
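
The per-command-type counter could be as simple as a mutex-guarded map. This is a minimal in-memory sketch; `duplicateCounter` and its methods are hypothetical names, and the real metrics service may expose a different interface.

```go
package main

import "sync"

// duplicateCounter tracks how many commands were skipped as duplicates,
// keyed by command type. It is an illustrative in-memory stand-in, not the
// project's metrics service.
type duplicateCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newDuplicateCounter() *duplicateCounter {
	return &duplicateCounter{counts: make(map[string]int)}
}

// Inc records one prevented duplicate for the given command type.
func (c *duplicateCounter) Inc(commandType string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[commandType]++
}

// Snapshot returns a copy of the counts so callers can export them without
// holding the lock or mutating internal state.
func (c *duplicateCounter) Snapshot() map[string]int {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int, len(c.counts))
	for k, v := range c.counts {
		out[k] = v
	}
	return out
}
```

The scheduler would call `Inc(commandType)` at the point where it logs a skipped duplicate, and a dashboard exporter would read `Snapshot()` periodically.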
## Definition of Done
- ✅ Database query for duplicate detection implemented
- ✅ Scheduler integrates duplicate checking before command creation
- ✅ Configurable time window for duplicate detection
- ✅ Manual commands can bypass duplicate prevention
- ✅ Proper logging and monitoring of duplicate prevention
- ✅ Unit tests for various duplicate scenarios
- ✅ Integration testing with scheduler behavior
- ✅ Performance impact assessment (minimal overhead)
## Test Plan
1. **Unit Tests**
   - Test duplicate detection with various time windows
   - Test command type filtering
   - Test agent-specific duplicate checking
   - Test force override functionality
2. **Integration Tests**
   - Test scheduler behavior with duplicate prevention
   - Test that legitimate retry scenarios still work
   - Test manual command override
   - Test performance impact under load
3. **Scenario Tests**
   - Multiple rapid `scan_apt` commands for the same agent (only the first should be created)
   - Different command types for the same agent (should not be treated as duplicates)
   - Same command type for different agents (should not be treated as duplicates)
   - Commands older than the time window (should create a new command)
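
The scheduler-level scenarios can be exercised without a database by putting the two query calls behind an interface and substituting an in-memory fake. The sketch below is an assumption about how such a test double might look: `commandStore`, `fakeStore`, and `createWithDuplicateCheck` are hypothetical names, agent IDs are plain strings to avoid a UUID dependency, and the fake does not model status or age.

```go
package main

import "time"

// commandStore is a hypothetical interface covering the two query calls the
// scheduler needs.
type commandStore interface {
	CheckRecentDuplicate(agentID, commandType string, window time.Duration) (bool, error)
	CreateCommand(agentID, commandType string) error
}

// fakeStore is an in-memory test double. It treats any earlier command with
// the same agent and type as a recent, active duplicate.
type fakeStore struct {
	created []string // "agentID/commandType" keys, in creation order
}

func (f *fakeStore) CheckRecentDuplicate(agentID, commandType string, _ time.Duration) (bool, error) {
	key := agentID + "/" + commandType
	for _, k := range f.created {
		if k == key {
			return true, nil
		}
	}
	return false, nil
}

func (f *fakeStore) CreateCommand(agentID, commandType string) error {
	f.created = append(f.created, agentID+"/"+commandType)
	return nil
}

// createWithDuplicateCheck mirrors the scheduler flow: skip when a duplicate
// exists unless force is set. It reports whether a command was created.
func createWithDuplicateCheck(s commandStore, agentID, commandType string, force bool) (bool, error) {
	if !force {
		dup, err := s.CheckRecentDuplicate(agentID, commandType, 5*time.Minute)
		if err != nil {
			return false, err
		}
		if dup {
			return false, nil
		}
	}
	if err := s.CreateCommand(agentID, commandType); err != nil {
		return false, err
	}
	return true, nil
}
```

With this in place, a test can issue two unforced `scan_apt` commands for one agent and assert that only the first is created, then repeat the call with `force` set and assert that it goes through.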
## Files to Modify
- `aggregator-server/internal/database/queries/commands.go` - Add duplicate check query
- `aggregator-server/internal/services/scheduler.go` - Integrate duplicate checking
- `aggregator-server/cmd/server/main.go` - Configuration for time window
- `aggregator-server/internal/services/metrics.go` - Add duplicate prevention metrics
## Duplicate Detection Logic
### Criteria for Duplicate
1. **Same Agent ID**: Commands for different agents are not duplicates
2. **Same Command Type**: `scan_apt` vs `scan_dnf` are different commands
3. **Recent Creation**: Within configured time window (default 5 minutes)
4. **Active Status**: Only 'pending' or 'sent' commands count as duplicates
### Time Window Considerations
- **5 minutes**: Prevents rapid-fire duplicate scheduling
- **Configurable**: Can be adjusted per deployment needs
- **Per Command Type**: Different windows for different command types
### Override Mechanisms
1. **Manual Commands**: Admin-initiated commands can force execution
2. **Critical Commands**: Security or emergency updates bypass duplicate prevention
3. **Different Payloads**: Commands with different parameters may not be duplicates
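
If payloads are to be taken into account, one option is to compare a fingerprint of the payload alongside agent and command type. The sketch below hashes the JSON encoding with SHA-256; `payloadFingerprint` is a hypothetical helper, and storing or comparing such a fingerprint is an assumption beyond the criteria listed above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// payloadFingerprint returns a hex SHA-256 digest of the JSON-encoded
// payload. encoding/json sorts map keys, so maps with equal contents produce
// equal fingerprints regardless of insertion order.
func payloadFingerprint(payload interface{}) (string, error) {
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}
```

Two commands of the same type would then count as duplicates only when their fingerprints also match.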
## Estimated Effort
- **Development**: 6-8 hours
- **Testing**: 4-6 hours
- **Review**: 2-3 hours
## Dependencies
- Existing command queue system
- Scheduler service architecture
- Database query layer
## Risk Assessment
**Low Risk** - This enhancement leaves existing functionality unchanged and only adds prevention logic. The force override provides a safety valve for edge cases, and the configurable time window allows tuning based on operational needs.
## Performance Impact
- **Database Overhead**: One additional query per command creation (minimal)
- **Memory Impact**: Negligible
- **Network Impact**: None
- **CPU Impact**: Minimal, provided the columns used in the filter (`agent_id`, `command_type`, `status`, `created_at`) are indexed
## Monitoring Metrics
- Duplicates prevented per hour/day
- Command creation success rate
- Average time between duplicate attempts
- Most frequent duplicate command types
- Agent-specific duplicate patterns