# Duplicate Command Prevention System

**Priority**: P3 (Enhancement)
**Source Reference**: From quick-todos.md line 21
**Status**: Analyzed, Ready for Implementation

## Problem Statement

The current command scheduling system has no duplicate detection mechanism. Multiple instances of the same command can be queued for an agent (e.g., multiple `scan_apt` commands), causing unnecessary work, potential conflicts, and wasted system resources.

## Feature Description

Implement duplicate command prevention logic that checks for existing pending/sent commands of the same type before creating new ones, while preserving legitimate retry and interval scheduling behavior.

## Acceptance Criteria

1. System checks for recent duplicate commands before creating new ones
2. Uses `AgentID` + `CommandType` + `Status IN ('pending', 'sent')` as duplicate criteria
3. Time-based window to allow legitimate repeats (e.g., 5 minutes)
4. Skip duplicates only if recent (configurable timeframe)
5. Preserve legitimate scheduling and retry logic
6. Logging of duplicate prevention for monitoring
7. Manual commands can override duplicate prevention

## Technical Approach

### 1. Database Query Layer

**New Query Function** (`aggregator-server/internal/database/queries/`):
```sql
-- Check for recent duplicate commands
SELECT COUNT(*) FROM commands
WHERE agent_id = $1
  AND command_type = $2
  AND status IN ('pending', 'sent')
  AND created_at > NOW() - INTERVAL '5 minutes';
```

**Go Implementation**:
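```go
// CheckRecentDuplicate reports whether the agent already has a pending or sent
// command of the given type that was created within timeWindow.
func (q *Queries) CheckRecentDuplicate(agentID uuid.UUID, commandType string, timeWindow time.Duration) (bool, error) {
	var count int
	// database/sql passes a time.Duration as integer nanoseconds, so casting it
	// to INTERVAL would not give the intended window. Pass seconds instead.
	err := q.db.QueryRow(`
		SELECT COUNT(*) FROM commands
		WHERE agent_id = $1
		  AND command_type = $2
		  AND status IN ('pending', 'sent')
		  AND created_at > NOW() - make_interval(secs => $3)
	`, agentID, commandType, timeWindow.Seconds()).Scan(&count)
	if err != nil {
		return false, err
	}
	return count > 0, nil
}
```
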
### 2. Scheduler Integration

**Enhanced Command Creation** (`aggregator-server/internal/services/scheduler.go`):
```go
func (s *Scheduler) CreateCommandWithDuplicateCheck(agentID uuid.UUID, commandType string, payload interface{}, force bool) error {
	// TODO: read the window from configuration instead of hard-coding it.
	const window = 5 * time.Minute

	// Forced (e.g., manual) commands skip the duplicate check.
	if !force {
		isDuplicate, err := s.queries.CheckRecentDuplicate(agentID, commandType, window)
		if err != nil {
			return fmt.Errorf("failed to check for duplicates: %w", err)
		}
		if isDuplicate {
			log.Printf("Skipping duplicate %s command for agent %s (created within %s)", commandType, agentID, window)
			return nil
		}
	}

	// No recent duplicate found, or the check was bypassed: create the command normally.
	return s.queries.CreateCommand(agentID, commandType, payload)
}
```

### 3. Configuration

**Duplicate Prevention Settings**:
- Time window: 5 minutes by default (configurable via an environment variable)
- Command types to check: `scan_apt`, `scan_dnf`, `scan_updates`, etc.
- Manual command override: Force flag to bypass the duplicate check
- Logging level: whether duplicate skips are logged at Debug or Info

### 4. Monitoring and Logging

**Duplicate Prevention Metrics**:
- Counter for duplicates prevented per command type
- Logging of duplicate prevention with agent and command details
- Dashboard metrics showing duplicate prevention effectiveness

## Definition of Done

- ✅ Database query for duplicate detection implemented
- ✅ Scheduler integrates duplicate checking before command creation
- ✅ Configurable time window for duplicate detection
- ✅ Manual commands can bypass duplicate prevention
- ✅ Proper logging and monitoring of duplicate prevention
- ✅ Unit tests for various duplicate scenarios
- ✅ Integration testing with scheduler behavior
- ✅ Performance impact assessment (minimal overhead)

## Test Plan

1. **Unit Tests**
   - Test duplicate detection with various time windows
   - Test command type filtering
   - Test agent-specific duplicate checking
   - Test force override functionality

2. **Integration Tests**
   - Test scheduler behavior with duplicate prevention
   - Test that legitimate retry scenarios still work
   - Test manual command override
   - Test performance impact under load

3. **Scenario Tests**
   - Multiple rapid `scan_apt` commands for the same agent (only the first should be created)
   - Different command types for the same agent (should not be treated as duplicates)
   - Same command type for different agents (should not be treated as duplicates)
   - Commands older than the time window (should create a new command)

## Files to Modify

- `aggregator-server/internal/database/queries/commands.go` - Add duplicate check query
- `aggregator-server/internal/services/scheduler.go` - Integrate duplicate checking
- `aggregator-server/cmd/server/main.go` - Configuration for time window
- `aggregator-server/internal/services/metrics.go` - Add duplicate prevention metrics

## Duplicate Detection Logic

### Criteria for Duplicate
1. **Same Agent ID**: Commands for different agents are not duplicates
2. **Same Command Type**: `scan_apt` vs `scan_dnf` are different commands
3. **Recent Creation**: Within configured time window (default 5 minutes)
4. **Active Status**: Only 'pending' or 'sent' commands count as duplicates

### Time Window Considerations
- **5 minutes**: Default window; prevents rapid-fire duplicate scheduling
- **Configurable**: Can be adjusted per deployment needs
- **Per Command Type**: Different command types may use different windows

### Override Mechanisms
1. **Manual Commands**: Admin-initiated commands can force execution
2. **Critical Commands**: Security or emergency updates bypass duplicate prevention
3. **Different Payloads**: Commands with different parameters may not be duplicates

## Estimated Effort

- **Development**: 6-8 hours
- **Testing**: 4-6 hours
- **Review**: 2-3 hours

## Dependencies

- Existing command queue system
- Scheduler service architecture
- Database query layer

## Risk Assessment

**Low Risk** - An enhancement that does not change existing functionality and only adds prevention logic. The force override provides a safety valve for edge cases, and the configurable time window allows tuning based on operational needs.

## Performance Impact

- **Database Overhead**: One additional query per command creation (minimal)
- **Memory Impact**: Negligible
- **Network Impact**: None
- **CPU Impact**: Minimal (simple query, assuming the filtered columns are indexed)

## Monitoring Metrics

- Duplicates prevented per hour/day
- Command creation success rate
- Average time between duplicate attempts
- Most frequent duplicate command types
- Agent-specific duplicate patterns