# Migration Error Reporting System
- **Priority**: P2 (New Feature)
- **Source Reference**: From DEVELOPMENT_TODOS.md line 348
- **Status**: Ready for Implementation
## Problem Statement
When agent migration fails, whether during detection or execution, there is currently no mechanism to report the failure to the server so that it shows up in the History table. Failed migrations are only logged locally on the agent, which makes it impossible to track migration issues across the agent fleet.
## Feature Description
Implement a migration error reporting system that sends migration failure information to the server for storage in the `update_events` table, enabling administrators to see migration status and troubleshoot issues through the web interface.
## Acceptance Criteria
1. Migration failures are reported to the server with detailed error information
2. Migration events appear in the agent History with appropriate severity levels
3. Both detection failures and execution failures are captured and reported
4. Error reports include context: migration type, error message, and system information
5. Server accepts migration events via existing agent check-in mechanism
6. Migration success/failure status is visible in the web interface
## Technical Approach
### 1. Agent-Side Changes
**Migration Event Structure** (`aggregator-agent/internal/migration/`):
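```go
// Package name assumed from the directory above.
package migration

import "time"

// MigrationEvent describes the outcome of one migration phase for reporting to the server.
type MigrationEvent struct {
	EventType     string                 // "migration_detection" or "migration_execution"
	Status        string                 // "success", "failed", "warning"
	ErrorMessage  string                 // Detailed error message
	MigrationFrom string                 // Source version/path
	MigrationTo   string                 // Target version/path
	Timestamp     time.Time              // When the event occurred
	SystemInfo    map[string]interface{} // Additional system context
}
```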
**Enhanced Migration Logic**:
- Wrap migration detection and execution with error reporting
- Capture detailed error context and system information
- Queue migration events alongside regular update events
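The wrapping described above could look roughly like the following sketch. It is not the agent's actual code: `EventQueue`, `runWithReporting`, `errSimulated`, and the `old-config`/`new-config` labels are hypothetical names introduced for illustration, and the in-memory queue stands in for whatever queue the agent already uses for regular update events.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"time"
)

// MigrationEvent mirrors the structure proposed in this document.
type MigrationEvent struct {
	EventType     string
	Status        string
	ErrorMessage  string
	MigrationFrom string
	MigrationTo   string
	Timestamp     time.Time
	SystemInfo    map[string]interface{}
}

// EventQueue is a hypothetical in-memory stand-in for the agent's existing
// update-event queue.
type EventQueue struct {
	events []MigrationEvent
}

// Enqueue appends an event to the queue.
func (q *EventQueue) Enqueue(ev MigrationEvent) {
	q.events = append(q.events, ev)
}

// runWithReporting (hypothetical helper) runs one migration phase, queues an
// event describing the outcome, and returns the phase's error unchanged.
func runWithReporting(q *EventQueue, eventType, from, to string, phase func() error) error {
	ev := MigrationEvent{
		EventType:     eventType,
		Status:        "success",
		MigrationFrom: from,
		MigrationTo:   to,
		Timestamp:     time.Now().UTC(),
		SystemInfo:    map[string]interface{}{"os": runtime.GOOS, "arch": runtime.GOARCH},
	}
	err := phase()
	if err != nil {
		ev.Status = "failed"
		ev.ErrorMessage = err.Error()
	}
	q.Enqueue(ev)
	return err
}

// errSimulated stands in for a real failure such as a permission error.
var errSimulated = errors.New("simulated permission failure")

func main() {
	q := &EventQueue{}
	_ = runWithReporting(q, "migration_detection", "old-config", "new-config", func() error { return nil })
	_ = runWithReporting(q, "migration_execution", "old-config", "new-config", func() error {
		return fmt.Errorf("write config: %w", errSimulated)
	})
	for _, ev := range q.events {
		fmt.Printf("%s %s %q\n", ev.EventType, ev.Status, ev.ErrorMessage)
	}
}
```

Returning the original error unchanged keeps the existing migration control flow intact; reporting is purely a side effect of the wrapper.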
### 2. Server-Side Changes
**Database Schema** (if needed):
- Verify `update_events` table can handle migration event types
- Add migration-specific event types if not already supported
**API Handler** (`aggregator-server/internal/api/handlers/agent_updates.go`):
- Accept migration events in existing check-in endpoint
- Validate migration event structure
- Store events with appropriate metadata
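As a rough illustration of the validation step, the sketch below decodes a JSON payload and checks it against the event types and statuses named in this document. The JSON field names, the `validateMigrationEvent` and `decodeAndValidate` helpers, and the rule that a failed event must carry an error message are assumptions made for the example, not the server's confirmed wire format.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// MigrationEvent mirrors the agent-side structure; the JSON tags are assumed.
type MigrationEvent struct {
	EventType     string                 `json:"event_type"`
	Status        string                 `json:"status"`
	ErrorMessage  string                 `json:"error_message"`
	MigrationFrom string                 `json:"migration_from"`
	MigrationTo   string                 `json:"migration_to"`
	Timestamp     time.Time              `json:"timestamp"`
	SystemInfo    map[string]interface{} `json:"system_info"`
}

var (
	validEventTypes = map[string]bool{"migration_detection": true, "migration_execution": true}
	validStatuses   = map[string]bool{"success": true, "failed": true, "warning": true}
)

// validateMigrationEvent (hypothetical helper) rejects events with unknown
// types or statuses, failed events without a message, and missing timestamps.
func validateMigrationEvent(ev MigrationEvent) error {
	if !validEventTypes[ev.EventType] {
		return fmt.Errorf("unknown event type %q", ev.EventType)
	}
	if !validStatuses[ev.Status] {
		return fmt.Errorf("unknown status %q", ev.Status)
	}
	if ev.Status == "failed" && ev.ErrorMessage == "" {
		return errors.New("failed event requires an error message")
	}
	if ev.Timestamp.IsZero() {
		return errors.New("timestamp is required")
	}
	return nil
}

// decodeAndValidate parses a JSON payload and validates the resulting event.
func decodeAndValidate(payload []byte) (MigrationEvent, error) {
	var ev MigrationEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return ev, fmt.Errorf("decode migration event: %w", err)
	}
	return ev, validateMigrationEvent(ev)
}

func main() {
	payload := []byte(`{"event_type":"migration_execution","status":"failed",` +
		`"error_message":"permission denied","timestamp":"2025-01-01T00:00:00Z"}`)
	ev, err := decodeAndValidate(payload)
	fmt.Println(ev.EventType, ev.Status, err)
}
```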
**Event Processing**:
- Categorize migration events separately from regular updates
- Include migration-specific metadata in responses
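One simple way to categorize migration events separately is to rely on the `migration_` prefix that every event type in this document shares. The sketch below assumes that prefix convention holds for all migration events; `isMigrationEvent`, `splitEvents`, and the `update_installed` event type are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// migrationPrefix is shared by every migration event type in this document.
const migrationPrefix = "migration_"

// isMigrationEvent reports whether an event type belongs to the migration category.
func isMigrationEvent(eventType string) bool {
	return strings.HasPrefix(eventType, migrationPrefix)
}

// splitEvents partitions event types into migration and regular events,
// preserving their original order.
func splitEvents(eventTypes []string) (migration, regular []string) {
	for _, t := range eventTypes {
		if isMigrationEvent(t) {
			migration = append(migration, t)
		} else {
			regular = append(regular, t)
		}
	}
	return migration, regular
}

func main() {
	// "update_installed" is a hypothetical regular event type.
	m, r := splitEvents([]string{
		"migration_execution_failed",
		"update_installed",
		"migration_detection_success",
	})
	fmt.Println("migration:", m)
	fmt.Println("regular:", r)
}
```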
### 3. Frontend Changes
**History Display** (`aggregator-web/src/components/AgentUpdate.tsx`):
- Show migration events with distinct styling
- Display migration status (success/failed/warning)
- Show detailed error messages in expandable sections
- Filter capability for migration-specific events
## Definition of Done
- ✅ Migration failures are captured and sent to server
- ✅ Migration events appear in agent History with proper categorization
- ✅ Error messages include sufficient detail for troubleshooting
- ✅ Migration success/failure status is clearly visible in UI
- ✅ Both detection and execution phases are monitored
- ✅ Integration testing validates end-to-end error reporting flow
## Test Plan
1. **Unit Tests**
   - Test migration event creation and validation
   - Test error message formatting and context capture
   - Test server-side event acceptance and storage
2. **Integration Tests**
   - Simulate migration detection failure with invalid config
   - Simulate migration execution failure with permission issues
   - Verify events appear in server database
   - Test API response handling for migration events
3. **Manual Tests**
   - Create agent with old config format requiring migration
   - Force migration failure (e.g., permissions, disk space)
   - Verify error appears in History within reasonable time
   - Test error message clarity and usefulness
## Files to Modify
- `aggregator-agent/internal/migration/detection.go` - Add error reporting wrapper
- `aggregator-agent/internal/migration/executor.go` - Add error reporting wrapper
- `aggregator-agent/cmd/agent/main.go` - Handle migration event reporting
- `aggregator-server/internal/api/handlers/agent_updates.go` - Accept migration events
- `aggregator-web/src/components/AgentUpdate.tsx` - Display migration events
- `aggregator-web/src/components/AgentUpdatesEnhanced.tsx` - Enhanced display if used
## Migration Event Types
1. **Detection Events**:
   - `migration_detection_success` - Detected need for migration
   - `migration_detection_failed` - Error during migration detection
   - `migration_detection_not_needed` - No migration required
2. **Execution Events**:
   - `migration_execution_success` - Migration completed successfully
   - `migration_execution_failed` - Migration failed with errors
   - `migration_execution_partial` - Partial success with warnings
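The event types above could be declared as constants and mapped to History severity levels along these lines. The constant names, the `severityFor` helper, and the `info`/`warning`/`error` severity labels are assumptions for this sketch; the document only requires "appropriate severity levels" without naming them.

```go
package main

import "fmt"

// Event type identifiers as listed in this document.
const (
	DetectionSuccess   = "migration_detection_success"
	DetectionFailed    = "migration_detection_failed"
	DetectionNotNeeded = "migration_detection_not_needed"
	ExecutionSuccess   = "migration_execution_success"
	ExecutionFailed    = "migration_execution_failed"
	ExecutionPartial   = "migration_execution_partial"
)

// severityFor (hypothetical helper) maps an event type to an assumed severity
// label. The second return value is false for unknown event types.
func severityFor(eventType string) (string, bool) {
	switch eventType {
	case DetectionSuccess, DetectionNotNeeded, ExecutionSuccess:
		return "info", true
	case ExecutionPartial:
		return "warning", true
	case DetectionFailed, ExecutionFailed:
		return "error", true
	default:
		return "", false
	}
}

func main() {
	for _, t := range []string{DetectionSuccess, ExecutionPartial, ExecutionFailed, "unknown_event"} {
		sev, ok := severityFor(t)
		fmt.Printf("%s -> %q (known: %t)\n", t, sev, ok)
	}
}
```

Returning an explicit `false` for unknown types lets the server reject or log unexpected events instead of silently assigning them a default severity.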
## Estimated Effort
- **Development**: 8-12 hours
- **Testing**: 4-6 hours
- **Review**: 2-3 hours
## Dependencies
- Existing agent update reporting infrastructure
- Current migration detection and execution systems
- Agent check-in mechanism for event transmission
## Risk Assessment
**Low Risk** - This feature enhances existing functionality without modifying core migration logic. The biggest risk is error message formatting, which can be easily adjusted based on testing feedback.