Add docs and project files - force for Culurien
This commit is contained in:
378
docs/3_BACKLOG/P4-003_Agent-File-Management-Migration.md
Normal file
378
docs/3_BACKLOG/P4-003_Agent-File-Management-Migration.md
Normal file
@@ -0,0 +1,378 @@
|
||||
# P4-003: Agent File Management and Migration System
|
||||
|
||||
**Priority:** P4 (Technical Debt)
|
||||
**Source Reference:** From analysis of needsfixingbeforepush.md lines 1477-1517 and DEVELOPMENT_TODOS.md lines 1611-1635
|
||||
**Date Identified:** 2025-11-12
|
||||
|
||||
## Problem Description
|
||||
|
||||
Agent has no validation that working files belong to current agent binary/version. Stale files from previous agent installations interfere with current operations, causing timeout issues and data corruption. Mixed directory naming creates confusion and maintenance issues.
|
||||
|
||||
## Impact
|
||||
|
||||
- **Data Corruption:** Stale `last_scan.json` files with wrong agent IDs cause parsing timeouts
|
||||
- **Installation Conflicts:** No clean migration between agent versions
|
||||
- **Path Inconsistency:** Mixed `/var/lib/aggregator` vs `/var/lib/redflag` paths
|
||||
- **Security Risk:** No file validation prevents potential file poisoning attacks
|
||||
- **Maintenance Burden:** Manual cleanup required for corrupted files
|
||||
|
||||
## Current Issues Identified
|
||||
|
||||
### 1. Stale File Problem
|
||||
```json
|
||||
// /var/lib/aggregator/last_scan.json from October 14th
|
||||
{
|
||||
"last_scan_time": "2025-10-14T10:19:23.20489739-04:00", // OLD!
|
||||
"agent_id": "49f9a1e8-66db-4d21-b3f4-f416e0523ed1", // OLD!
|
||||
"updates": [/* 50,000+ lines causing timeouts */]
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Path Inconsistency
|
||||
- Old paths: `/var/lib/aggregator`, `/etc/aggregator`
|
||||
- New paths: `/var/lib/redflag`, `/etc/redflag`
|
||||
- Mixed usage across codebase
|
||||
- No standardized migration strategy
|
||||
|
||||
### 3. No Version Validation
|
||||
- Agent doesn't validate file ownership
|
||||
- No binary signature validation of working files
|
||||
- Stale files accumulate and cause issues
|
||||
- No cleanup mechanisms
|
||||
|
||||
## Proposed Solution
|
||||
|
||||
Implement comprehensive file management and migration system:
|
||||
|
||||
### 1. File Validation and Migration System
|
||||
```go
|
||||
type FileManager struct {
|
||||
CurrentAgentID string
|
||||
CurrentVersion string
|
||||
BasePaths PathConfig
|
||||
MigrationConfig MigrationConfig
|
||||
}
|
||||
|
||||
type PathConfig struct {
|
||||
Config string // /etc/redflag/config.json
|
||||
State string // /var/lib/redflag/
|
||||
Backup string // /var/lib/redflag/backups/
|
||||
Logs string // /var/log/redflag/
|
||||
}
|
||||
|
||||
type MigrationConfig struct {
|
||||
OldPaths []string // Legacy paths to migrate from
|
||||
BackupEnabled bool
|
||||
MaxBackups int
|
||||
}
|
||||
|
||||
func (fm *FileManager) ValidateAndMigrate() error {
|
||||
// 1. Check for legacy paths and migrate
|
||||
if err := fm.migrateLegacyPaths(); err != nil {
|
||||
return fmt.Errorf("path migration failed: %w", err)
|
||||
}
|
||||
|
||||
// 2. Validate file ownership
|
||||
if err := fm.validateFileOwnership(); err != nil {
|
||||
return fmt.Errorf("file ownership validation failed: %w", err)
|
||||
}
|
||||
|
||||
// 3. Clean up stale files
|
||||
if err := fm.cleanupStaleFiles(); err != nil {
|
||||
return fmt.Errorf("stale file cleanup failed: %w", err)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Agent File Ownership Validation
|
||||
```go
|
||||
type FileMetadata struct {
|
||||
AgentID string `json:"agent_id"`
|
||||
Version string `json:"version"`
|
||||
CreatedAt time.Time `json:"created_at"`
|
||||
UpdatedAt time.Time `json:"updated_at"`
|
||||
Checksum string `json:"checksum"`
|
||||
}
|
||||
|
||||
func (fm *FileManager) ValidateFile(filePath string) error {
|
||||
// Check if file exists
|
||||
if _, err := os.Stat(filePath); os.IsNotExist(err) {
|
||||
return nil // No file to validate
|
||||
}
|
||||
|
||||
// Read file metadata
|
||||
metadata, err := fm.readFileMetadata(filePath)
|
||||
if err != nil {
|
||||
// No metadata found - treat as legacy file
|
||||
return fm.handleLegacyFile(filePath)
|
||||
}
|
||||
|
||||
// Validate agent ID matches
|
||||
if metadata.AgentID != fm.CurrentAgentID {
|
||||
return fm.handleMismatchedFile(filePath, metadata)
|
||||
}
|
||||
|
||||
// Validate version compatibility
|
||||
if !fm.isVersionCompatible(metadata.Version) {
|
||||
return fm.handleVersionMismatch(filePath, metadata)
|
||||
}
|
||||
|
||||
// Validate file integrity
|
||||
if err := fm.validateFileIntegrity(filePath, metadata.Checksum); err != nil {
|
||||
return fmt.Errorf("file integrity check failed for %s: %w", filePath, err)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Stale File Detection and Cleanup
|
||||
```go
|
||||
func (fm *FileManager) cleanupStaleFiles() error {
|
||||
files := []string{
|
||||
filepath.Join(fm.BasePaths.State, "last_scan.json"),
|
||||
filepath.Join(fm.BasePaths.State, "pending_acks.json"),
|
||||
filepath.Join(fm.BasePaths.State, "command_history.json"),
|
||||
}
|
||||
|
||||
for _, file := range files {
|
||||
if err := fm.ValidateFile(file); err != nil {
|
||||
if isStaleFileError(err) {
|
||||
// Backup and remove stale file
|
||||
if err := fm.backupAndRemove(file); err != nil {
|
||||
log.Printf("Warning: Failed to backup stale file %s: %v", file, err)
|
||||
} else {
|
||||
log.Printf("Cleaned up stale file: %s", file)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
func (fm *FileManager) backupAndRemove(filePath string) error {
|
||||
if !fm.MigrationConfig.BackupEnabled {
|
||||
return os.Remove(filePath)
|
||||
}
|
||||
|
||||
// Create backup with timestamp
|
||||
timestamp := time.Now().Format("20060102-150405")
|
||||
backupPath := filepath.Join(fm.BasePaths.Backup, fmt.Sprintf("%s.%s", filepath.Base(filePath), timestamp))
|
||||
|
||||
// Ensure backup directory exists
|
||||
if err := os.MkdirAll(fm.BasePaths.Backup, 0755); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Copy to backup
|
||||
if err := copyFile(filePath, backupPath); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Remove original
|
||||
return os.Remove(filePath)
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Path Standardization
|
||||
```go
|
||||
// Standardized paths for consistency
|
||||
const (
|
||||
DefaultConfigPath = "/etc/redflag/config.json"
|
||||
DefaultStatePath = "/var/lib/redflag/"
|
||||
DefaultBackupPath = "/var/lib/redflag/backups/"
|
||||
DefaultLogPath = "/var/log/redflag/"
|
||||
)
|
||||
|
||||
func GetStandardPaths() PathConfig {
|
||||
return PathConfig{
|
||||
Config: DefaultConfigPath,
|
||||
State: DefaultStatePath,
|
||||
Backup: DefaultBackupPath,
|
||||
Logs: DefaultLogPath,
|
||||
}
|
||||
}
|
||||
|
||||
func (fm *FileManager) migrateLegacyPaths() error {
|
||||
legacyPaths := []string{
|
||||
"/etc/aggregator",
|
||||
"/var/lib/aggregator",
|
||||
}
|
||||
|
||||
for _, legacyPath := range legacyPaths {
|
||||
if _, err := os.Stat(legacyPath); err == nil {
|
||||
if err := fm.migrateFromPath(legacyPath); err != nil {
|
||||
return fmt.Errorf("failed to migrate from %s: %w", legacyPath, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Binary Signature Validation
|
||||
```go
|
||||
func (fm *FileManager) validateBinarySignature(filePath string) error {
|
||||
// Get current binary signature
|
||||
currentBinary, err := os.Executable()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
currentSignature, err := fm.calculateFileSignature(currentBinary)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Read file's expected binary signature
|
||||
metadata, err := fm.readFileMetadata(filePath)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if metadata.BinarySignature != "" && metadata.BinarySignature != currentSignature {
|
||||
return fmt.Errorf("file was created by different binary version")
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] File validation system checks agent ID and version compatibility
|
||||
- [ ] Automatic cleanup of stale files from previous installations
|
||||
- [ ] Path standardization implemented across codebase
|
||||
- [ ] Migration system handles legacy path transitions
|
||||
- [ ] Backup system preserves important files during cleanup
|
||||
- [ ] Binary signature validation prevents file poisoning
|
||||
- [ ] Configuration options for migration behavior
|
||||
- [ ] Comprehensive logging for debugging file issues
|
||||
|
||||
## Implementation Details
|
||||
|
||||
### File Locations
|
||||
- **Primary:** `aggregator-agent/internal/filesystem/` (new package)
|
||||
- **Integration:** `aggregator-agent/cmd/agent/main.go` (initialization)
|
||||
- **Config:** `aggregator-agent/internal/config/config.go`
|
||||
|
||||
### Configuration Options
|
||||
```json
|
||||
{
|
||||
"file_management": {
|
||||
"paths": {
|
||||
"config": "/etc/redflag/config.json",
|
||||
"state": "/var/lib/redflag/",
|
||||
"backup": "/var/lib/redflag/backups/",
|
||||
"logs": "/var/log/redflag/"
|
||||
},
|
||||
"migration": {
|
||||
"cleanup_stale_files": true,
|
||||
"backup_on_cleanup": true,
|
||||
"max_backups": 10,
|
||||
"migrate_legacy_paths": true
|
||||
},
|
||||
"validation": {
|
||||
"validate_agent_id": true,
|
||||
"validate_version": true,
|
||||
"validate_binary_signature": false
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Integration Points
|
||||
```go
|
||||
// Agent initialization
|
||||
func (a *Agent) initialize() error {
|
||||
// Existing initialization...
|
||||
|
||||
// File management setup
|
||||
fileManager := filesystem.NewFileManager(a.config, a.agentID, AgentVersion)
|
||||
if err := fileManager.ValidateAndMigrate(); err != nil {
|
||||
return fmt.Errorf("file management initialization failed: %w", err)
|
||||
}
|
||||
|
||||
a.fileManager = fileManager
|
||||
return nil
|
||||
}
|
||||
|
||||
// Before scan operations
|
||||
func (a *Agent) scanForUpdates() error {
|
||||
// Validate files before operation
|
||||
if err := a.fileManager.ValidateAndMigrate(); err != nil {
|
||||
log.Printf("Warning: File validation failed, proceeding anyway: %v", err)
|
||||
}
|
||||
|
||||
// Continue with scan...
|
||||
}
|
||||
```
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests
|
||||
- File validation logic
|
||||
- Migration path handling
|
||||
- Backup and cleanup operations
|
||||
- Signature validation
|
||||
|
||||
### Integration Tests
|
||||
- Full migration scenarios
|
||||
- Stale file detection
|
||||
- Path transition testing
|
||||
- Configuration validation
|
||||
|
||||
### Manual Test Scenarios
|
||||
1. **Stale File Cleanup:**
|
||||
- Install agent v1, create state files
|
||||
- Install agent v2 with different agent ID
|
||||
- Verify stale files are backed up and cleaned
|
||||
|
||||
2. **Path Migration:**
|
||||
- Install agent with old paths
|
||||
- Upgrade to new version
|
||||
- Verify files are moved to new locations
|
||||
|
||||
3. **File Corruption Recovery:**
|
||||
- Corrupt state files manually
|
||||
- Restart agent
|
||||
- Verify recovery or graceful degradation
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Configuration system supports nested structures
|
||||
- Logging infrastructure supports structured output
|
||||
- Agent has unique ID and version information
|
||||
- File system permissions allow access to required paths
|
||||
|
||||
## Effort Estimate
|
||||
|
||||
**Complexity:** Medium-High
|
||||
**Effort:** 3-4 days
|
||||
- Day 1: File validation and cleanup system
|
||||
- Day 2: Path migration and standardization
|
||||
- Day 3: Binary signature validation
|
||||
- Day 4: Integration testing and configuration
|
||||
|
||||
## Success Metrics
|
||||
|
||||
- Elimination of timeout issues from stale files
|
||||
- Zero manual intervention required for upgrades
|
||||
- Consistent path usage across codebase
|
||||
- No data loss during migration operations
|
||||
- Improved system startup reliability
|
||||
- Enhanced security through file validation
|
||||
|
||||
## Monitoring
|
||||
|
||||
Track these metrics after implementation:
|
||||
- File validation error rate
|
||||
- Migration success rate
|
||||
- Stale file cleanup frequency
|
||||
- Path standardization compliance
|
||||
- Agent startup time improvement
|
||||
- User-reported file issues reduction
|
||||
Reference in New Issue
Block a user