378 lines
11 KiB
Markdown
378 lines
11 KiB
Markdown
# P4-003: Agent File Management and Migration System
|
|
|
|
**Priority:** P4 (Technical Debt)
|
|
**Source Reference:** From analysis of needsfixingbeforepush.md lines 1477-1517 and DEVELOPMENT_TODOS.md lines 1611-1635
|
|
**Date Identified:** 2025-11-12
|
|
|
|
## Problem Description
|
|
|
|
Agent has no validation that working files belong to current agent binary/version. Stale files from previous agent installations interfere with current operations, causing timeout issues and data corruption. Mixed directory naming creates confusion and maintenance issues.
|
|
|
|
## Impact
|
|
|
|
- **Data Corruption:** Stale `last_scan.json` files with wrong agent IDs cause parsing timeouts
|
|
- **Installation Conflicts:** No clean migration between agent versions
|
|
- **Path Inconsistency:** Mixed `/var/lib/aggregator` vs `/var/lib/redflag` paths
|
|
- **Security Risk:** No file validation prevents potential file poisoning attacks
|
|
- **Maintenance Burden:** Manual cleanup required for corrupted files
|
|
|
|
## Current Issues Identified
|
|
|
|
### 1. Stale File Problem
|
|
```json
|
|
// /var/lib/aggregator/last_scan.json from October 14th
|
|
{
|
|
"last_scan_time": "2025-10-14T10:19:23.20489739-04:00", // OLD!
|
|
"agent_id": "49f9a1e8-66db-4d21-b3f4-f416e0523ed1", // OLD!
|
|
"updates": [/* 50,000+ lines causing timeouts */]
|
|
}
|
|
```
|
|
|
|
### 2. Path Inconsistency
|
|
- Old paths: `/var/lib/aggregator`, `/etc/aggregator`
|
|
- New paths: `/var/lib/redflag`, `/etc/redflag`
|
|
- Mixed usage across codebase
|
|
- No standardized migration strategy
|
|
|
|
### 3. No Version Validation
|
|
- Agent doesn't validate file ownership
|
|
- No binary signature validation of working files
|
|
- Stale files accumulate and cause issues
|
|
- No cleanup mechanisms
|
|
|
|
## Proposed Solution
|
|
|
|
Implement comprehensive file management and migration system:
|
|
|
|
### 1. File Validation and Migration System
|
|
```go
|
|
type FileManager struct {
|
|
CurrentAgentID string
|
|
CurrentVersion string
|
|
BasePaths PathConfig
|
|
MigrationConfig MigrationConfig
|
|
}
|
|
|
|
type PathConfig struct {
|
|
Config string // /etc/redflag/config.json
|
|
State string // /var/lib/redflag/
|
|
Backup string // /var/lib/redflag/backups/
|
|
Logs string // /var/log/redflag/
|
|
}
|
|
|
|
type MigrationConfig struct {
|
|
OldPaths []string // Legacy paths to migrate from
|
|
BackupEnabled bool
|
|
MaxBackups int
|
|
}
|
|
|
|
func (fm *FileManager) ValidateAndMigrate() error {
|
|
// 1. Check for legacy paths and migrate
|
|
if err := fm.migrateLegacyPaths(); err != nil {
|
|
return fmt.Errorf("path migration failed: %w", err)
|
|
}
|
|
|
|
// 2. Validate file ownership
|
|
if err := fm.validateFileOwnership(); err != nil {
|
|
return fmt.Errorf("file ownership validation failed: %w", err)
|
|
}
|
|
|
|
// 3. Clean up stale files
|
|
if err := fm.cleanupStaleFiles(); err != nil {
|
|
return fmt.Errorf("stale file cleanup failed: %w", err)
|
|
}
|
|
|
|
return nil
|
|
}
|
|
```
|
|
|
|
### 2. Agent File Ownership Validation
|
|
```go
|
|
type FileMetadata struct {
|
|
AgentID string `json:"agent_id"`
|
|
Version string `json:"version"`
|
|
CreatedAt time.Time `json:"created_at"`
|
|
UpdatedAt time.Time `json:"updated_at"`
|
|
Checksum string `json:"checksum"`
|
|
}
|
|
|
|
func (fm *FileManager) ValidateFile(filePath string) error {
|
|
// Check if file exists
|
|
if _, err := os.Stat(filePath); os.IsNotExist(err) {
|
|
return nil // No file to validate
|
|
}
|
|
|
|
// Read file metadata
|
|
metadata, err := fm.readFileMetadata(filePath)
|
|
if err != nil {
|
|
// No metadata found - treat as legacy file
|
|
return fm.handleLegacyFile(filePath)
|
|
}
|
|
|
|
// Validate agent ID matches
|
|
if metadata.AgentID != fm.CurrentAgentID {
|
|
return fm.handleMismatchedFile(filePath, metadata)
|
|
}
|
|
|
|
// Validate version compatibility
|
|
if !fm.isVersionCompatible(metadata.Version) {
|
|
return fm.handleVersionMismatch(filePath, metadata)
|
|
}
|
|
|
|
// Validate file integrity
|
|
if err := fm.validateFileIntegrity(filePath, metadata.Checksum); err != nil {
|
|
return fmt.Errorf("file integrity check failed for %s: %w", filePath, err)
|
|
}
|
|
|
|
return nil
|
|
}
|
|
```
|
|
|
|
### 3. Stale File Detection and Cleanup
|
|
```go
|
|
func (fm *FileManager) cleanupStaleFiles() error {
|
|
files := []string{
|
|
filepath.Join(fm.BasePaths.State, "last_scan.json"),
|
|
filepath.Join(fm.BasePaths.State, "pending_acks.json"),
|
|
filepath.Join(fm.BasePaths.State, "command_history.json"),
|
|
}
|
|
|
|
for _, file := range files {
|
|
if err := fm.ValidateFile(file); err != nil {
|
|
if isStaleFileError(err) {
|
|
// Backup and remove stale file
|
|
if err := fm.backupAndRemove(file); err != nil {
|
|
log.Printf("Warning: Failed to backup stale file %s: %v", file, err)
|
|
} else {
|
|
log.Printf("Cleaned up stale file: %s", file)
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
func (fm *FileManager) backupAndRemove(filePath string) error {
|
|
if !fm.MigrationConfig.BackupEnabled {
|
|
return os.Remove(filePath)
|
|
}
|
|
|
|
// Create backup with timestamp
|
|
timestamp := time.Now().Format("20060102-150405")
|
|
backupPath := filepath.Join(fm.BasePaths.Backup, fmt.Sprintf("%s.%s", filepath.Base(filePath), timestamp))
|
|
|
|
// Ensure backup directory exists
|
|
if err := os.MkdirAll(fm.BasePaths.Backup, 0755); err != nil {
|
|
return err
|
|
}
|
|
|
|
// Copy to backup
|
|
if err := copyFile(filePath, backupPath); err != nil {
|
|
return err
|
|
}
|
|
|
|
// Remove original
|
|
return os.Remove(filePath)
|
|
}
|
|
```
|
|
|
|
### 4. Path Standardization
|
|
```go
|
|
// Standardized paths for consistency
|
|
const (
|
|
DefaultConfigPath = "/etc/redflag/config.json"
|
|
DefaultStatePath = "/var/lib/redflag/"
|
|
DefaultBackupPath = "/var/lib/redflag/backups/"
|
|
DefaultLogPath = "/var/log/redflag/"
|
|
)
|
|
|
|
func GetStandardPaths() PathConfig {
|
|
return PathConfig{
|
|
Config: DefaultConfigPath,
|
|
State: DefaultStatePath,
|
|
Backup: DefaultBackupPath,
|
|
Logs: DefaultLogPath,
|
|
}
|
|
}
|
|
|
|
func (fm *FileManager) migrateLegacyPaths() error {
|
|
legacyPaths := []string{
|
|
"/etc/aggregator",
|
|
"/var/lib/aggregator",
|
|
}
|
|
|
|
for _, legacyPath := range legacyPaths {
|
|
if _, err := os.Stat(legacyPath); err == nil {
|
|
if err := fm.migrateFromPath(legacyPath); err != nil {
|
|
return fmt.Errorf("failed to migrate from %s: %w", legacyPath, err)
|
|
}
|
|
}
|
|
}
|
|
|
|
return nil
|
|
}
|
|
```
|
|
|
|
### 5. Binary Signature Validation
|
|
```go
|
|
func (fm *FileManager) validateBinarySignature(filePath string) error {
|
|
// Get current binary signature
|
|
currentBinary, err := os.Executable()
|
|
if err != nil {
|
|
return err
|
|
}
|
|
|
|
currentSignature, err := fm.calculateFileSignature(currentBinary)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
|
|
// Read file's expected binary signature
|
|
metadata, err := fm.readFileMetadata(filePath)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
|
|
if metadata.BinarySignature != "" && metadata.BinarySignature != currentSignature {
|
|
return fmt.Errorf("file was created by different binary version")
|
|
}
|
|
|
|
return nil
|
|
}
|
|
```
|
|
|
|
## Definition of Done
|
|
|
|
- [ ] File validation system checks agent ID and version compatibility
|
|
- [ ] Automatic cleanup of stale files from previous installations
|
|
- [ ] Path standardization implemented across codebase
|
|
- [ ] Migration system handles legacy path transitions
|
|
- [ ] Backup system preserves important files during cleanup
|
|
- [ ] Binary signature validation prevents file poisoning
|
|
- [ ] Configuration options for migration behavior
|
|
- [ ] Comprehensive logging for debugging file issues
|
|
|
|
## Implementation Details
|
|
|
|
### File Locations
|
|
- **Primary:** `aggregator-agent/internal/filesystem/` (new package)
|
|
- **Integration:** `aggregator-agent/cmd/agent/main.go` (initialization)
|
|
- **Config:** `aggregator-agent/internal/config/config.go`
|
|
|
|
### Configuration Options
|
|
```json
|
|
{
|
|
"file_management": {
|
|
"paths": {
|
|
"config": "/etc/redflag/config.json",
|
|
"state": "/var/lib/redflag/",
|
|
"backup": "/var/lib/redflag/backups/",
|
|
"logs": "/var/log/redflag/"
|
|
},
|
|
"migration": {
|
|
"cleanup_stale_files": true,
|
|
"backup_on_cleanup": true,
|
|
"max_backups": 10,
|
|
"migrate_legacy_paths": true
|
|
},
|
|
"validation": {
|
|
"validate_agent_id": true,
|
|
"validate_version": true,
|
|
"validate_binary_signature": false
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### Integration Points
|
|
```go
|
|
// Agent initialization
|
|
func (a *Agent) initialize() error {
|
|
// Existing initialization...
|
|
|
|
// File management setup
|
|
fileManager := filesystem.NewFileManager(a.config, a.agentID, AgentVersion)
|
|
if err := fileManager.ValidateAndMigrate(); err != nil {
|
|
return fmt.Errorf("file management initialization failed: %w", err)
|
|
}
|
|
|
|
a.fileManager = fileManager
|
|
return nil
|
|
}
|
|
|
|
// Before scan operations
|
|
func (a *Agent) scanForUpdates() error {
|
|
// Validate files before operation
|
|
if err := a.fileManager.ValidateAndMigrate(); err != nil {
|
|
log.Printf("Warning: File validation failed, proceeding anyway: %v", err)
|
|
}
|
|
|
|
// Continue with scan...
|
|
}
|
|
```
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit Tests
|
|
- File validation logic
|
|
- Migration path handling
|
|
- Backup and cleanup operations
|
|
- Signature validation
|
|
|
|
### Integration Tests
|
|
- Full migration scenarios
|
|
- Stale file detection
|
|
- Path transition testing
|
|
- Configuration validation
|
|
|
|
### Manual Test Scenarios
|
|
1. **Stale File Cleanup:**
|
|
- Install agent v1, create state files
|
|
- Install agent v2 with different agent ID
|
|
- Verify stale files are backed up and cleaned
|
|
|
|
2. **Path Migration:**
|
|
- Install agent with old paths
|
|
- Upgrade to new version
|
|
- Verify files are moved to new locations
|
|
|
|
3. **File Corruption Recovery:**
|
|
- Corrupt state files manually
|
|
- Restart agent
|
|
- Verify recovery or graceful degradation
|
|
|
|
## Prerequisites
|
|
|
|
- Configuration system supports nested structures
|
|
- Logging infrastructure supports structured output
|
|
- Agent has unique ID and version information
|
|
- File system permissions allow access to required paths
|
|
|
|
## Effort Estimate
|
|
|
|
**Complexity:** Medium-High
|
|
**Effort:** 3-4 days
|
|
- Day 1: File validation and cleanup system
|
|
- Day 2: Path migration and standardization
|
|
- Day 3: Binary signature validation
|
|
- Day 4: Integration testing and configuration
|
|
|
|
## Success Metrics
|
|
|
|
- Elimination of timeout issues from stale files
|
|
- Zero manual intervention required for upgrades
|
|
- Consistent path usage across codebase
|
|
- No data loss during migration operations
|
|
- Improved system startup reliability
|
|
- Enhanced security through file validation
|
|
|
|
## Monitoring
|
|
|
|
Track these metrics after implementation:
|
|
- File validation error rate
|
|
- Migration success rate
|
|
- Stale file cleanup frequency
|
|
- Path standardization compliance
|
|
- Agent startup time improvement
|
|
- User-reported file issues reduction |