# Migration 024 Fix - Implementation Plan
## Executive Summary
Migration 024 (`024_disable_updates_subsystem.up.sql`) is broken because it references a `deprecated` column that doesn't exist in the `agent_subsystems` table. This prevents migration 025 from running.
**Root Cause**: Migration 015 created `agent_subsystems` without a `deprecated` column. Migration 024 issues `ALTER TABLE ... ADD COLUMN deprecated` and then updates that column, but the column is not present when the `UPDATE` runs.
## Current State Analysis
### What's Actually Broken
1. Migration 024 adds the `deprecated` column via `ALTER TABLE ADD COLUMN IF NOT EXISTS`
2. Then tries to `UPDATE ... SET deprecated = true`
3. **BUT**: On a fresh database, migration 015 has already created the table without the column, and the column still does not exist when migration 024's `UPDATE` runs
4. **Error**: `pq: column "deprecated" does not exist`
5. **Result**: Migration fails, migration 025 never runs
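The failure chain above can be modeled in a few lines. This is a minimal sketch, not the project's migration runner: `migration`, `runMigrations`, and `errMissingColumn` are hypothetical names, and it assumes the runner applies migrations in order and stops at the first error.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingColumn reproduces the error text reported for migration 024.
var errMissingColumn = errors.New(`pq: column "deprecated" does not exist`)

// migration is a hypothetical stand-in for one migration file.
type migration struct {
	name  string
	apply func() error
}

// runMigrations applies migrations in order and stops at the first failure.
// It returns the names that were applied, plus the wrapped error if one occurred.
func runMigrations(ms []migration) ([]string, error) {
	var applied []string
	for _, m := range ms {
		if err := m.apply(); err != nil {
			return applied, fmt.Errorf("%s: %w", m.name, err)
		}
		applied = append(applied, m.name)
	}
	return applied, nil
}

func main() {
	ok := func() error { return nil }
	broken := func() error { return errMissingColumn }

	applied, err := runMigrations([]migration{
		{name: "023", apply: ok},
		{name: "024_disable_updates_subsystem", apply: broken},
		{name: "025_platform_scanner_subsystems", apply: ok},
	})
	fmt.Println("applied:", applied)
	fmt.Println("error:", err)
}
```

In this model, 025 is never attempted once 024 returns an error, which matches the blocker described here.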
### What's Working
- Migration 024 DOES have the correct `ALTER TABLE` statement to add the column
- The column DOES work if you run migration 024 standalone (like with `docker exec`)
- The problem is sequential execution in a transaction
### Confusion Point: Why Does The Column Exist?
The `deprecated` column DOES exist in some databases because migrations are run in a specific order:
- On existing databases: migrations 001-023 already applied, then 024 tries to run
- On fresh databases: migration 015 creates table, then 024 adds column, but there's a transaction boundary issue
## Option A: Minimal Fix (2 files, ~3 lines)
**Goal**: Make migration 024 run without error, accepting that `updates` subsystem rows may still exist
### Changes Needed
#### File 1: `024_disable_updates_subsystem.up.sql`
```sql
-- Fix: Remove the UPDATE that references deprecated column
-- Keep only: ALTER TABLE ADD COLUMN (if we want it for historical tracking)
ALTER TABLE agent_subsystems ADD COLUMN IF NOT EXISTS deprecated BOOLEAN DEFAULT false;
-- Remove the UPDATE entirely - it serves no functional purpose
-- No one ever reads the deprecated column anywhere
-- Log migration completion
INSERT INTO schema_migrations (version) VALUES ('024_disable_updates_subsystem.up.sql');
```
#### File 2: `aggregator-server/internal/command/validator.go` (line 74)
```go
// Remove "updates" from the validActions slice if it's still there
var validActions = []string{
"storage",
"system",
"docker",
"apt",
"dnf",
"windows",
"winget",
// "updates", // REMOVE THIS
}
```
### Pros of Option A
- ✅ Minimal changes (2 files)
- ✅ Fixes the immediate blocker (migration 024 runs, migration 025 runs)
- ✅ Quick to implement
- ✅ Low risk of breaking something else
### Cons of Option A
- ❌ Leaves 12+ files with outdated `updates` references
- ❌ Ambiguous state: `updates` rows may exist in some databases
- ❌ Technical debt: Future developers confused about `updates` purpose
- ❌ Not idempotent: Behavior varies based on database state
- ❌ Violates ETHOS "No Marketing Fluff" - we have dead code
## Option B: Complete Removal (14 files, ~50 lines)
**Goal**: Remove ALL references to `updates` subsystem from codebase
### Rationale
From ETHOS.md:
- **"Assume Failure; Build for Resilience"** - Clean deletion preferred over complexity
- **"No Marketing Fluff"** - No code that serves no purpose
- **"Idempotency is a Requirement"** - System behavior should be consistent
From README.md:
- **"Breaking changes may happen between versions"** - Alpha software
- **"Full Reinstall (Nuclear Option)"** - Users can reinstall if needed
The project has first-iteration alpha users who can reinstall or manually migrate. The codebase should be clean.
### Changes Needed (Complete List)
#### Database Migrations (5 files, 6 edits)
**File 1: `015_agent_subsystems.up.sql`**
```sql
-- Remove INSERT for 'updates' subsystem (lines 33-36)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
SELECT id, 'updates', true, 15, false FROM agents
WHERE NOT EXISTS (
SELECT 1 FROM agent_subsystems WHERE agent_subsystems.agent_id = agents.id AND subsystem = 'updates'
)
UNION ALL
-- TO: DELETE those lines (only keep storage, system, docker INSERTS)
```
**File 2: `015_agent_subsystems.up.sql` - Trigger**
```sql
-- Remove from trigger (line 60)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
(NEW.id, 'storage', true, 5, true),
(NEW.id, 'system', true, 5, true),
(NEW.id, 'docker', false, 15, false),
(NEW.id, 'updates', true, 15, false); -- REMOVE THIS LINE
-- TO: Only storage, system, docker
```
**File 3: `022_add_subsystem_to_logs.up.sql` - Constraint**
```sql
-- Update CHECK constraint (around line 30)
-- FROM:
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', 'updates', ...))
-- TO: Remove 'updates' from constraint
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', ...))
```
**File 4: `024_disable_updates_subsystem.up.sql`**
```sql
-- Complete rewrite
-- FROM: Disable with deprecated flag
-- TO: Delete entirely
-- Migration: Remove legacy updates subsystem
-- Purpose: Delete monolithic updates subsystem (replaced by apt/dnf/winget/windows)
-- Version: 0.1.29
-- Date: 2025-12-23
-- Remove all 'updates' subsystems
DELETE FROM agent_subsystems WHERE subsystem = 'updates';
-- Log migration completion
INSERT INTO schema_migrations (version) VALUES ('024_disable_updates_subsystem.up.sql');
```
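The idempotency argument for the `DELETE` can be illustrated with an in-memory model. This is only a sketch: `subsystemRow` and `deleteSubsystem` are hypothetical names that mimic `DELETE FROM agent_subsystems WHERE subsystem = 'updates'` on a slice, not code from the repository.

```go
package main

import "fmt"

// subsystemRow is a hypothetical in-memory stand-in for one agent_subsystems row.
type subsystemRow struct {
	AgentID   string
	Subsystem string
}

// deleteSubsystem mirrors DELETE ... WHERE subsystem = name:
// it returns every row whose subsystem differs from name.
func deleteSubsystem(rows []subsystemRow, name string) []subsystemRow {
	kept := make([]subsystemRow, 0, len(rows))
	for _, r := range rows {
		if r.Subsystem != name {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// Fresh database: no 'updates' rows were ever inserted.
	fresh := []subsystemRow{
		{AgentID: "a1", Subsystem: "storage"},
		{AgentID: "a1", Subsystem: "system"},
	}
	// Existing database: a legacy 'updates' row is present.
	existing := []subsystemRow{
		{AgentID: "a1", Subsystem: "storage"},
		{AgentID: "a1", Subsystem: "updates"},
		{AgentID: "a1", Subsystem: "system"},
	}

	once := deleteSubsystem(existing, "updates")
	twice := deleteSubsystem(once, "updates")
	fmt.Println(len(deleteSubsystem(fresh, "updates")), len(once), len(twice))
}
```

Both starting states end with the same rows, and running the delete a second time changes nothing, which is the property the rewritten migration relies on.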
**File 5: `024_disable_updates_subsystem.down.sql`**
```sql
-- Complete rewrite
-- FROM: Re-enable and drop deprecated column
-- TO: Re-insert updates (for rollback only)
-- Rollback: Re-add legacy updates subsystem
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run, created_at, updated_at)
SELECT id, 'updates', false, 15, false, NOW(), NOW()
FROM agents
WHERE NOT EXISTS (
SELECT 1 FROM agent_subsystems WHERE agent_id = agents.id AND subsystem = 'updates'
);
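-- RAISE is PL/pgSQL, so it must run inside a DO block rather than as plain SQL
DO $$
BEGIN
    RAISE WARNING 'Re-added legacy updates subsystem - may conflict with platform-specific scanners';
END
$$;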
```
**File 6: `025_platform_scanner_subsystems.down.sql` - Trigger**
```sql
-- Remove 'updates' from rollback trigger (line 19)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
(NEW.id, 'storage', true, 5, true),
(NEW.id, 'system', true, 5, true),
(NEW.id, 'updates', true, 15, false); -- REMOVE THIS LINE
-- TO: Don't add 'updates' in rollback
```
#### Agent Config Files (4 files)
**File 7: `aggregator-agent/internal/config/subsystems.go`**
```go
// Remove Updates field from SubsystemsConfig struct
// Line ~40:
type SubsystemsConfig struct {
System SubsystemConfig `json:"system"`
// Updates SubsystemConfig `json:"updates"` // REMOVE THIS LINE
Docker SubsystemConfig `json:"docker"`
Storage SubsystemConfig `json:"storage"`
APT SubsystemConfig `json:"apt"`
DNF SubsystemConfig `json:"dnf"`
}
```
```go
// Remove from GetDefaultSubsystemsConfig()
// Lines ~76-81:
func GetDefaultSubsystemsConfig() SubsystemsConfig {
return SubsystemsConfig{
System: GetDefaultSystemConfig(),
// REMOVED: Updates: SubsystemConfig{...}
Docker: SubsystemConfig{
Enabled: false,
Timeout: 0,
IntervalMinutes: 120,
CircuitBreaker: CircuitBreakerConfig{Enabled: false},
},
...
}
}
```
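Removing the `Updates` field raises the question of how config files that still contain an `updates` key will load. The sketch below assumes the agent config is decoded with Go's standard `encoding/json` (the struct tags suggest JSON, but the actual loader is not shown in this plan). The trimmed types, `oldConfig`, `parseLenient`, and `parseStrict` are hypothetical names for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// SubsystemConfig and SubsystemsConfig are trimmed, hypothetical versions
// of the agent types, with the Updates field already removed.
type SubsystemConfig struct {
	Enabled bool `json:"enabled"`
}

type SubsystemsConfig struct {
	System SubsystemConfig `json:"system"`
	Docker SubsystemConfig `json:"docker"`
}

// oldConfig is what a pre-removal config file might contain.
const oldConfig = `{"system":{"enabled":true},"updates":{"enabled":true},"docker":{"enabled":false}}`

// parseLenient uses json.Unmarshal, which silently ignores unknown keys.
func parseLenient(data []byte) (SubsystemsConfig, error) {
	var cfg SubsystemsConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// parseStrict rejects unknown keys via Decoder.DisallowUnknownFields.
func parseStrict(data []byte) (SubsystemsConfig, error) {
	var cfg SubsystemsConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	cfg, err := parseLenient([]byte(oldConfig))
	fmt.Println("lenient:", cfg.System.Enabled, err)

	_, err = parseStrict([]byte(oldConfig))
	fmt.Println("strict:", err)
}
```

If the loader uses the lenient path, a leftover `updates` key is dropped on read; if it uses strict decoding, old configs would be rejected and would need a cleanup step. Which case applies should be checked against the actual loader before release.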
**File 8: `aggregator-agent/internal/config/config.go`**
```go
// Remove Updates migration from migrateConfig()
// Lines ~338-341:
func migrateConfig(cfg *Config) {
if cfg.Subsystems.Updates == (SubsystemConfig{}) {
fmt.Printf("[CONFIG] Adding missing 'updates' subsystem configuration\n")
cfg.Subsystems.Updates = GetDefaultSubsystemsConfig().Updates
}
// Remove the above block entirely
}
```
**File 9: `aggregator-agent/internal/migration/detection.go`**
```go
// Check if version detection includes updates
// Remove or update version detection that checks for updates_subsystem
// Around line 50-70
```
**File 10: `aggregator-agent/internal/migration/executor.go`**
```go
// Check for any migration logic that adds updates
// Remove or update accordingly
```
#### Server Code (2 files)
**File 11: `aggregator-server/internal/command/validator.go`**
```go
// Line ~74: Remove "updates" from validActions
var validActions = []string{
"storage",
"system",
"docker",
"apt",
"dnf",
"windows",
"winget",
// "updates", // REMOVE THIS
}
```
**File 12: `aggregator-server/internal/api/handlers/subsystems.go`**
```go
// Add validation to TriggerSubsystem
func (h *SubsystemHandler) TriggerSubsystem(c *gin.Context) {
subsystem := c.Param("subsystem")
// ADD VALIDATION
validSubsystems := []string{"storage", "system", "docker", "apt", "dnf", "windows", "winget"}
subsystemValid := false
for _, valid := range validSubsystems {
if subsystem == valid {
subsystemValid = true
break
}
}
if !subsystemValid {
c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid subsystem: %s", subsystem)})
return
}
// ... rest of function
}
```
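The same check can be written as a small standalone helper, which makes it testable without a running Gin server. This is a sketch under assumptions: `validSubsystems` as a set and `validateSubsystem` are hypothetical names, not existing functions in the server code.

```go
package main

import "fmt"

// validSubsystems holds the subsystem names listed in this plan.
// A map gives constant-time lookup and keeps the list in one place.
var validSubsystems = map[string]struct{}{
	"storage": {},
	"system":  {},
	"docker":  {},
	"apt":     {},
	"dnf":     {},
	"windows": {},
	"winget":  {},
}

// validateSubsystem returns nil for a known subsystem
// and a descriptive error for anything else, including "updates".
func validateSubsystem(name string) error {
	if _, ok := validSubsystems[name]; !ok {
		return fmt.Errorf("invalid subsystem: %s", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"docker", "updates"} {
		fmt.Printf("%s -> %v\n", name, validateSubsystem(name))
	}
}
```

A handler could call such a helper and map a non-nil error to `http.StatusBadRequest`. Sharing one helper between the handler and `validActions` would also keep the two lists from drifting apart, though whether that fits the existing package layout is a judgment call.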
#### Agent Main (1 file)
**File 13: `aggregator-agent/cmd/agent/main.go`**
```go
// Remove case for "updates" in getCurrentSubsystemEnabled
// Lines ~518-520:
func getCurrentSubsystemEnabled(cfg *config.Config, subsystemName string) bool {
switch subsystemName {
case "system":
return cfg.Subsystems.System.Enabled
// case "updates": // REMOVE
// return cfg.Subsystems.Updates.Enabled // REMOVE
case "docker":
return cfg.Subsystems.Docker.Enabled
...
}
}
```
#### Frontend (2 files)
**File 14: `aggregator-web/src/types/index.ts`**
```typescript
// Remove 'updates' from Subsystem type if it exists
export type Subsystem = 'storage' | 'system' | 'docker' | 'apt' | 'dnf' | 'windows' | 'winget'
// REMOVED: | 'updates'
```
**File 15: `aggregator-web/src/lib/command-naming.ts`**
```typescript
// Clean up old comments referencing scan_updates (cosmetic)
// Lines with comments mentioning scan_updates
```
### Pros of Option B
- ✅ **Clean codebase** - No dead code or ambiguous state
- ✅ **Idempotent** - Behavior consistent across all database states
- ✅ **ETHOS-aligned** - "No Marketing Fluff", "Simplicity"
- ✅ **Self-documenting** - Code clearly shows platform-specific architecture
- ✅ **Future-proof** - No technical debt for next developer
### Cons of Option B
- ❌ **More work** - 14 files vs 2 files
- ❌ **Higher risk** - More touch points, more chance of breaking something
- ❌ **Breaking change** - Old configs with `updates` may fail
- ❌ **Alpha users impacted** - May require agent re-registration
## Design Intent Research
### What Commit 9b72662 Actually Did
From git show 9b72662:
- Created migration 024 with `deprecated` column approach
- Updated scheduler to remove `updates` from `getDefaultInterval()`
- Removed `updates` from `CreateDefaultSubsystems()`
- DID NOT remove from: migration 015, agent config, agent main
**Conclusion**: The author did a **partial removal**, focusing on server-side scheduling only. They likely planned to remove agent-side references later but didn't finish.
### Why Partial?
Alpha software timeline:
- Dec 20: Commit d255f91 removes scan_updates references from frontend
- Dec 22: Migration 024 created to "disable" updates subsystem
- Dec 23: Agent re-registration issues discovered (our current session)
The pace suggests **iterative development** - fix the immediate blocker (404 errors), clean up later.
### ETHOS & README Alignment
**ETHOS Principles** (from `/home/casey/Projects/RedFlag/docs/1_ETHOS/ETHOS.md`):
1. **"Errors are History, Not /dev/null"** - Track what changed, but don't preserve broken code
2. **"No Marketing Fluff"** - Don't add features/code that serve no purpose
3. **"Assume Failure; Build for Resilience"** - Clean state, not ambiguous state
4. **"Idempotency is a Requirement"** - Consistent behavior
**README Claims**:
- "Breaking changes may happen between versions"
- "Full Reinstall (Nuclear Option)" as cleanup strategy
- Alpha software, users can handle breaking changes
**Analysis**:
- Option B (complete removal) better aligns with ETHOS
- Option A (partial fix) creates technical debt that violates "No Fluff"
- README accepts breaking changes, so Option B is acceptable
## Recommendation
### Recommended: Option B (Complete Removal)
**Rationale**:
1. **ETHOS-aligned**: Clean, honest code without ambiguity
2. **Alpha-appropriate**: Users can reinstall
3. **Self-documenting**: Clear that platform-specific is the architecture
4. **Prevents future bugs**: No chance of scheduler creating `scan_updates` commands
5. **Original intent**: Author appears to have planned this but didn't finish
**Arguments Against Option A**:
1. **Technical debt**: 12 files with dead code
2. **Inconsistent state**: Some places disable, some reference
3. **Future confusion**: Developer sees `Updates` field, thinks it's used
4. **Potential bugs**: Validation added later could reject `updates` while other code still references it, causing breakage
5. **Not future-proof**: Eventually someone has to clean this up
**Risk Mitigation for Option B**:
1. Test migration on backup first
2. Document in release notes: "Breaking change - agents may need re-registration"
3. Provide manual SQL for users who hit issues: "Run: DELETE FROM agent_subsystems WHERE subsystem='updates'"
4. Add validation with clear error messages
## Implementation Checklist
### Phase 1: Database Migrations
- [ ] Edit `015_agent_subsystems.up.sql` - Remove INSERT for 'updates'
- [ ] Edit `015_agent_subsystems.up.sql` - Remove 'updates' from trigger
- [ ] Edit `022_add_subsystem_to_logs.up.sql` - Remove 'updates' from constraint
- [ ] Rewrite `024_disable_updates_subsystem.up.sql` - DELETE instead of UPDATE
- [ ] Rewrite `024_disable_updates_subsystem.down.sql` - INSERT for rollback
- [ ] Edit `025_platform_scanner_subsystems.down.sql` - Remove 'updates' from trigger
### Phase 2: Agent Config
- [ ] Edit `aggregator-agent/internal/config/subsystems.go` - Remove Updates field
- [ ] Edit `aggregator-agent/internal/config/subsystems.go` - Remove from defaults
- [ ] Edit `aggregator-agent/internal/config/config.go` - Remove Updates migration
- [ ] Review `aggregator-agent/internal/migration/detection.go` - Remove or update version detection that checks for updates
- [ ] Review `aggregator-agent/internal/migration/executor.go` - Remove or update any migration logic that adds updates
### Phase 3: Agent Code
- [ ] Edit `aggregator-agent/cmd/agent/main.go` - Remove updates case
### Phase 4: Server Code
- [ ] Edit `aggregator-server/internal/command/validator.go` - Remove from validActions
- [ ] Edit `aggregator-server/internal/api/handlers/subsystems.go` - Add validation
### Phase 5: Frontend (Optional)
- [ ] Edit `aggregator-web/src/types/index.ts` - Remove from types if needed
- [ ] Remove references in command-naming comments
### Phase 6: Testing
- [ ] Test fresh install: Empty DB, run migrations, verify no updates rows
- [ ] Test existing DB: With updates rows, run migrations, verify deleted
- [ ] Test agent re-registration
- [ ] Test API validation (try to trigger updates, should fail)
### Phase 7: Documentation
- [ ] Update ChristmasTodos.md
- [ ] Update README.md if it mentions updates subsystem
- [ ] Create migration notes for alpha users
## Conclusion
The migration 024 fix requires a decision between:
1. **Quick fix** (Option A) - 2 files, removes blocker, leaves technical debt
2. **Proper fix** (Option B) - 14 files, complete removal, ETHOS-aligned
The agent recommends **Option B** despite the larger scope because:
- It aligns with ETHOS principles
- Alpha software can handle breaking changes
- Prevents future confusion and bugs
- The original author appears to have intended this but didn't finish
**Next Step**: Proceed with implementation or discuss scope reduction.