15 KiB
Migration 024 Fix - Implementation Plan
Executive Summary
Migration 024 (024_disable_updates_subsystem.up.sql) is broken because it references a deprecated column that doesn't exist in the agent_subsystems table. This prevents migration 025 from running.
Root Cause: The migration adds ALTER TABLE ADD COLUMN deprecated but the column doesn't exist in the schema when migration 015 created the table.
Current State Analysis
What's Actually Broken
- Migration 024 adds the
deprecatedcolumn viaALTER TABLE ADD COLUMN IF NOT EXISTS - Then tries to
UPDATE ... SET deprecated = true - BUT: If this is a fresh database, migration 015 already ran before 024, and the column doesn't exist when migration 024's UPDATE runs
- Error:
pq: column "deprecated" does not exist - Result: Migration fails, migration 025 never runs
What's Working
- Migration 024 DOES have the correct
ALTER TABLEstatement to add the column - The column DOES work if you run migration 024 standalone (like with
docker exec) - The problem is sequential execution in a transaction
Confusion Point: Why Does The Column Exist?
The deprecated column DOES exist in some databases because migrations are run in a specific order:
- On existing databases: migrations 001-023 already applied, then 024 tries to run
- On fresh databases: migration 015 creates table, then 024 adds column, but there's a transaction boundary issue
Option A: Minimal Fix (2 files, ~3 lines)
Goal: Just make migration 024 not error, accept that updates subsystem may still exist
Changes Needed
File 1: 024_disable_updates_subsystem.up.sql
-- Fix: Remove the UPDATE that references deprecated column
-- Keep only: ALTER TABLE ADD COLUMN (if we want it for historical tracking)
ALTER TABLE agent_subsystems ADD COLUMN IF NOT EXISTS deprecated BOOLEAN DEFAULT false;
-- Remove the UPDATE entirely - it serves no functional purpose
-- No one ever reads the deprecated column anywhere
-- Log migration completion
INSERT INTO schema_migrations (version) VALUES ('024_disable_updates_subsystem.up.sql');
File 2: internal/command/validator.go (line 74)
// Remove "updates" from the validActions slice if it's still there
var validActions = []string{
"storage",
"system",
"docker",
"apt",
"dnf",
"windows",
"winget",
// "updates", // REMOVE THIS
}
Pros of Option A
- ✅ Minimal changes (2 files)
- ✅ Fixes the immediate blocker (migration 024 runs, migration 025 runs)
- ✅ Quick to implement
- ✅ Low risk of breaking something else
Cons of Option A
- ❌ Leaves 12+ files with outdated
updatesreferences - ❌ Ambiguous state:
updatesrows may exist in some databases - ❌ Technical debt: Future developers confused about
updatespurpose - ❌ Not idempotent: Behavior varies based on database state
- ❌ Violates ETHOS "No Marketing Fluff" - we have dead code
Option B: Complete Removal (14 files, ~50 lines)
Goal: Remove ALL references to updates subsystem from codebase
Rationale
From ETHOS.md:
- "Assume Failure; Build for Resilience" - Clean deletion preferred over complexity
- "No Marketing Fluff" - No code that serves no purpose
- "Idempotency is a Requirement" - System behavior should be consistent
From README.md:
- "Breaking changes may happen between versions" - Alpha software
- "Full Reinstall (Nuclear Option)" - Users can reinstall if needed
The project has first-iteration alpha users who can reinstall or manually migrate. The codebase should be clean.
Changes Needed (Complete List)
Database Migrations (6 files)
File 1: 015_agent_subsystems.up.sql
-- Remove INSERT for 'updates' subsystem (lines 33-36)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
SELECT id, 'updates', true, 15, false FROM agents
WHERE NOT EXISTS (
SELECT 1 FROM agent_subsystems WHERE agent_subsystems.agent_id = agents.id AND subsystem = 'updates'
)
UNION ALL
-- TO: DELETE those lines (only keep storage, system, docker INSERTS)
File 2: 015_agent_subsystems.up.sql - Trigger
-- Remove from trigger (line 60)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
(NEW.id, 'storage', true, 5, true),
(NEW.id, 'system', true, 5, true),
(NEW.id, 'docker', false, 15, false),
(NEW.id, 'updates', true, 15, false); -- REMOVE THIS LINE
-- TO: Only storage, system, docker
File 3: 022_add_subsystem_to_logs.up.sql - Constraint
-- Update CHECK constraint (around line 30)
-- FROM:
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', 'updates', ...))
-- TO: Remove 'updates' from constraint
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', ...))
File 4: 024_disable_updates_subsystem.up.sql
-- Complete rewrite
-- FROM: Disable with deprecated flag
-- TO: Delete entirely
-- Migration: Remove legacy updates subsystem
-- Purpose: Delete monolithic updates subsystem (replaced by apt/dnf/winget/windows)
-- Version: 0.1.29
-- Date: 2025-12-23
-- Remove all 'updates' subsystems
DELETE FROM agent_subsystems WHERE subsystem = 'updates';
-- Log migration completion
INSERT INTO schema_migrations (version) VALUES ('024_remove_updates_subsystem.up.sql');
File 5: 024_disable_updates_subsystem.down.sql
-- Complete rewrite
-- FROM: Re-enable and drop deprecated column
-- TO: Re-insert updates (for rollback only)
-- Rollback: Re-add legacy updates subsystem
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run, created_at, updated_at)
SELECT id, 'updates', false, 15, false, NOW(), NOW()
FROM agents
WHERE NOT EXISTS (
SELECT 1 FROM agent_subsystems WHERE agent_id = agents.id AND subsystem = 'updates'
);
RAISE WARNING 'Re-added legacy updates subsystem - may conflict with platform-specific scanners';
File 6: 025_platform_scanner_subsystems.down.sql - Trigger
-- Remove 'updates' from rollback trigger (line 19)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
(NEW.id, 'storage', true, 5, true),
(NEW.id, 'system', true, 5, true),
(NEW.id, 'updates', true, 15, false); -- REMOVE THIS LINE
-- TO: Don't add 'updates' in rollback
Agent Config Files (4 files)
File 7: aggregator-agent/internal/config/subsystems.go
// Remove Updates field from SubsystemsConfig struct
// Line ~40:
type SubsystemsConfig struct {
System SubsystemConfig `json:"system"`
// Updates SubsystemConfig `json:"updates"` // REMOVE THIS LINE
Docker SubsystemConfig `json:"docker"`
Storage SubsystemConfig `json:"storage"`
APT SubsystemConfig `json:"apt"`
DNF SubsystemConfig `json:"dnf"`
}
// Remove from GetDefaultSubsystemsConfig()
// Lines ~76-81:
func GetDefaultSubsystemsConfig() SubsystemsConfig {
return SubsystemsConfig{
System: GetDefaultSystemConfig(),
// REMOVED: Updates: SubsystemConfig{...}
Docker: SubsystemConfig{
Enabled: false,
Timeout: 0,
IntervalMinutes: 120,
CircuitBreaker: CircuitBreakerConfig{Enabled: false},
},
...
}
}
File 8: aggregator-agent/internal/config/config.go
// Remove Updates migration from migrateConfig()
// Lines ~338-341:
func migrateConfig(cfg *Config) {
if cfg.Subsystems.Updates == (SubsystemConfig{}) {
fmt.Printf("[CONFIG] Adding missing 'updates' subsystem configuration\n")
cfg.Subsystems.Updates = GetDefaultSubsystemsConfig().Updates
}
// Remove the above block entirely
}
File 9: aggregator-agent/internal/migration/detection.go
// Check if version detection includes updates
// Remove or update version detection that checks for updates_subsystem
// Around line 50-70
File 10: aggregator-agent/internal/migration/executor.go
// Check for any migration logic that adds updates
// Remove or update accordingly
Server Code (2 files)
File 11: aggregator-server/internal/command/validator.go
// Line ~74: Remove "updates" from validActions
var validActions = []string{
"storage",
"system",
"docker",
"apt",
"dnf",
"windows",
"winget",
// "updates", // REMOVE THIS
}
File 12: aggregator-server/internal/api/handlers/subsystems.go
// Add validation to TriggerSubsystem
func (h *SubsystemHandler) TriggerSubsystem(c *gin.Context) {
subsystem := c.Param("subsystem")
// ADD VALIDATION
validSubsystems := []string{"storage", "system", "docker", "apt", "dnf", "windows", "winget"}
subsystemValid := false
for _, valid := range validSubsystems {
if subsystem == valid {
subsystemValid = true
break
}
}
if !subsystemValid {
c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid subsystem: %s", subsystem)})
return
}
// ... rest of function
}
Agent Main (1 file)
File 13: aggregator-agent/cmd/agent/main.go
// Remove case for "updates" in getCurrentSubsystemEnabled
// Lines ~518-520:
func getCurrentSubsystemEnabled(cfg *config.Config, subsystemName string) bool {
switch subsystemName {
case "system":
return cfg.Subsystems.System.Enabled
// case "updates": // REMOVE
// return cfg.Subsystems.Updates.Enabled // REMOVE
case "docker":
return cfg.Subsystems.Docker.Enabled
...
}
}
Frontend (2 files)
File 14: aggregator-web/src/types/index.ts
// Remove 'updates' from Subsystem type if it exists
export type Subsystem = 'storage' | 'system' | 'docker' | 'apt' | 'dnf' | 'windows' | 'winget'
// REMOVED: | 'updates'
File 15: aggregator-web/src/lib/command-naming.ts
// Clean up old comments referencing scan_updates (cosmetic)
// Lines with comments mentioning scan_updates
Pros of Option B
- ✅ Clean codebase - No dead code or ambiguous state
- ✅ Idempotent - Behavior consistent across all database states
- ✅ ETHOS-aligned - "No Marketing Fluff", "Simplicity"
- ✅ Self-documenting - Code clearly shows platform-specific architecture
- ✅ Future-proof - No technical debt for next developer
Cons of Option B
- ❌ More work - 14 files vs 2 files
- ❌ Higher risk - More touch points, more chance of breaking something
- ❌ Breaking change - Old configs with
updatesmay fail - ❌ Alpha users impacted - May require agent re-registration
Design Intent Research
What Commit 9b72662 Actually Did
From git show 9b72662:
- Created migration 024 with
deprecatedcolumn approach - Updated scheduler to remove
updatesfromgetDefaultInterval() - Removed
updatesfromCreateDefaultSubsystems() - DID NOT remove from: migration 015, agent config, agent main
Conclusion: The author did a partial removal, focusing on server-side scheduling only. They likely planned to remove agent-side references later but didn't finish.
Why Partial?
Alpha software timeline:
- Dec 20: Commit
d255f91removes scan_updates references from frontend - Dec 22: Migration 024 created to "disable" updates subsystem
- Dec 23: Agent re-registration issues discovered (our current session)
The pace suggests iterative development - fix the immediate blocker (404 errors), clean up later.
ETHOS & README Alignment
ETHOS Principles (from /home/casey/Projects/RedFlag/docs/1_ETHOS/ETHOS.md):
- "Errors are History, Not /dev/null" - Track what changed, but don't preserve broken code
- "No Marketing Fluff" - Don't add features/code that serve no purpose
- "Assume Failure; Build for Resilience" - Clean state, not ambiguous state
- "Idempotency is a Requirement" - Consistent behavior
README Claims:
- "Breaking changes may happen between versions"
- "Full Reinstall (Nuclear Option)" as cleanup strategy
- Alpha software, users can handle breaking changes
Analysis:
- Option B (complete removal) better aligns with ETHOS
- Option A (partial fix) creates technical debt that violates "No Fluff"
- README accepts breaking changes, so Option B is acceptable
Recommendation
Recommended: Option B (Complete Removal)
Rationale:
- ETHOS-aligned: Clean, honest code without ambiguity
- Alpha-appropriate: Users can reinstall
- Self-documenting: Clear that platform-specific is the architecture
- Prevents future bugs: No chance of scheduler creating
scan_updatescommands - Original intent: Author appears to have planned this but didn't finish
Arguments Against Option A:
- Technical debt: 12 files with dead code
- Inconsistent state: Some places disable, some reference
- Future confusion: Developer sees
Updatesfield, thinks it's used - Potential bugs: Could add
updatesvalidation somewhere and break - Not future-proof: Eventually someone has to clean this up
Risk Mitigation for Option B:
- Test migration on backup first
- Document in release notes: "Breaking change - agents may need re-registration"
- Provide manual SQL for users who hit issues: "Run: DELETE FROM agent_subsystems WHERE subsystem='updates'"
- Add validation with clear error messages
Implementation Checklist
Phase 1: Database Migrations
- Edit
015_agent_subsystems.up.sql- Remove INSERT for 'updates' - Edit
015_agent_subsystems.up.sql- Remove 'updates' from trigger - Edit
022_add_subsystem_to_logs.up.sql- Remove 'updates' from constraint - Rewrite
024_disable_updates_subsystem.up.sql- DELETE instead of UPDATE - Rewrite
024_disable_updates_subsystem.down.sql- INSERT for rollback - Edit
025_platform_scanner_subsystems.down.sql- Remove 'updates' from trigger
Phase 2: Agent Config
- Edit
aggregator-agent/internal/config/subsystems.go- Remove Updates field - Edit
aggregator-agent/internal/config/subsystems.go- Remove from defaults - Edit
aggregator-agent/internal/config/config.go- Remove Updates migration
Phase 3: Agent Code
- Edit
aggregator-agent/cmd/agent/main.go- Remove updates case
Phase 4: Server Code
- Edit
aggregator-server/internal/command/validator.go- Remove from validActions - Edit
aggregator-server/internal/api/handlers/subsystems.go- Add validation
Phase 5: Frontend (Optional)
- Edit
aggregator-web/src/types/index.ts- Remove from types if needed - Remove references in command-naming comments
Phase 6: Testing
- Test fresh install: Empty DB, run migrations, verify no updates rows
- Test existing DB: With updates rows, run migrations, verify deleted
- Test agent re-registration
- Test API validation (try to trigger updates, should fail)
Phase 7: Documentation
- Update ChristmasTodos.md
- Update README.md if it mentions updates subsystem
- Create migration notes for alpha users
Conclusion
The migration 024 fix requires a decision between:
- Quick fix (Option A) - 2 files, removes blocker, leaves technical debt
- Proper fix (Option B) - 14 files, complete removal, ETHOS-aligned
The agent recommends Option B despite the larger scope because:
- It aligns with ETHOS principles
- Alpha software can handle breaking changes
- Prevents future confusion and bugs
- The original author appears to have intended this but didn't finish
Next Step: Proceed with implementation or discuss scope reduction.