Files
Redflag/migration-024-fix-plan.md

15 KiB

Migration 024 Fix - Implementation Plan

Executive Summary

Migration 024 (024_disable_updates_subsystem.up.sql) is broken because it references a deprecated column that doesn't exist in the agent_subsystems table. This prevents migration 025 from running.

Root Cause: The migration adds ALTER TABLE ADD COLUMN deprecated but the column doesn't exist in the schema when migration 015 created the table.

Current State Analysis

What's Actually Broken

  1. Migration 024 adds the deprecated column via ALTER TABLE ADD COLUMN IF NOT EXISTS
  2. Then tries to UPDATE ... SET deprecated = true
  3. BUT: If this is a fresh database, migration 015 already ran before 024, and the column doesn't exist when migration 024's UPDATE runs
  4. Error: pq: column "deprecated" does not exist
  5. Result: Migration fails, migration 025 never runs

What's Working

  • Migration 024 DOES have the correct ALTER TABLE statement to add the column
  • The column DOES work if you run migration 024 standalone (like with docker exec)
  • The problem is sequential execution in a transaction

Confusion Point: Why Does The Column Exist?

The deprecated column DOES exist in some databases because migrations are run in a specific order:

  • On existing databases: migrations 001-023 already applied, then 024 tries to run
  • On fresh databases: migration 015 creates table, then 024 adds column, but there's a transaction boundary issue

Option A: Minimal Fix (2 files, ~3 lines)

Goal: Just make migration 024 not error, accept that updates subsystem may still exist

Changes Needed

File 1: 024_disable_updates_subsystem.up.sql

-- Fix: Remove the UPDATE that references deprecated column
-- Keep only: ALTER TABLE ADD COLUMN (if we want it for historical tracking)
ALTER TABLE agent_subsystems ADD COLUMN IF NOT EXISTS deprecated BOOLEAN DEFAULT false;

-- Remove the UPDATE entirely - it serves no functional purpose
-- No one ever reads the deprecated column anywhere

-- Log migration completion
INSERT INTO schema_migrations (version) VALUES ('024_disable_updates_subsystem.up.sql');

File 2: internal/command/validator.go (line 74)

// Remove "updates" from the validActions slice if it's still there
var validActions = []string{
    "storage",
    "system",
    "docker",
    "apt",
    "dnf",
    "windows",
    "winget",
    // "updates", // REMOVE THIS
}

Pros of Option A

  • Minimal changes (2 files)
  • Fixes the immediate blocker (migration 024 runs, migration 025 runs)
  • Quick to implement
  • Low risk of breaking something else

Cons of Option A

  • Leaves 12+ files with outdated updates references
  • Ambiguous state: updates rows may exist in some databases
  • Technical debt: Future developers confused about updates purpose
  • Not idempotent: Behavior varies based on database state
  • Violates ETHOS "No Marketing Fluff" - we have dead code

Option B: Complete Removal (14 files, ~50 lines)

Goal: Remove ALL references to updates subsystem from codebase

Rationale

From ETHOS.md:

  • "Assume Failure; Build for Resilience" - Clean deletion preferred over complexity
  • "No Marketing Fluff" - No code that serves no purpose
  • "Idempotency is a Requirement" - System behavior should be consistent

From README.md:

  • "Breaking changes may happen between versions" - Alpha software
  • "Full Reinstall (Nuclear Option)" - Users can reinstall if needed

The project has first-iteration alpha users who can reinstall or manually migrate. The codebase should be clean.

Changes Needed (Complete List)

Database Migrations (6 files)

File 1: 015_agent_subsystems.up.sql

-- Remove INSERT for 'updates' subsystem (lines 33-36)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
SELECT id, 'updates', true, 15, false FROM agents
WHERE NOT EXISTS (
    SELECT 1 FROM agent_subsystems WHERE agent_subsystems.agent_id = agents.id AND subsystem = 'updates'
)
UNION ALL

-- TO: DELETE those lines (only keep storage, system, docker INSERTS)

File 2: 015_agent_subsystems.up.sql - Trigger

-- Remove from trigger (line 60)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
    (NEW.id, 'storage', true, 5, true),
    (NEW.id, 'system', true, 5, true),
    (NEW.id, 'docker', false, 15, false),
    (NEW.id, 'updates', true, 15, false);  -- REMOVE THIS LINE

-- TO: Only storage, system, docker

File 3: 022_add_subsystem_to_logs.up.sql - Constraint

-- Update CHECK constraint (around line 30)
-- FROM:
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', 'updates', ...))

-- TO: Remove 'updates' from constraint
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', ...))

File 4: 024_disable_updates_subsystem.up.sql

-- Complete rewrite
-- FROM: Disable with deprecated flag
-- TO: Delete entirely

-- Migration: Remove legacy updates subsystem
-- Purpose: Delete monolithic updates subsystem (replaced by apt/dnf/winget/windows)
-- Version: 0.1.29
-- Date: 2025-12-23

-- Remove all 'updates' subsystems
DELETE FROM agent_subsystems WHERE subsystem = 'updates';

-- Log migration completion
INSERT INTO schema_migrations (version) VALUES ('024_remove_updates_subsystem.up.sql');

File 5: 024_disable_updates_subsystem.down.sql

-- Complete rewrite
-- FROM: Re-enable and drop deprecated column
-- TO: Re-insert updates (for rollback only)

-- Rollback: Re-add legacy updates subsystem
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run, created_at, updated_at)
SELECT id, 'updates', false, 15, false, NOW(), NOW()
FROM agents
WHERE NOT EXISTS (
    SELECT 1 FROM agent_subsystems WHERE agent_id = agents.id AND subsystem = 'updates'
);

RAISE WARNING 'Re-added legacy updates subsystem - may conflict with platform-specific scanners';

File 6: 025_platform_scanner_subsystems.down.sql - Trigger

-- Remove 'updates' from rollback trigger (line 19)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
    (NEW.id, 'storage', true, 5, true),
    (NEW.id, 'system', true, 5, true),
    (NEW.id, 'updates', true, 15, false);  -- REMOVE THIS LINE

-- TO: Don't add 'updates' in rollback

Agent Config Files (4 files)

File 7: aggregator-agent/internal/config/subsystems.go

// Remove Updates field from SubsystemsConfig struct
// Line ~40:
type SubsystemsConfig struct {
    System  SubsystemConfig `json:"system"`
    // Updates SubsystemConfig `json:"updates"`  // REMOVE THIS LINE
    Docker  SubsystemConfig `json:"docker"`
    Storage SubsystemConfig `json:"storage"`
    APT     SubsystemConfig `json:"apt"`
    DNF     SubsystemConfig `json:"dnf"`
}
// Remove from GetDefaultSubsystemsConfig()
// Lines ~76-81:
func GetDefaultSubsystemsConfig() SubsystemsConfig {
    return SubsystemsConfig{
        System: GetDefaultSystemConfig(),
        // REMOVED: Updates: SubsystemConfig{...}
        Docker: SubsystemConfig{
            Enabled:         false,
            Timeout:         0,
            IntervalMinutes: 120,
            CircuitBreaker:  CircuitBreakerConfig{Enabled: false},
        },
        ...
    }
}

File 8: aggregator-agent/internal/config/config.go

// Remove Updates migration from migrateConfig()
// Lines ~338-341:
func migrateConfig(cfg *Config) {
    if cfg.Subsystems.Updates == (SubsystemConfig{}) {
        fmt.Printf("[CONFIG] Adding missing 'updates' subsystem configuration\n")
        cfg.Subsystems.Updates = GetDefaultSubsystemsConfig().Updates
    }
    // Remove the above block entirely
}

File 9: aggregator-agent/internal/migration/detection.go

// Check if version detection includes updates
// Remove or update version detection that checks for updates_subsystem
// Around line 50-70

File 10: aggregator-agent/internal/migration/executor.go

// Check for any migration logic that adds updates
// Remove or update accordingly

Server Code (2 files)

File 11: aggregator-server/internal/command/validator.go

// Line ~74: Remove "updates" from validActions
var validActions = []string{
    "storage",
    "system",
    "docker",
    "apt",
    "dnf",
    "windows",
    "winget",
    // "updates",  // REMOVE THIS
}

File 12: aggregator-server/internal/api/handlers/subsystems.go

// Add validation to TriggerSubsystem
func (h *SubsystemHandler) TriggerSubsystem(c *gin.Context) {
    subsystem := c.Param("subsystem")

    // ADD VALIDATION
    validSubsystems := []string{"storage", "system", "docker", "apt", "dnf", "windows", "winget"}
    subsystemValid := false
    for _, valid := range validSubsystems {
        if subsystem == valid {
            subsystemValid = true
            break
        }
    }

    if !subsystemValid {
        c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid subsystem: %s", subsystem)})
        return
    }

    // ... rest of function
}

Agent Main (1 file)

File 13: aggregator-agent/cmd/agent/main.go

// Remove case for "updates" in getCurrentSubsystemEnabled
// Lines ~518-520:
func getCurrentSubsystemEnabled(cfg *config.Config, subsystemName string) bool {
    switch subsystemName {
    case "system":
        return cfg.Subsystems.System.Enabled
    // case "updates":  // REMOVE
    //    return cfg.Subsystems.Updates.Enabled  // REMOVE
    case "docker":
        return cfg.Subsystems.Docker.Enabled
    ...
    }
}

Frontend (2 files)

File 14: aggregator-web/src/types/index.ts

// Remove 'updates' from Subsystem type if it exists
export type Subsystem = 'storage' | 'system' | 'docker' | 'apt' | 'dnf' | 'windows' | 'winget'
// REMOVED: | 'updates'

File 15: aggregator-web/src/lib/command-naming.ts

// Clean up old comments referencing scan_updates (cosmetic)
// Lines with comments mentioning scan_updates

Pros of Option B

  • Clean codebase - No dead code or ambiguous state
  • Idempotent - Behavior consistent across all database states
  • ETHOS-aligned - "No Marketing Fluff", "Simplicity"
  • Self-documenting - Code clearly shows platform-specific architecture
  • Future-proof - No technical debt for next developer

Cons of Option B

  • More work - 14 files vs 2 files
  • Higher risk - More touch points, more chance of breaking something
  • Breaking change - Old configs with updates may fail
  • Alpha users impacted - May require agent re-registration

Design Intent Research

What Commit 9b72662 Actually Did

From git show 9b72662:

  • Created migration 024 with deprecated column approach
  • Updated scheduler to remove updates from getDefaultInterval()
  • Removed updates from CreateDefaultSubsystems()
  • DID NOT remove from: migration 015, agent config, agent main

Conclusion: The author did a partial removal, focusing on server-side scheduling only. They likely planned to remove agent-side references later but didn't finish.

Why Partial?

Alpha software timeline:

  • Dec 20: Commit d255f91 removes scan_updates references from frontend
  • Dec 22: Migration 024 created to "disable" updates subsystem
  • Dec 23: Agent re-registration issues discovered (our current session)

The pace suggests iterative development - fix the immediate blocker (404 errors), clean up later.

ETHOS & README Alignment

ETHOS Principles (from /home/casey/Projects/RedFlag/docs/1_ETHOS/ETHOS.md):

  1. "Errors are History, Not /dev/null" - Track what changed, but don't preserve broken code
  2. "No Marketing Fluff" - Don't add features/code that serve no purpose
  3. "Assume Failure; Build for Resilience" - Clean state, not ambiguous state
  4. "Idempotency is a Requirement" - Consistent behavior

README Claims:

  • "Breaking changes may happen between versions"
  • "Full Reinstall (Nuclear Option)" as cleanup strategy
  • Alpha software, users can handle breaking changes

Analysis:

  • Option B (complete removal) better aligns with ETHOS
  • Option A (partial fix) creates technical debt that violates "No Fluff"
  • README accepts breaking changes, so Option B is acceptable

Recommendation

Rationale:

  1. ETHOS-aligned: Clean, honest code without ambiguity
  2. Alpha-appropriate: Users can reinstall
  3. Self-documenting: Clear that platform-specific is the architecture
  4. Prevents future bugs: No chance of scheduler creating scan_updates commands
  5. Original intent: Author appears to have planned this but didn't finish

Arguments Against Option A:

  1. Technical debt: 12 files with dead code
  2. Inconsistent state: Some places disable, some reference
  3. Future confusion: Developer sees Updates field, thinks it's used
  4. Potential bugs: Could add updates validation somewhere and break
  5. Not future-proof: Eventually someone has to clean this up

Risk Mitigation for Option B:

  1. Test migration on backup first
  2. Document in release notes: "Breaking change - agents may need re-registration"
  3. Provide manual SQL for users who hit issues: "Run: DELETE FROM agent_subsystems WHERE subsystem='updates'"
  4. Add validation with clear error messages

Implementation Checklist

Phase 1: Database Migrations

  • Edit 015_agent_subsystems.up.sql - Remove INSERT for 'updates'
  • Edit 015_agent_subsystems.up.sql - Remove 'updates' from trigger
  • Edit 022_add_subsystem_to_logs.up.sql - Remove 'updates' from constraint
  • Rewrite 024_disable_updates_subsystem.up.sql - DELETE instead of UPDATE
  • Rewrite 024_disable_updates_subsystem.down.sql - INSERT for rollback
  • Edit 025_platform_scanner_subsystems.down.sql - Remove 'updates' from trigger

Phase 2: Agent Config

  • Edit aggregator-agent/internal/config/subsystems.go - Remove Updates field
  • Edit aggregator-agent/internal/config/subsystems.go - Remove from defaults
  • Edit aggregator-agent/internal/config/config.go - Remove Updates migration

Phase 3: Agent Code

  • Edit aggregator-agent/cmd/agent/main.go - Remove updates case

Phase 4: Server Code

  • Edit aggregator-server/internal/command/validator.go - Remove from validActions
  • Edit aggregator-server/internal/api/handlers/subsystems.go - Add validation

Phase 5: Frontend (Optional)

  • Edit aggregator-web/src/types/index.ts - Remove from types if needed
  • Remove references in command-naming comments

Phase 6: Testing

  • Test fresh install: Empty DB, run migrations, verify no updates rows
  • Test existing DB: With updates rows, run migrations, verify deleted
  • Test agent re-registration
  • Test API validation (try to trigger updates, should fail)

Phase 7: Documentation

  • Update ChristmasTodos.md
  • Update README.md if it mentions updates subsystem
  • Create migration notes for alpha users

Conclusion

The migration 024 fix requires a decision between:

  1. Quick fix (Option A) - 2 files, removes blocker, leaves technical debt
  2. Proper fix (Option B) - 14 files, complete removal, ETHOS-aligned

The agent recommends Option B despite the larger scope because:

  • It aligns with ETHOS principles
  • Alpha software can handle breaking changes
  • Prevents future confusion and bugs
  • The original author appears to have intended this but didn't finish

Next Step: Proceed with implementation or discuss scope reduction.