# Migration 024 Fix - Implementation Plan

## Executive Summary

Migration 024 (`024_disable_updates_subsystem.up.sql`) is broken because it updates a `deprecated` column that does not exist in the `agent_subsystems` table when the `UPDATE` runs. Because 024 fails, migration 025 never runs.

**Root Cause**: Migration 015 created `agent_subsystems` without a `deprecated` column. Migration 024 is meant to add it with `ALTER TABLE ... ADD COLUMN`, but on fresh databases its subsequent `UPDATE` still fails because the column is missing.

## Current State Analysis

### What's Actually Broken

1. Migration 024 adds the `deprecated` column via `ALTER TABLE ADD COLUMN IF NOT EXISTS`.
2. It then tries to run `UPDATE ... SET deprecated = true`.
3. **BUT**: on a fresh database, migration 015 has already run before 024, and the column does not exist when migration 024's `UPDATE` runs.
4. **Error**: `pq: column "deprecated" does not exist`
5. **Result**: Migration 024 fails, so migration 025 never runs.

### What's Working

- Migration 024 DOES have the correct `ALTER TABLE` statement to add the column.
- The column DOES get added if you run migration 024 standalone (for example, with `docker exec`).
- The problem is sequential execution within a transaction.

### Confusion Point: Why Does The Column Exist?

The `deprecated` column DOES exist in some databases because of the order in which migrations run:

- On existing databases: migrations 001-023 are already applied, then 024 tries to run.
- On fresh databases: migration 015 creates the table, then 024 adds the column, but a transaction boundary issue causes the `UPDATE` to fail.

## Option A: Minimal Fix (2 files, ~3 lines)

**Goal**: Make migration 024 stop erroring, and accept that the `updates` subsystem may still exist.

### Changes Needed

#### File 1: `024_disable_updates_subsystem.up.sql`

```sql
-- Fix: Remove the UPDATE that references the deprecated column
-- Keep only: ALTER TABLE ADD COLUMN (if we want it for historical tracking)
ALTER TABLE agent_subsystems ADD COLUMN IF NOT EXISTS deprecated BOOLEAN DEFAULT false;

-- Remove the UPDATE entirely - it serves no functional purpose
-- No one ever reads the deprecated column anywhere

-- Log migration completion
INSERT INTO schema_migrations (version) VALUES ('024_disable_updates_subsystem.up.sql');
```

#### File 2: `aggregator-server/internal/command/validator.go` (line 74)

```go
// Remove "updates" from the validActions slice if it's still there
var validActions = []string{
	"storage",
	"system",
	"docker",
	"apt",
	"dnf",
	"windows",
	"winget",
	// "updates", // REMOVE THIS
}
```

### Pros of Option A

- ✅ Minimal changes (2 files)
- ✅ Fixes the immediate blocker (migration 024 runs, migration 025 runs)
- ✅ Quick to implement
- ✅ Low risk of breaking something else

### Cons of Option A

- ❌ Leaves 12+ files with outdated `updates` references
- ❌ Ambiguous state: `updates` rows may exist in some databases
- ❌ Technical debt: Future developers confused about `updates` purpose
- ❌ Not idempotent: Behavior varies based on database state
- ❌ Violates ETHOS "No Marketing Fluff" - we have dead code

## Option B: Complete Removal (14 files, ~50 lines)

**Goal**: Remove ALL references to the `updates` subsystem from the codebase.

### Rationale

From ETHOS.md:

- **"Assume Failure; Build for Resilience"** - Clean deletion preferred over complexity
- **"No Marketing Fluff"** - No code that serves no purpose
- **"Idempotency is a Requirement"** - System behavior should be consistent

From README.md:

- **"Breaking changes may happen between versions"** - Alpha software
- **"Full Reinstall (Nuclear Option)"** - Users can reinstall if needed

The project has first-iteration alpha users who can reinstall or migrate manually, so the codebase should be clean.

### Changes Needed (Complete List)

#### Database Migrations (5 files, 6 edits)

**File 1: `015_agent_subsystems.up.sql`**

```sql
-- Remove INSERT for 'updates' subsystem (lines 33-36)
-- FROM (excerpt):
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
SELECT id, 'updates', true, 15, false FROM agents
WHERE NOT EXISTS (
    SELECT 1 FROM agent_subsystems WHERE agent_subsystems.agent_id = agents.id AND subsystem = 'updates'
)
UNION ALL

-- TO: DELETE those lines (only keep the storage, system, docker INSERTs)
```

**File 2: `015_agent_subsystems.up.sql` - Trigger**

```sql
-- Remove from trigger (line 60)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
    (NEW.id, 'storage', true, 5, true),
    (NEW.id, 'system', true, 5, true),
    (NEW.id, 'docker', false, 15, false),
    (NEW.id, 'updates', true, 15, false);  -- REMOVE THIS LINE

-- TO: Only storage, system, docker (the docker row now ends with ';', not ',')
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
    (NEW.id, 'storage', true, 5, true),
    (NEW.id, 'system', true, 5, true),
    (NEW.id, 'docker', false, 15, false);
```

**File 3: `022_add_subsystem_to_logs.up.sql` - Constraint**

```sql
-- Update CHECK constraint (around line 30)
-- FROM:
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', 'updates', ...))

-- TO: Remove 'updates' from constraint
CHECK (subsystem IN ('docker', 'storage', 'system', 'apt', 'dnf', 'winget', ...))
```

**File 4: `024_disable_updates_subsystem.up.sql`**

```sql
-- Complete rewrite
-- FROM: Disable with deprecated flag
-- TO: Delete entirely

-- Migration: Remove legacy updates subsystem
-- Purpose: Delete monolithic updates subsystem (replaced by apt/dnf/winget/windows)
-- Version: 0.1.29
-- Date: 2025-12-23

-- Remove all 'updates' subsystems
DELETE FROM agent_subsystems WHERE subsystem = 'updates';

-- Log migration completion (version must match the unchanged file name)
INSERT INTO schema_migrations (version) VALUES ('024_disable_updates_subsystem.up.sql');
```

**File 5: `024_disable_updates_subsystem.down.sql`**

```sql
-- Complete rewrite
-- FROM: Re-enable and drop deprecated column
-- TO: Re-insert updates (for rollback only)

-- Rollback: Re-add legacy updates subsystem
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run, created_at, updated_at)
SELECT id, 'updates', false, 15, false, NOW(), NOW()
FROM agents
WHERE NOT EXISTS (
    SELECT 1 FROM agent_subsystems WHERE agent_id = agents.id AND subsystem = 'updates'
);

-- RAISE is only valid inside PL/pgSQL, so wrap it in a DO block
DO $$
BEGIN
    RAISE WARNING 'Re-added legacy updates subsystem - may conflict with platform-specific scanners';
END $$;
```

**File 6: `025_platform_scanner_subsystems.down.sql` - Trigger**

```sql
-- Remove 'updates' from rollback trigger (line 19)
-- FROM:
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
    (NEW.id, 'storage', true, 5, true),
    (NEW.id, 'system', true, 5, true),
    (NEW.id, 'updates', true, 15, false);  -- REMOVE THIS LINE

-- TO: Don't add 'updates' in rollback (the system row now ends with ';')
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES
    (NEW.id, 'storage', true, 5, true),
    (NEW.id, 'system', true, 5, true);
```

#### Agent Config Files (4 files)

**File 7: `aggregator-agent/internal/config/subsystems.go`**

```go
// Remove Updates field from SubsystemsConfig struct
// Line ~40:
type SubsystemsConfig struct {
	System  SubsystemConfig `json:"system"`
	// Updates SubsystemConfig `json:"updates"` // REMOVE THIS LINE
	Docker  SubsystemConfig `json:"docker"`
	Storage SubsystemConfig `json:"storage"`
	APT     SubsystemConfig `json:"apt"`
	DNF     SubsystemConfig `json:"dnf"`
}
```

```go
// Remove from GetDefaultSubsystemsConfig()
// Lines ~76-81:
func GetDefaultSubsystemsConfig() SubsystemsConfig {
	return SubsystemsConfig{
		System: GetDefaultSystemConfig(),
		// REMOVED: Updates: SubsystemConfig{...}
		Docker: SubsystemConfig{
			Enabled:         false,
			Timeout:         0,
			IntervalMinutes: 120,
			CircuitBreaker:  CircuitBreakerConfig{Enabled: false},
		},
		...
	}
}
```
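
Whether old agent configs that still contain an `updates` key break after the field is removed depends on how the config is decoded. Go's `encoding/json` ignores unknown keys by default; only a `json.Decoder` with `DisallowUnknownFields()` rejects them. The trimmed-down structs and the `interval_minutes` tag below are simplified assumptions, not the real `SubsystemsConfig`, and it is also an assumption that the agent loads its config with plain `json.Unmarshal`; the actual loader should be checked before relying on this.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Simplified, hypothetical versions of the agent's config types.
type SubsystemConfig struct {
	Enabled         bool `json:"enabled"`
	IntervalMinutes int  `json:"interval_minutes"`
}

type SubsystemsConfig struct {
	System SubsystemConfig `json:"system"`
	Docker SubsystemConfig `json:"docker"`
	// Updates field intentionally removed.
}

// oldConfig mimics a file written by an older agent that still has "updates".
const oldConfig = `{"system":{"enabled":true,"interval_minutes":5},"updates":{"enabled":true,"interval_minutes":15}}`

// decodeLenient uses json.Unmarshal, which silently ignores unknown keys.
func decodeLenient(data []byte) (SubsystemsConfig, error) {
	var cfg SubsystemsConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// decodeStrict rejects any key that has no matching struct field.
func decodeStrict(data []byte) (SubsystemsConfig, error) {
	var cfg SubsystemsConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	cfg, err := decodeLenient([]byte(oldConfig))
	fmt.Println(cfg.System.Enabled, err) // true <nil>

	_, err = decodeStrict([]byte(oldConfig))
	fmt.Println(err != nil) // true
}
```

If the agent uses the lenient path, the "Old configs with `updates` may fail" risk listed under the cons of Option B largely disappears; the stale key is simply dropped the next time the config is saved.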

**File 8: `aggregator-agent/internal/config/config.go`**

```go
// Remove the Updates migration from migrateConfig()
// Lines ~338-341:
func migrateConfig(cfg *Config) {
	if cfg.Subsystems.Updates == (SubsystemConfig{}) {
		fmt.Printf("[CONFIG] Adding missing 'updates' subsystem configuration\n")
		cfg.Subsystems.Updates = GetDefaultSubsystemsConfig().Updates
	}
	// Remove the above block entirely: once the Updates field is removed
	// from SubsystemsConfig (File 7), this code no longer compiles.
}
```
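
After the `Updates` block is deleted, `migrateConfig()` can keep the same zero-value backfill pattern for the subsystems that remain. The sketch below is hypothetical: the simplified `Config` types, the `defaults()` helper (standing in for `GetDefaultSubsystemsConfig()`), the choice of `APT`/`DNF` as backfilled subsystems, and the interval values are all illustrative assumptions.

```go
package main

import "fmt"

// Simplified, hypothetical config types; the real ones live in
// aggregator-agent/internal/config.
type SubsystemConfig struct {
	Enabled         bool
	IntervalMinutes int
}

type SubsystemsConfig struct {
	System SubsystemConfig
	APT    SubsystemConfig
	DNF    SubsystemConfig
}

type Config struct {
	Subsystems SubsystemsConfig
}

// defaults is a hypothetical stand-in for GetDefaultSubsystemsConfig().
func defaults() SubsystemsConfig {
	return SubsystemsConfig{
		System: SubsystemConfig{Enabled: true, IntervalMinutes: 5},
		APT:    SubsystemConfig{Enabled: true, IntervalMinutes: 15},
		DNF:    SubsystemConfig{Enabled: true, IntervalMinutes: 15},
	}
}

// migrateConfig backfills any subsystem still at its zero value.
// There is deliberately no Updates branch any more.
func migrateConfig(cfg *Config) {
	d := defaults()
	if cfg.Subsystems.APT == (SubsystemConfig{}) {
		fmt.Println("[CONFIG] Adding missing 'apt' subsystem configuration")
		cfg.Subsystems.APT = d.APT
	}
	if cfg.Subsystems.DNF == (SubsystemConfig{}) {
		fmt.Println("[CONFIG] Adding missing 'dnf' subsystem configuration")
		cfg.Subsystems.DNF = d.DNF
	}
}

func main() {
	cfg := &Config{Subsystems: SubsystemsConfig{System: SubsystemConfig{Enabled: true, IntervalMinutes: 5}}}
	migrateConfig(cfg)
	fmt.Println(cfg.Subsystems.APT.IntervalMinutes)
}
```

One caveat of the `== (SubsystemConfig{})` check, which the original snippet also uses: a subsystem explicitly configured as disabled with every other field at zero is indistinguishable from a missing one and will be overwritten with defaults.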

**File 9: `aggregator-agent/internal/migration/detection.go`**

```go
// Check if version detection includes updates
// Remove or update version detection that checks for updates_subsystem
// Around line 50-70
```

**File 10: `aggregator-agent/internal/migration/executor.go`**

```go
// Check for any migration logic that adds updates
// Remove or update accordingly
```

#### Server Code (2 files)

**File 11: `aggregator-server/internal/command/validator.go`**

```go
// Line ~74: Remove "updates" from validActions
var validActions = []string{
	"storage",
	"system",
	"docker",
	"apt",
	"dnf",
	"windows",
	"winget",
	// "updates", // REMOVE THIS
}
```

**File 12: `aggregator-server/internal/api/handlers/subsystems.go`**

```go
// Add validation to TriggerSubsystem
// Requires imports: "fmt", "net/http", "github.com/gin-gonic/gin"
func (h *SubsystemHandler) TriggerSubsystem(c *gin.Context) {
	subsystem := c.Param("subsystem")

	// ADD VALIDATION
	validSubsystems := []string{"storage", "system", "docker", "apt", "dnf", "windows", "winget"}
	subsystemValid := false
	for _, valid := range validSubsystems {
		if subsystem == valid {
			subsystemValid = true
			break
		}
	}

	if !subsystemValid {
		c.JSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("Invalid subsystem: %s", subsystem)})
		return
	}

	// ... rest of function
}
```
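
File 11 and File 12 each hard-code the same list of subsystems, so the two can drift apart. One option, sketched below under the assumption that both the command validator and the handler can import a shared helper, is a single set plus a validation function that both call. The names `validSubsystems` and `ValidateSubsystem` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// validSubsystems is the single source of truth for accepted subsystem names.
// "updates" is intentionally absent.
var validSubsystems = map[string]struct{}{
	"storage": {}, "system": {}, "docker": {},
	"apt": {}, "dnf": {}, "windows": {}, "winget": {},
}

// ValidateSubsystem returns nil for a known subsystem, or an error naming the
// rejected value and the sorted list of accepted values.
func ValidateSubsystem(name string) error {
	if _, ok := validSubsystems[name]; ok {
		return nil
	}
	names := make([]string, 0, len(validSubsystems))
	for n := range validSubsystems {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable message
	return fmt.Errorf("invalid subsystem: %s (valid: %v)", name, names)
}

func main() {
	fmt.Println(ValidateSubsystem("apt"))     // <nil>
	fmt.Println(ValidateSubsystem("updates")) // invalid subsystem: updates (valid: [apt dnf docker storage system windows winget])
}
```

In the Gin handler from File 12, the loop could then shrink to `if err := ValidateSubsystem(subsystem); err != nil { c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}); return }`, and `validActions` could be derived from the same set.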

#### Agent Main (1 file)

**File 13: `aggregator-agent/cmd/agent/main.go`**

```go
// Remove case for "updates" in getCurrentSubsystemEnabled
// Lines ~518-520:
func getCurrentSubsystemEnabled(cfg *config.Config, subsystemName string) bool {
	switch subsystemName {
	case "system":
		return cfg.Subsystems.System.Enabled
	// case "updates":                          // REMOVE
	// 	return cfg.Subsystems.Updates.Enabled // REMOVE
	case "docker":
		return cfg.Subsystems.Docker.Enabled
	...
	}
}
```
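
With the `updates` case gone, a stale `updates` name that still reaches the agent (for example from an old server) falls through the switch. The sketch below assumes, hypothetically, a `default` branch that returns `false`, so an unknown subsystem is treated as disabled rather than triggering a scan; the original snippet elides the end of the switch, and the trimmed-down config types are illustrative.

```go
package main

import "fmt"

// Simplified, hypothetical config types.
type SubsystemConfig struct{ Enabled bool }

type SubsystemsConfig struct {
	System SubsystemConfig
	Docker SubsystemConfig
	APT    SubsystemConfig
}

type Config struct{ Subsystems SubsystemsConfig }

// getCurrentSubsystemEnabled reports whether a named subsystem is enabled.
// Unknown names, including the removed "updates", are treated as disabled.
func getCurrentSubsystemEnabled(cfg *Config, subsystemName string) bool {
	switch subsystemName {
	case "system":
		return cfg.Subsystems.System.Enabled
	case "docker":
		return cfg.Subsystems.Docker.Enabled
	case "apt":
		return cfg.Subsystems.APT.Enabled
	default:
		return false
	}
}

func main() {
	cfg := &Config{Subsystems: SubsystemsConfig{System: SubsystemConfig{Enabled: true}}}
	fmt.Println(getCurrentSubsystemEnabled(cfg, "system"))  // true
	fmt.Println(getCurrentSubsystemEnabled(cfg, "updates")) // false
}
```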

#### Frontend (2 files)

**File 14: `aggregator-web/src/types/index.ts`**

```typescript
// Remove 'updates' from Subsystem type if it exists
export type Subsystem = 'storage' | 'system' | 'docker' | 'apt' | 'dnf' | 'windows' | 'winget'
// REMOVED: | 'updates'
```

**File 15: `aggregator-web/src/lib/command-naming.ts`**

```typescript
// Clean up old comments referencing scan_updates (cosmetic)
// Lines with comments mentioning scan_updates
```

### Pros of Option B

- ✅ **Clean codebase** - No dead code or ambiguous state
- ✅ **Idempotent** - Behavior consistent across all database states
- ✅ **ETHOS-aligned** - "No Marketing Fluff", "Simplicity"
- ✅ **Self-documenting** - Code clearly shows platform-specific architecture
- ✅ **Future-proof** - No technical debt for next developer

### Cons of Option B

- ❌ **More work** - 14 files vs 2 files
- ❌ **Higher risk** - More touch points, more chance of breaking something
- ❌ **Breaking change** - Old configs with `updates` may fail
- ❌ **Alpha users impacted** - May require agent re-registration

## Design Intent Research

### What Commit 9b72662 Actually Did

From `git show 9b72662`:

- Created migration 024 with the `deprecated` column approach
- Updated the scheduler to remove `updates` from `getDefaultInterval()`
- Removed `updates` from `CreateDefaultSubsystems()`
- DID NOT remove it from: migration 015, agent config, agent main

**Conclusion**: The author did a **partial removal**, focusing on server-side scheduling only. They likely planned to remove the agent-side references later but didn't finish.

### Why Partial?

Alpha software timeline:

- Dec 20: Commit d255f91 removes scan_updates references from the frontend
- Dec 22: Migration 024 created to "disable" the updates subsystem
- Dec 23: Agent re-registration issues discovered (our current session)

The pace suggests **iterative development** - fix the immediate blocker (404 errors), clean up later.

### ETHOS & README Alignment

**ETHOS Principles** (from `/home/casey/Projects/RedFlag/docs/1_ETHOS/ETHOS.md`):

1. **"Errors are History, Not /dev/null"** - Track what changed, but don't preserve broken code
2. **"No Marketing Fluff"** - Don't add features/code that serve no purpose
3. **"Assume Failure; Build for Resilience"** - Clean state, not ambiguous state
4. **"Idempotency is a Requirement"** - Consistent behavior

**README Claims**:

- "Breaking changes may happen between versions"
- "Full Reinstall (Nuclear Option)" as cleanup strategy
- Alpha software, users can handle breaking changes

**Analysis**:

- Option B (complete removal) better aligns with ETHOS
- Option A (partial fix) creates technical debt that violates "No Fluff"
- README accepts breaking changes, so Option B is acceptable

## Recommendation

### Recommended: Option B (Complete Removal)

**Rationale**:

1. **ETHOS-aligned**: Clean, honest code without ambiguity
2. **Alpha-appropriate**: Users can reinstall
3. **Self-documenting**: Clear that platform-specific scanners are the architecture
4. **Prevents future bugs**: No chance of the scheduler creating `scan_updates` commands
5. **Original intent**: The author appears to have planned this but didn't finish

**Arguments Against Option A**:

1. **Technical debt**: 12 files with dead code
2. **Inconsistent state**: Some places disable `updates`, others still reference it
3. **Future confusion**: A developer sees the `Updates` field and assumes it's used
4. **Potential bugs**: Someone could add `updates` validation somewhere and break things
5. **Not future-proof**: Eventually someone has to clean this up

**Risk Mitigation for Option B**:

1. Test the migration on a backup first
2. Document in the release notes: "Breaking change - agents may need re-registration"
3. Provide manual SQL for users who hit issues: `DELETE FROM agent_subsystems WHERE subsystem = 'updates';`
4. Add validation with clear error messages

## Implementation Checklist

### Phase 1: Database Migrations

- [ ] Edit `015_agent_subsystems.up.sql` - Remove INSERT for 'updates'
- [ ] Edit `015_agent_subsystems.up.sql` - Remove 'updates' from trigger
- [ ] Edit `022_add_subsystem_to_logs.up.sql` - Remove 'updates' from constraint
- [ ] Rewrite `024_disable_updates_subsystem.up.sql` - DELETE instead of UPDATE
- [ ] Rewrite `024_disable_updates_subsystem.down.sql` - INSERT for rollback
- [ ] Edit `025_platform_scanner_subsystems.down.sql` - Remove 'updates' from trigger

### Phase 2: Agent Config

- [ ] Edit `aggregator-agent/internal/config/subsystems.go` - Remove Updates field
- [ ] Edit `aggregator-agent/internal/config/subsystems.go` - Remove from defaults
- [ ] Edit `aggregator-agent/internal/config/config.go` - Remove Updates migration
- [ ] Review `aggregator-agent/internal/migration/detection.go` - Remove or update `updates_subsystem` version detection
- [ ] Review `aggregator-agent/internal/migration/executor.go` - Remove or update migration logic that adds updates

### Phase 3: Agent Code

- [ ] Edit `aggregator-agent/cmd/agent/main.go` - Remove updates case

### Phase 4: Server Code

- [ ] Edit `aggregator-server/internal/command/validator.go` - Remove from validActions
- [ ] Edit `aggregator-server/internal/api/handlers/subsystems.go` - Add validation

### Phase 5: Frontend (Optional)

- [ ] Edit `aggregator-web/src/types/index.ts` - Remove from types if needed
- [ ] Remove references in command-naming comments

### Phase 6: Testing

- [ ] Test fresh install: Empty DB, run migrations, verify no updates rows
- [ ] Test existing DB: With updates rows, run migrations, verify deleted
- [ ] Test agent re-registration
- [ ] Test API validation (triggering `updates` should fail with 400 Bad Request)

### Phase 7: Documentation

- [ ] Update ChristmasTodos.md
- [ ] Update README.md if it mentions updates subsystem
- [ ] Create migration notes for alpha users

## Conclusion

The migration 024 fix requires a decision between:

1. **Quick fix** (Option A) - 2 files, removes blocker, leaves technical debt
2. **Proper fix** (Option B) - 14 files, complete removal, ETHOS-aligned

The agent recommends **Option B** despite the larger scope because:

- It aligns with ETHOS principles
- Alpha software can handle breaking changes
- Prevents future confusion and bugs
- The original author appears to have intended this but didn't finish

**Next Step**: Proceed with implementation or discuss scope reduction.