Files
Redflag/v1.0-STABLE-ROADMAP.md

270 lines
8.4 KiB
Markdown

# v1.0-STABLE Roadmap: Unified Release Strategy (Option 3)
**Strategy:** Take 2-4 weeks to fix all P0 issues, then release unified v1.0
**Timeline:** ~10 weeks of focused development
**Decision:** Option 3 (from Senate Deliberation)
---
## PROJECT CONTEXT
**Current State:**
- Nobody has current version in production (all in development)
- Legacy v0.1.18 has 12 stable users
- Current has been in active development since legacy
- All users would benefit from unified v1.0
**Why This Makes Sense:**
- No existing production users to break (Option 3's main risk eliminated)
- Can focus entirely on fixing P0 issues without migration pressure
- Legacy users can upgrade directly to v1.0-stable when ready
- Clean slate for marketing/branding
---
## WEEK 1: CRITICAL FOUNDATION (P0 Blockers)
### Day 1-2: Database & Migration System
**P0-001: Database Transaction Poisoning**
- [ ] Fix transaction logic in `aggregator-server/internal/database/db.go:93-116`
- [ ] Remove duplicate INSERT after rollback
- [ ] Add transaction safety checks (verify schema matches migrations table)
- [ ] Test on fresh database (5 consecutive successful installs)
**P0-007: Migration Backup/Restore**
- [ ] Implement pre-migration backup hook
- [ ] Create automated backup mechanism
- [ ] Document backup/restore procedures
### Day 3-4: Circuit Breaker & Rate Limiting
**P0-003: Rate Limiting Death Spiral**
- [ ] Replace simple rate limiter with token bucket algorithm
- [ ] Implement exponential backoff with jitter
- [ ] Create event queuing with priority levels
- [ ] Test recovery from offline buffer scenarios
**P0-004: Circuit Breaker Failures**
- [ ] Add auto-recovery mechanism to circuit breakers
- [ ] Create monitoring for circuit breaker states
- [ ] Add alerting when circuits stay open >5 minutes
- [ ] Test network recovery scenarios
### Day 5: Security Foundations
**P0-005: Ed25519 Signing**
- [ ] Connect Build Orchestrator to signing service
- [ ] Implement binary signature verification on agent install
- [ ] Test signature verification end-to-end
**P0-009: Key Management**
- [ ] Design key rotation mechanism
- [ ] Document key management procedures
- [ ] Plan HSM integration (AWS KMS, Azure Key Vault)
---
## WEEK 2: INFRASTRUCTURE & MONITORING
### Day 6-8: Hardware & Logging
**P0-002: Hardware Binding Ransom**
- [ ] Create API endpoint for hardware re-binding
- [ ] Add hardware change detection
- [ ] Test migration path for hardware changes
**P0-008: Log Weaponization**
- [ ] Implement log sanitization (strip ANSI, validate JSON, enforce size limits)
- [ ] Separate audit logs from operational logs
- [ ] Add log injection attack prevention
- [ ] Access control on log viewing
### Day 9-10: Health & Monitoring
**P0-010: Health Endpoint**
- [ ] Create separate health endpoint (not check-in cycle)
- [ ] Add health status API
- [ ] Create monitoring dashboard integration
**Migration Testing Framework**
- [ ] Build automated migration testing for fresh installs
- [ ] Build automated migration testing for upgrades (v0.1.18 → v1.0)
- [ ] Create rollback verification
---
## WEEK 3: CODE QUALITY & REFACTORING
### Day 11-13: Monolithic Refactoring
**P0-006: Monolithic runAgent**
- [ ] Break down `runAgent` from 1,307 lines to modular components
- [ ] Create separate packages for each subsystem
- [ ] Target: No function >500 lines
- [ ] Add proper interfaces between components
### Day 14-15: TypeScript & Frontend
**P0-012: TypeScript Build Errors**
- [ ] Fix ~100 TypeScript build errors
- [ ] Create production build pipeline
- [ ] Verify build works on clean environment
---
## WEEK 4: SECURITY HARDENING
### Day 16-18: Security Infrastructure
**P0-005 Continued: Ed25519 & Key Management**
- [ ] Implement key rotation mechanism
- [ ] Create emergency rotation playbook
- [ ] Add HSM support (AWS KMS, Azure Key Vault integration)
**Security Observability**
- [ ] Create security status endpoints
- [ ] Add security metrics collection
- [ ] Create security hardening guide
### Day 19-20: Dependency & Supply Chain
**P0-013: Dependency Vulnerabilities**
- [ ] Run `npm audit` and `go mod audit`
- [ ] Create monthly dependency update schedule
- [ ] Implement automated security scanning in CI/CD
- [ ] Fork and maintain `windowsupdate` if upstream abandoned
---
## WEEK 5: INTEGRATION & TESTING
### Day 21-23: System Integration
- [ ] Integrate all P0 fixes
- [ ] Test end-to-end workflows
- [ ] Performance testing
- [ ] Stress testing (simulate high load scenarios)
### Day 24-25: Documentation
- [ ] Create user-facing installation guide
- [ ] Document upgrade path from legacy v0.1.18
- [ ] Create troubleshooting guide
- [ ] API documentation updates
---
## WEEK 6-7: COMPREHENSIVE TESTING
### Week 6: Testing Coverage
- [ ] Achieve 90% test coverage
- [ ] Test all error scenarios (not just happy path)
- [ ] Chaos engineering tests (simulate failures)
- [ ] Migration testing (fresh install + upgrade paths)
### Week 7: User Acceptance Testing
- [ ] Internal testing with non-dev users
- [ ] Security review (respects established security stack)
- [ ] Performance validation
- [ ] Documentation review
---
## WEEK 8: PRE-RELEASE PREPARATION
### Version Tagging & Release
- [ ] Create version tag: `v1.0-stable`
- [ ] Update README.md (remove "alpha" warnings)
- [ ] Create CHANGELOG.md with v1.0 features
- [ ] Verify all quality checkpoints from ETHOS.md
---
## POST-RELEASE (Week 9-10)
### Migration Support
- [ ] Create migration documentation (legacy → v1.0)
- [ ] Test migration from actual legacy deployments
- [ ] Offer migration assistance to legacy users
- [ ] Document rollback procedures
### Monitoring & Support
- [ ] Set up production monitoring
- [ ] Create incident response procedures
- [ ] Set up support channels
- [ ] Create FAQ based on early user questions
---
## QUALITY CHECKPOINTS (Per ETHOS.md)
**Week 8 Verification:**
### Pre-v1.0 Release Checklist:
- [ ] All errors are logged (not silenced with /dev/null)
- [ ] No new unauthenticated endpoints (all use proper middleware)
- [ ] Backup/restore/fallback paths exist for all critical operations
- [ ] Idempotency verified (can run 3x safely)
- [ ] History table logging added for all state changes
- [ ] Security review completed (respects established stack)
- [ ] Testing includes error scenarios (not just happy path)
- [ ] Documentation is updated with current implementation details
- [ ] Technical debt is identified and tracked
- [ ] 90% test coverage achieved
- [ ] Zero P0 violations (Claudia confirmation required)
---
## SUCCESS CRITERIA
### v1.0-STABLE is READY when:
1. ✅ All 8 P0 issues resolved and tested
2. ✅ Migration testing framework operational
3. ✅ Health monitoring and circuit breakers working
4. ✅ TypeScript builds cleanly (no errors)
5. ✅ Security review passes with zero critical findings
6. ✅ 90% test coverage achieved
7. ✅ Documentation accurate and complete
8. ✅ Claudia's review confirms zero P0s
9. ✅ Lilith's review confirms no hidden landmines
10. ✅ Irulan's architecture review passes
---
## BENEFITS OF THIS APPROACH
### For RedFlag Project:
- **Honest Foundation:** Start with stable, production-ready base
- **No Migration Burden:** First release is clean slate
- **ETHOS Compliant:** Built on principles from ground up
- **Sustainable:** Proper architecture prevents technical debt accumulation
### For Legacy Users:
- **Clear Upgrade Path:** When ready, migration is documented and tested
- **Zero Pressure:** Can stay on legacy as long as needed (12 month LTS)
- **Feature Benefits:** All new features available in unified v1.0
### For New Users:
- **Production Ready:** Can deploy with confidence
- **Stable Foundation:** No critical blockers
- **Honest Status:** Clear documentation of capabilities and limits
---
## RISK MITIGATION
### Primary Risk: Timeline Slippage
**Mitigation:**
- Daily progress tracking via TodoWrite
- Weekly mini-reviews to catch issues early
- Flexibility to extend individual weeks (but not total scope)
### Secondary Risk: Discovering More Issues
**Mitigation:**
- Lilith's periodic review during Weeks 5-6
- Focus on P0s only (no feature creep)
- Accept that future releases will improve further
### Tertiary Risk: User Impatience
**Mitigation:**
- Clear communication about timeline
- Honest status updates
- Document progress publicly (Codeberg issues)
---
**Roadmap Created:** January 22, 2026
**Target Release:** Week 8 (mid-March 2026)
**Tracking:** TodoWrite integration for daily progress
**Status:** READY TO BEGIN