# v1.0-STABLE Roadmap: Unified Release Strategy (Option 3)

**Strategy:** Take 2-4 weeks to fix all P0 issues, then release unified v1.0
**Timeline:** ~10 weeks of focused development (release targeted for Week 8, then two weeks of post-release support)
**Decision:** Option 3 (from Senate Deliberation)

---

## PROJECT CONTEXT

**Current State:**
- No one runs the current version in production (it exists only in development)
- Legacy v0.1.18 has 12 stable users
- The current version has been in active development since the legacy release
- All users would benefit from a unified v1.0

**Why This Makes Sense:**
- No existing production users to break (this removes Option 3's main risk)
- Can focus entirely on fixing P0 issues without migration pressure
- Legacy users can upgrade directly to v1.0-stable when ready
- Clean slate for marketing/branding

---

## WEEK 1: CRITICAL FOUNDATION (P0 Blockers)

### Day 1-2: Database & Migration System
**P0-001: Database Transaction Poisoning**
- [ ] Fix transaction logic in `aggregator-server/internal/database/db.go:93-116`
- [ ] Remove duplicate INSERT after rollback
- [ ] Add transaction safety checks (verify schema matches migrations table)
- [ ] Test on fresh database (5 consecutive successful installs)

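The intended shape of the P0-001 fix can be sketched as follows. This is a minimal illustration, not the code in `db.go`: the `Tx` interface, the `schema_migrations` table name, and the `fakeTx` stand-in are all hypothetical. The point it shows is that the migration and its bookkeeping INSERT share one transaction, and that nothing is executed after a rollback.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is a hypothetical, minimal view of a database transaction. A real
// server would adapt *sql.Tx from database/sql to this shape.
type Tx interface {
	Exec(query string, args ...any) error
	Commit() error
	Rollback() error
}

// recordSQL uses an assumed table name for the migrations bookkeeping table.
const recordSQL = "INSERT INTO schema_migrations (version) VALUES ($1)"

// applyMigration runs one migration and records it in the same transaction.
// On any failure it rolls back and issues no further statements, so a failed
// migration is never recorded as applied.
func applyMigration(tx Tx, version, stmt string) error {
	if err := tx.Exec(stmt); err != nil {
		return errors.Join(fmt.Errorf("migration %s: %w", version, err), tx.Rollback())
	}
	if err := tx.Exec(recordSQL, version); err != nil {
		return errors.Join(fmt.Errorf("recording migration %s: %w", version, err), tx.Rollback())
	}
	return tx.Commit()
}

// fakeTx is an in-memory stand-in used only to demonstrate the control flow.
type fakeTx struct {
	failOn     string   // statement that should fail, if any
	executed   []string // statements that ran successfully
	committed  bool
	rolledBack bool
}

func (t *fakeTx) Exec(query string, args ...any) error {
	if query == t.failOn {
		return errors.New("simulated failure")
	}
	t.executed = append(t.executed, query)
	return nil
}
func (t *fakeTx) Commit() error   { t.committed = true; return nil }
func (t *fakeTx) Rollback() error { t.rolledBack = true; return nil }

func main() {
	bad := &fakeTx{failOn: "CREATE TABLE agents (id INT)"}
	err := applyMigration(bad, "001", "CREATE TABLE agents (id INT)")
	fmt.Println("error:", err)
	fmt.Println("rolled back:", bad.rolledBack, "statements applied:", len(bad.executed))
}
```

Keeping the bookkeeping INSERT inside the same transaction means a fresh install either applies and records a migration or does neither, which is what the "5 consecutive successful installs" test would exercise.
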
**P0-007: Migration Backup/Restore**
- [ ] Implement pre-migration backup hook
- [ ] Create automated backup mechanism
- [ ] Document backup/restore procedures

### Day 3-4: Circuit Breaker & Rate Limiting
**P0-003: Rate Limiting Death Spiral**
- [ ] Replace simple rate limiter with token bucket algorithm
- [ ] Implement exponential backoff with jitter
- [ ] Create event queuing with priority levels
- [ ] Test recovery from offline buffer scenarios

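For reference, the first two P0-003 items could look roughly like this. It is a sketch under assumptions: the type and function names are hypothetical, the bucket is not goroutine-safe, and the backoff uses the "full jitter" variant (a random delay between zero and the exponential ceiling), which is one of several reasonable jitter strategies. Priority queuing is not covered.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// TokenBucket allows bursts up to capacity and refills at a steady rate.
// It is not safe for concurrent use; wrap it in a mutex if shared.
type TokenBucket struct {
	capacity float64
	tokens   float64
	rate     float64 // tokens added per second
	last     time.Time
}

// NewTokenBucket starts with a full bucket.
func NewTokenBucket(capacity, rate float64) *TokenBucket {
	return &TokenBucket{capacity: capacity, tokens: capacity, rate: rate, last: time.Now()}
}

// Allow refills the bucket for the time elapsed since the previous refill
// and spends one token if one is available.
func (b *TokenBucket) Allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// Backoff returns a random delay in [0, min(max, base*2^attempt)).
func Backoff(attempt int, base, max time.Duration) time.Duration {
	ceiling := base
	for i := 0; i < attempt && ceiling < max; i++ {
		ceiling *= 2
	}
	if ceiling > max {
		ceiling = max
	}
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling)))
}

func main() {
	bucket := NewTokenBucket(5, 1) // burst of 5, then 1 event per second
	now := time.Now()
	for attempt := 0; attempt < 8; attempt++ {
		if bucket.Allow(now) {
			fmt.Println("sent event", attempt)
			continue
		}
		fmt.Println("limited; retry in", Backoff(attempt, 100*time.Millisecond, 30*time.Second))
	}
}
```

The jitter matters for the "death spiral" case: agents that come back online together and replay an offline buffer spread their retries out instead of hitting the limiter in lockstep.
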
**P0-004: Circuit Breaker Failures**
- [ ] Add auto-recovery mechanism to circuit breakers
- [ ] Create monitoring for circuit breaker states
- [ ] Add alerting when circuits stay open >5 minutes
- [ ] Test network recovery scenarios

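Auto-recovery is commonly implemented with a half-open state: after a cooldown, the breaker lets one trial call through and closes again if it succeeds. The sketch below illustrates that pattern only. The names, thresholds, and single-goroutine design are assumptions, not the project's existing breaker.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type State int

const (
	Closed   State = iota // calls flow normally
	Open                  // calls are rejected until the cooldown passes
	HalfOpen              // one trial call decides whether to close or reopen
)

var (
	ErrOpen    = errors.New("circuit open")
	errBackend = errors.New("backend unavailable") // used by the demo below
)

// Breaker is a minimal circuit breaker. It is not safe for concurrent use.
type Breaker struct {
	maxFailures int
	cooldown    time.Duration
	state       State
	failures    int
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// State reports the current state, moving Open to HalfOpen once the
// cooldown has elapsed. This transition is the auto-recovery step.
func (b *Breaker) State() State {
	if b.state == Open && time.Since(b.openedAt) >= b.cooldown {
		b.state = HalfOpen
	}
	return b.state
}

// Call runs fn unless the circuit is open. A failure in HalfOpen, or the
// maxFailures-th consecutive failure in Closed, (re)opens the circuit.
func (b *Breaker) Call(fn func() error) error {
	if b.State() == Open {
		return ErrOpen
	}
	if err := fn(); err != nil {
		b.failures++
		if b.state == HalfOpen || b.failures >= b.maxFailures {
			b.state, b.openedAt = Open, time.Now()
		}
		return err
	}
	b.state, b.failures = Closed, 0
	return nil
}

func main() {
	b := NewBreaker(3, 50*time.Millisecond)
	for i := 0; i < 4; i++ {
		fmt.Println(b.Call(func() error { return errBackend }))
	}
	time.Sleep(60 * time.Millisecond) // wait out the cooldown
	fmt.Println(b.Call(func() error { return nil }), b.State() == Closed)
}
```

Because `openedAt` is recorded on every transition to `Open`, a monitor could compare it against the 5-minute alerting threshold from the checklist.
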
### Day 5: Security Foundations
**P0-005: Ed25519 Signing**
- [ ] Connect Build Orchestrator to signing service
- [ ] Implement binary signature verification on agent install
- [ ] Test signature verification end-to-end

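The sign-then-verify round trip can be sketched with Go's standard `crypto/ed25519` package. The function names and the in-memory key pair are illustrative assumptions; how the Build Orchestrator actually obtains keys and ships signatures is not specified in this roadmap.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"fmt"
)

var ErrBadSignature = errors.New("binary signature verification failed")

// newKeyPair generates a throwaway key pair for the demo. In production the
// private key would live in the signing service, not next to the verifier.
func newKeyPair() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// SignBinary is what a signing service might do at build time.
func SignBinary(priv ed25519.PrivateKey, binary []byte) []byte {
	return ed25519.Sign(priv, binary)
}

// VerifyBinary is the install-time check. ed25519.Verify panics on a
// public key of the wrong length, so the length is validated first.
func VerifyBinary(pub ed25519.PublicKey, binary, sig []byte) error {
	if len(pub) != ed25519.PublicKeySize {
		return errors.New("invalid public key length")
	}
	if !ed25519.Verify(pub, binary, sig) {
		return ErrBadSignature
	}
	return nil
}

func main() {
	pub, priv, err := newKeyPair()
	if err != nil {
		panic(err)
	}
	binary := []byte("agent binary contents")
	sig := SignBinary(priv, binary)

	fmt.Println("untouched:", VerifyBinary(pub, binary, sig))
	binary[0] ^= 0xff // simulate tampering in transit
	fmt.Println("tampered: ", VerifyBinary(pub, binary, sig))
}
```

An end-to-end test for this item would cover at least the three cases above: a valid signature, a modified binary, and a malformed key.
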
**P0-009: Key Management**
- [ ] Design key rotation mechanism
- [ ] Document key management procedures
- [ ] Plan HSM integration (AWS KMS, Azure Key Vault)

---

## WEEK 2: INFRASTRUCTURE & MONITORING

### Day 6-8: Hardware & Logging
**P0-002: Hardware Binding Ransom**
- [ ] Create API endpoint for hardware re-binding
- [ ] Add hardware change detection
- [ ] Test migration path for hardware changes

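One simple way to detect a hardware change is to compare a stored fingerprint against one computed from the machine's current identifiers. The sketch below assumes the agent can already collect such identifiers as strings; the identifier formats and function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Fingerprint hashes a set of hardware identifiers. Sorting first makes the
// result independent of the order in which the identifiers were collected.
func Fingerprint(ids []string) string {
	sorted := append([]string(nil), ids...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\x00")))
	return hex.EncodeToString(sum[:])
}

// HardwareChanged reports whether the current identifiers no longer match
// the fingerprint the agent was bound to.
func HardwareChanged(bound string, current []string) bool {
	return Fingerprint(current) != bound
}

func main() {
	bound := Fingerprint([]string{"mac:00:11:22:33:44:55", "board:ABC123"})

	// Same hardware, collected in a different order: no change.
	fmt.Println(HardwareChanged(bound, []string{"board:ABC123", "mac:00:11:22:33:44:55"}))

	// Replaced network card: change detected, so re-binding would be offered.
	fmt.Println(HardwareChanged(bound, []string{"mac:66:77:88:99:aa:bb", "board:ABC123"}))
}
```

A single hash treats any component swap as a full hardware change. If that proves too strict, a per-component comparison with a tolerance threshold is an alternative worth considering for the re-binding endpoint.
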
**P0-008: Log Weaponization**
- [ ] Implement log sanitization (strip ANSI, validate JSON, enforce size limits)
- [ ] Separate audit logs from operational logs
- [ ] Add log injection attack prevention
- [ ] Add access control on log viewing

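The three sanitization steps named in the first item (strip ANSI, validate JSON, enforce size limits) might be sketched like this. The 8 KiB limit, the function names, and the choice to replace control characters with spaces are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

const maxLogBytes = 8 * 1024 // hypothetical per-entry limit

var (
	ErrTooLarge    = errors.New("log entry exceeds size limit")
	ErrInvalidJSON = errors.New("log payload is not valid JSON")

	// ansiEscape matches CSI escape sequences such as "\x1b[31m".
	ansiEscape = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`)
)

// SanitizeLine enforces the size limit, strips ANSI escapes, and replaces
// remaining control characters (including newlines) with spaces so that one
// entry cannot forge additional log lines.
func SanitizeLine(s string) (string, error) {
	if len(s) > maxLogBytes {
		return "", ErrTooLarge
	}
	s = ansiEscape.ReplaceAllString(s, "")
	s = strings.Map(func(r rune) rune {
		if r < 0x20 || r == 0x7f {
			return ' '
		}
		return r
	}, s)
	return s, nil
}

// ValidateJSONPayload rejects oversized or malformed structured payloads.
func ValidateJSONPayload(b []byte) error {
	if len(b) > maxLogBytes {
		return ErrTooLarge
	}
	if !json.Valid(b) {
		return ErrInvalidJSON
	}
	return nil
}

func main() {
	line, err := SanitizeLine("\x1b[31mlogin ok\x1b[0m\nFAKE admin entry")
	fmt.Printf("%q %v\n", line, err)
	fmt.Println(ValidateJSONPayload([]byte(`{"agent":"a1","event":"checkin"}`)))
	fmt.Println(ValidateJSONPayload([]byte(`{"agent":`)))
}
```

Sanitizing at ingestion, before entries reach storage or the log viewer, keeps the audit and operational logs from ever containing attacker-controlled escape sequences.
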
### Day 9-10: Health & Monitoring
**P0-010: Health Endpoint**
- [ ] Create separate health endpoint (not check-in cycle)
- [ ] Add health status API
- [ ] Create monitoring dashboard integration

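A dedicated health endpoint, independent of the agent check-in cycle, could be as small as the handler below. The `/healthz` path, the JSON shape, and the check names are assumptions. The demo exercises the handler in-process with `httptest` rather than binding a port.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// HealthStatus is a hypothetical response shape for the health status API.
type HealthStatus struct {
	Status string            `json:"status"`
	Checks map[string]string `json:"checks"`
}

// Check probes one dependency and returns nil when it is healthy.
type Check func() error

var errUnreachable = errors.New("queue unreachable") // used by the demo below

// evaluate runs every check and derives the overall status and HTTP code.
func evaluate(checks map[string]Check) (HealthStatus, int) {
	st := HealthStatus{Status: "ok", Checks: make(map[string]string, len(checks))}
	code := http.StatusOK
	for name, check := range checks {
		if err := check(); err != nil {
			st.Checks[name] = err.Error()
			st.Status = "degraded"
			code = http.StatusServiceUnavailable
			continue
		}
		st.Checks[name] = "ok"
	}
	return st, code
}

// HealthHandler serves the evaluation result as JSON.
func HealthHandler(checks map[string]Check) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		st, code := evaluate(checks)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(st)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/healthz", HealthHandler(map[string]Check{
		"database": func() error { return nil },
		"queue":    func() error { return errUnreachable },
	}))

	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```

Returning 503 when any check fails lets load balancers and dashboards react to the status code alone, without parsing the body.
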
**Migration Testing Framework**
- [ ] Build automated migration testing for fresh installs
- [ ] Build automated migration testing for upgrades (v0.1.18 → v1.0)
- [ ] Create rollback verification

---

## WEEK 3: CODE QUALITY & REFACTORING

### Day 11-13: Monolithic Refactoring
**P0-006: Monolithic runAgent**
- [ ] Break down `runAgent` from 1,307 lines to modular components
- [ ] Create separate packages for each subsystem
- [ ] Target: No function >500 lines
- [ ] Add proper interfaces between components

### Day 14-15: TypeScript & Frontend
**P0-012: TypeScript Build Errors**
- [ ] Fix ~100 TypeScript build errors
- [ ] Create production build pipeline
- [ ] Verify build works on clean environment

---

## WEEK 4: SECURITY HARDENING

### Day 16-18: Security Infrastructure
**P0-005 / P0-009 Continued: Ed25519 & Key Management**
- [ ] Implement key rotation mechanism
- [ ] Create emergency rotation playbook
- [ ] Add HSM support (AWS KMS, Azure Key Vault integration)

**Security Observability**
- [ ] Create security status endpoints
- [ ] Add security metrics collection
- [ ] Create security hardening guide

### Day 19-20: Dependency & Supply Chain
**P0-013: Dependency Vulnerabilities**
- [ ] Run `npm audit` and `govulncheck ./...`
- [ ] Create monthly dependency update schedule
- [ ] Implement automated security scanning in CI/CD
- [ ] Fork and maintain `windowsupdate` if upstream abandoned

---

## WEEK 5: INTEGRATION & TESTING

### Day 21-23: System Integration
- [ ] Integrate all P0 fixes
- [ ] Test end-to-end workflows
- [ ] Performance testing
- [ ] Stress testing (simulate high load scenarios)

### Day 24-25: Documentation
- [ ] Create user-facing installation guide
- [ ] Document upgrade path from legacy v0.1.18
- [ ] Create troubleshooting guide
- [ ] API documentation updates

---

## WEEK 6-7: COMPREHENSIVE TESTING

### Week 6: Testing Coverage
- [ ] Achieve 90% test coverage
- [ ] Test all error scenarios (not just happy path)
- [ ] Chaos engineering tests (simulate failures)
- [ ] Migration testing (fresh install + upgrade paths)

### Week 7: User Acceptance Testing
- [ ] Internal testing with non-dev users
- [ ] Security review (respects established security stack)
- [ ] Performance validation
- [ ] Documentation review

---

## WEEK 8: PRE-RELEASE PREPARATION

### Version Tagging & Release
- [ ] Create version tag: `v1.0-stable`
- [ ] Update README.md (remove "alpha" warnings)
- [ ] Create CHANGELOG.md with v1.0 features
- [ ] Verify all quality checkpoints from ETHOS.md

---

## POST-RELEASE (Week 9-10)

### Migration Support
- [ ] Create migration documentation (legacy → v1.0)
- [ ] Test migration from actual legacy deployments
- [ ] Offer migration assistance to legacy users
- [ ] Document rollback procedures

### Monitoring & Support
- [ ] Set up production monitoring
- [ ] Create incident response procedures
- [ ] Set up support channels
- [ ] Create FAQ based on early user questions

---

## QUALITY CHECKPOINTS (Per ETHOS.md)

**Week 8 Verification:**

### Pre-v1.0 Release Checklist:
- [ ] All errors are logged (not silenced with /dev/null)
- [ ] No new unauthenticated endpoints (all use proper middleware)
- [ ] Backup/restore/fallback paths exist for all critical operations
- [ ] Idempotency verified (can run 3x safely)
- [ ] History table logging added for all state changes
- [ ] Security review completed (respects established stack)
- [ ] Testing includes error scenarios (not just happy path)
- [ ] Documentation is updated with current implementation details
- [ ] Technical debt is identified and tracked
- [ ] 90% test coverage achieved
- [ ] Zero P0 violations (Claudia confirmation required)

---

## SUCCESS CRITERIA

### v1.0-STABLE is READY when:
1. ✅ All P0 issues tracked in this roadmap resolved and tested
2. ✅ Migration testing framework operational
3. ✅ Health monitoring and circuit breakers working
4. ✅ TypeScript builds cleanly (no errors)
5. ✅ Security review passes with zero critical findings
6. ✅ 90% test coverage achieved
7. ✅ Documentation accurate and complete
8. ✅ Claudia's review confirms zero P0s
9. ✅ Lilith's review confirms no hidden landmines
10. ✅ Irulan's architecture review passes

---

## BENEFITS OF THIS APPROACH

### For RedFlag Project:
- **Honest Foundation:** Start with stable, production-ready base
- **No Migration Burden:** First release is clean slate
- **ETHOS Compliant:** Built on principles from ground up
- **Sustainable:** Proper architecture prevents technical debt accumulation

### For Legacy Users:
- **Clear Upgrade Path:** When ready, migration is documented and tested
- **Zero Pressure:** Can stay on legacy as long as needed (12-month LTS)
- **Feature Benefits:** All new features available in unified v1.0

### For New Users:
- **Production Ready:** Can deploy with confidence
- **Stable Foundation:** No critical blockers
- **Honest Status:** Clear documentation of capabilities and limits

---

## RISK MITIGATION

### Primary Risk: Timeline Slippage
**Mitigation:**
- Daily progress tracking via TodoWrite
- Weekly mini-reviews to catch issues early
- Flexibility to extend individual weeks (but not total scope)

### Secondary Risk: Discovering More Issues
**Mitigation:**
- Lilith's periodic review during Weeks 5-6
- Focus on P0s only (no feature creep)
- Accept that future releases will improve further

### Tertiary Risk: User Impatience
**Mitigation:**
- Clear communication about timeline
- Honest status updates
- Document progress publicly (Codeberg issues)

---

**Roadmap Created:** January 22, 2026
**Target Release:** Week 8 (mid-March 2026)
**Tracking:** TodoWrite integration for daily progress
**Status:** READY TO BEGIN