# Future Enhancements & Considerations
## Critical Testing Issues
### Windows Agent Update Persistence Bug
**Status:** Needs Investigation
**Problem:** Microsoft Defender security updates reappear after installation
- Updates are marked as installed but reappear in scan results
- Possible Windows Update state caching issue
- May be related to Windows Update Agent refresh timing
**Investigation Needed:**
- Verify update installation actually completes on Windows side
- Check Windows Update API state after installation
- Compare package state in database vs Windows registry
- Test with different update types (Defender vs other updates)
- May need to force WUA refresh after installation
**Priority:** High - affects Windows agent reliability
---
## Immediate Priority - Real-Time Operations
### Intelligent Heartbeat System Enhancement
**Current State:**
- Manual heartbeat toggle (pink icon when active)
- User-initiated only
- Fixed duration options
**Proposed Enhancement:**
- **Auto-trigger heartbeat on operations:** Any command sent to agent triggers heartbeat automatically
- **Color coding:**
  - Blue: System-initiated heartbeat (scan, install, etc.)
  - Pink: User-initiated manual heartbeat
- **Lifecycle management:** Heartbeat auto-ends when operation completes
- **Smart detection:** Don't spam heartbeat commands if already active
**Implementation Strategy:**
- **Phase 1:** Scan operations auto-trigger heartbeat
- **Phase 2:** Install/approve operations auto-trigger heartbeat
- **Phase 3:** Any agent command auto-triggers appropriate heartbeat duration
- **Phase 4:** Heartbeat duration scales with operation type (30s scan vs 10m install)
**User Experience:**
- User clicks "Scan Now" → blue heartbeat activates → scan completes → heartbeat stops
- User clicks "Install" → blue heartbeat activates → install completes → heartbeat stops
- User manually triggers heartbeat → pink icon → user controls duration
**Priority:** High - improves responsiveness without manual intervention
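
The auto-trigger, lifecycle, and smart-detection rules above can be sketched as a small in-memory tracker. This is only an illustration under stated assumptions: `HeartbeatTracker`, the `"system"`/`"manual"` source labels, and the 60-second fallback are hypothetical names and values, and only the 30s/10m durations come from Phase 4.

```python
from dataclasses import dataclass

# Durations from Phase 4 (30s scan vs 10m install).
DURATIONS = {"scan": 30, "install": 600}
DEFAULT_DURATION = 60  # assumption: fallback for any other command type


@dataclass
class Heartbeat:
    source: str        # "system" (blue) or "manual" (pink)
    expires_at: float  # seconds, same clock as `now`


class HeartbeatTracker:
    """Hypothetical tracker; not the project's actual implementation."""

    def __init__(self):
        self.active = {}  # agent_id -> Heartbeat

    def on_command(self, agent_id, command, now):
        """Return True if a new heartbeat command should be sent."""
        current = self.active.get(agent_id)
        if current is not None and current.expires_at > now:
            return False  # already active: don't spam heartbeat commands
        duration = DURATIONS.get(command, DEFAULT_DURATION)
        self.active[agent_id] = Heartbeat("system", now + duration)
        return True

    def on_complete(self, agent_id):
        """End a system heartbeat when its operation finishes."""
        current = self.active.get(agent_id)
        if current is not None and current.source == "system":
            del self.active[agent_id]  # manual heartbeats stay user-controlled

    def start_manual(self, agent_id, duration, now):
        """User-initiated heartbeat with a user-chosen duration."""
        self.active[agent_id] = Heartbeat("manual", now + duration)
```

In this sketch a manual (pink) heartbeat is never ended by an operation completing, which matches the "user controls duration" behavior described above.
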
**Dashboard Visualization Enhancement:**
- **Live Commands Dashboard Widget:** Aggregate view of all active operations
- **Color coding extends to commands:**
  - Pink badges: User-initiated commands (manual scan, manual install, etc.)
  - Blue badges: System-orchestrated commands (auto-scan, auto-heartbeat, approved workflows)
- **Fleet monitoring at a glance:**
  - Visual breakdown: "X agents with blue (system) operations | Y agents with pink (manual) operations"
  - Quick filtering: "Show only system-orchestrated operations" vs "Show only user-initiated"
  - Live count: "Active system operations triggering heartbeats: 3"
- **Agent list integration:**
  - Small blue/pink indicator dots next to agent names
  - Sort/filter by active heartbeat status and source
  - Dashboard stats showing heartbeat distribution across fleet
**Use Case:** MSP/homelab fleet monitoring - differentiate between automated orchestration (blue) and manual intervention (pink) at a glance. Helps identify which systems need attention vs which are running autonomously.
**Note:** Backend tracking is complete (source field in commands, metadata storage). Frontend visualization is deferred until after V1.0.
---
## Strategic Architecture Decisions
### Update Management Philosophy - Pre-V1.0 Discussion Needed
**Core Questions:**
1. **Are we a mirror?** Do we cache/store update packages locally?
2. **Are we a gatekeeper?** Do we proxy updates through our server?
3. **Are we an orchestrator?** Do we just coordinate direct agent→repo downloads?
**Current Implementation:** Orchestrator model
- Agents download directly from upstream repos
- Server coordinates approval/installation
- No package caching or storage
**Alternative Models to Consider:**
**Model A: Package Proxy/Cache**
- Server downloads and caches approved updates
- Agents pull from local server instead of internet
- Pros: Bandwidth savings, offline capability, version pinning
- Cons: Storage requirements, security responsibility, repo sync complexity
**Model B: Approval Database**
- Server stores approval decisions without packages
- Agents check "is package X approved?" before installing from upstream
- Pros: Lightweight, flexible, audit trail
- Cons: No offline capability, no bandwidth savings
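
To make Model B concrete, here is a minimal sketch of an approval store and the agent-side check. The class and function names, the three decision states, and the default of treating unknown packages as pending are all assumptions for illustration, not the current schema.

```python
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    PENDING = "pending"


class ApprovalStore:
    """Stores decisions only, never package contents (hypothetical schema)."""

    def __init__(self):
        self._decisions = {}  # (package, version) -> Decision
        self.audit_log = []   # (package, version, decision, decided_by)

    def decide(self, package, version, decision, decided_by):
        """Record a decision and append it to the audit trail."""
        self._decisions[(package, version)] = decision
        self.audit_log.append((package, version, decision.value, decided_by))

    def lookup(self, package, version):
        # Assumption: packages without a decision are treated as pending.
        return self._decisions.get((package, version), Decision.PENDING)


def may_install(store, package, version):
    """Agent-side gate: install from upstream only when explicitly approved."""
    return store.lookup(package, version) is Decision.APPROVED
```

Because only decisions are stored, the server stays lightweight, but the agent still needs upstream access to fetch the package itself.
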
**Model C: Hybrid Approach**
- Critical updates: Cache locally (security patches)
- Regular updates: Direct from upstream
- User-configurable per update category
**Windows Enforcement Challenge:**
- Linux: Can control APT/DNF sources easily
- Windows: Windows Update has limited local control
- Winget: Can control sources
- Need unified approach that works cross-platform
**Questions for V1.0:**
- Do users want local update caching?
- Is bandwidth savings worth storage complexity?
- Should "disapprove" mean "block installation" or just "don't auto-install"?
- How do we handle Windows Update's limited control surface?
**Decision Timeline:** Before V1.0 - this affects database schema, agent architecture, storage requirements
---
## High Priority - Security & Authentication
### Cryptographically Signed Agent Binaries
**Problem:** Agents can currently be copied between servers, duplicated, or spoofed. Rate limiting is IP-based, which doesn't prevent abuse at the agent level.
**Proposed Solution:**
- Server generates unique cryptographic signature when building/distributing agent binaries
- Each agent binary is bound to the specific server instance via:
  - SSH keys or X.509 certificates
  - Server's public/private key pair
  - Unique server identifier embedded in binary at build time
- Agent presents cryptographic proof of authenticity during registration and check-ins
- Server validates signature before accepting any agent communication
**Benefits:**
1. **Better Rate Limiting:** Track and limit per-agent-binary instead of per-IP
   - Prevents multiple agents on the same host from sharing one rate limit bucket
   - Each unique agent has its own quota
   - Detect and block duplicated/copied agents
2. **Prevents Cross-Server Agent Migration:**
   - Agent built for Server A cannot register with Server B
   - Stops unauthorized agent redistribution
   - Ensures agents only communicate with their originating server
3. **Audit Trail:**
   - Track which specific binary version is running where
   - Identify compromised or rogue agent binaries
   - Revoke specific agent signatures if needed
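
The per-agent quota from benefit 1 could look roughly like the token-bucket sketch below. It assumes each verified agent binary yields a stable key (called `agent_key` here, a hypothetical name); the capacity and refill values are placeholders, and detecting duplicated agents is not covered.

```python
class TokenBucket:
    """Classic token bucket: `capacity` burst, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill for the elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerAgentLimiter:
    """One bucket per agent key instead of one per source IP."""

    def __init__(self, capacity=5, refill_per_sec=1.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets = {}  # agent_key -> TokenBucket

    def allow(self, agent_key, now):
        bucket = self.buckets.get(agent_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self.buckets[agent_key] = bucket
        return bucket.allow(now)
```

Two agents behind the same IP get independent buckets in this sketch, so one noisy agent does not exhaust the other's quota.
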
**Implementation Considerations:**
- Use Ed25519 or RSA for signing (Ed25519 is faster and uses smaller keys)
- Embed server public key in agent binary at build time
- Store server private key securely (not in env file)
- Agent includes signature in Authorization header alongside token
- Server validates: signature + token + agent_id combo
- Migration path for existing unsigned agents
**Timeline:** Sooner than initially thought - foundational security improvement
---
## Medium Priority - UI/UX Improvements
### Rate Limit Settings UI
**Current State:** API endpoints exist, UI skeleton present but non-functional
**Needed:**
- Display current rate limit values for all endpoint types
- Live editing of limits with validation
- Show current usage/remaining per limit type
- Reset to defaults button
- Preview impact before applying changes
- Warning when setting limits too low
**Location:** Settings page → Rate Limits section
### Server Status/Splash During Operations
**Current State:** Dashboard shows "Failed to load" during server restarts/maintenance
**Needed:**
- Detect when server is unreachable vs actual error
- Show friendly "Server restarting..." splash instead of error
- Optionally show an animated spinner or progress indicator
- Different states:
  - Server starting up
  - Server restarting (config change)
  - Server maintenance
  - Actual error (needs user action)
**Possible Implementation:**
- SetupCompletionChecker could handle this (already polling /health)
- Add status overlay component
- Detect specific error types (network vs 500 vs 401)
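
The error-type detection could be reduced to a small classifier over the result of each `/health` poll. The logic is shown in Python for illustration only; the state names, the `show_splash` helper, and the treatment of 503 as "starting up or maintenance" are assumptions, and the frontend would implement the equivalent in its own language.

```python
def classify_health(status_code=None, network_error=False):
    """Map a /health poll result to a UI state (state names are hypothetical)."""
    if network_error or status_code is None:
        return "unreachable"    # likely restarting: keep polling
    if status_code == 401:
        return "auth_required"  # needs user action: log in again
    if status_code == 503:
        return "unavailable"    # assumption: starting up or in maintenance
    if status_code >= 500:
        return "server_error"   # actual error: show details to the user
    if 200 <= status_code < 300:
        return "ok"
    return "client_error"


# States that get the friendly splash instead of an error message.
SPLASH_STATES = {"unreachable", "unavailable"}


def show_splash(state):
    return state in SPLASH_STATES
```
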
### Dashboard Statistics Loading State
**Current:** Hard error when stats unavailable
**Better:**
- Skeleton loaders for stat cards
- Graceful degradation if some stats fail
- Retry button for failed stat fetches
- Cache last-known-good values briefly
---
## Lower Priority - Feature Enhancements
### Agent Auto-Update System
Agents must currently be updated manually. Needed:
- Server-initiated agent updates
- Rollback capability
- Staged rollouts (canary deployments)
- Version compatibility checks
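
One way to combine staged rollouts with version compatibility checks is a deterministic hash bucket per agent. This is a sketch under assumptions: versions are represented as tuples, `min_supported` is a hypothetical "oldest version that can update directly" setting, and rollback is not covered.

```python
import hashlib


def rollout_bucket(agent_id):
    """Stable bucket in [0, 100) derived from the agent ID."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_update(agent_id, current, target, min_supported, percent):
    """Offer `target` only to compatible agents inside the rollout percentage.

    Versions are tuples such as (1, 2, 0); `percent` is 0..100.
    """
    if current >= target:
        return False  # already up to date
    if current < min_supported:
        return False  # too old to update directly
    return rollout_bucket(agent_id) < percent
```

Because the bucket depends only on the agent ID, raising `percent` from a small canary value toward 100 only ever adds agents to the rollout; it never removes one that was already included.
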
### Proxmox Integration
Planned feature for managing VMs/containers:
- Detect Proxmox hosts
- List VMs and containers
- Trigger updates at VM/container level
- Separate update categories for host vs guests
### Mobile-Responsive Dashboard
The dashboard works on mobile but is not optimized. Improvements:
- Better mobile nav (hamburger menu)
- Touch-friendly buttons
- Responsive tables (card view on mobile)
- PWA support for installing as app
### Notification System
- Email alerts for failed updates
- Webhook integration (Discord, Slack, etc.)
- Configurable notification rules
- Quiet hours / alert throttling
### Scheduled Update Windows
- Define maintenance windows per agent
- Auto-approve updates during windows
- Block updates outside windows
- Timezone-aware scheduling
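
A window check might look like the sketch below. It assumes each agent stores a start time, an end time, and a fixed UTC offset; a fixed offset ignores daylight saving time, so a real implementation would likely use named time zones instead. The function names and the `"auto_approve"`/`"block"` return values are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone


def in_window(now_utc, start, end, utc_offset_hours):
    """True if now_utc falls inside the agent's local [start, end) window."""
    agent_tz = timezone(timedelta(hours=utc_offset_hours))
    local = now_utc.astimezone(agent_tz).time()
    if start <= end:
        return start <= local < end
    # Window wraps past midnight, e.g. 22:00 to 02:00.
    return local >= start or local < end


def update_action(now_utc, start, end, utc_offset_hours):
    """Auto-approve inside the window, block outside it."""
    if in_window(now_utc, start, end, utc_offset_hours):
        return "auto_approve"
    return "block"
```

For example, 03:30 UTC is 22:30 local time for an agent at UTC-5, which falls inside a 22:00 to 02:00 window.
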
---
## Technical Debt
### Configuration Management
**Current:** Settings scattered between database, .env file, and hardcoded defaults
**Better:**
- Unified settings table in database
- Web UI for all configuration
- Import/export settings
- Settings version history
### Testing Coverage
- Add integration tests for rate limiter
- Test agent registration flow end-to-end
- UI component tests for critical paths
- Load testing for concurrent agents
### Documentation
- API reference needs expansion
- Agent installation guide for edge cases
- Troubleshooting guide
- Architecture diagrams
### Code Organization
- Rate limiter settings should be database-backed (currently in-memory only)
- Agent timeout values are hardcoded (need to be configurable)
- Shutdown delay is hardcoded at 1 minute (should be user-adjustable)
---
## Notes & Philosophy
- **Less is more:** No enterprise BS, keep it simple
- **FOSS mentality:** All software has bugs, best effort approach
- **Homelab-first:** Build for real use cases, not investor pitches
- **Honest about limitations:** Document what doesn't work
- **Community-driven:** Users know their needs best
---
## Implementation Priority Order
1. **Cryptographic agent signing** - Security foundation, enables better rate limiting
2. **Rate limit UI completion** - Already have API, just need frontend
3. **Server status splash** - UX improvement, quick win
4. **Settings management refactor** - Enables other features
5. **Auto-update system** - Major feature, needs careful design
6. **Everything else** - As time permits
---
Last updated: 2025-10-31