# Future Enhancements & Considerations

## Critical Testing Issues

### Windows Agent Update Persistence Bug

**Status:** Needs Investigation

**Problem:** Microsoft Security Defender updates reappearing after installation
- Updates marked as installed but show back up in scan results
- Possible Windows Update state caching issue
- May be related to Windows Update Agent refresh timing

**Investigation Needed:**
- Verify update installation actually completes on Windows side
- Check Windows Update API state after installation
- Compare package state in database vs Windows registry
- Test with different update types (Defender vs other updates)
- May need to force WUA refresh after installation

**Priority:** High - affects Windows agent reliability

---

## Immediate Priority - Real-Time Operations

### Intelligent Heartbeat System Enhancement

**Current State:**
- Manual heartbeat toggle (pink icon when active)
- User-initiated only
- Fixed duration options

**Proposed Enhancement:**
- **Auto-trigger heartbeat on operations:** Any command sent to agent triggers heartbeat automatically
- **Color coding:**
  - Blue: System-initiated heartbeat (scan, install, etc)
  - Pink: User-initiated manual heartbeat
- **Lifecycle management:** Heartbeat auto-ends when operation completes
- **Smart detection:** Don't spam heartbeat commands if already active

**Implementation Strategy:**
- Phase 1: Scan operations auto-trigger heartbeat
- Phase 2: Install/approve operations auto-trigger heartbeat
- Phase 3: Any agent command auto-triggers appropriate heartbeat duration
- Phase 4: Heartbeat duration scales with operation type (30s scan vs 10m install)

**User Experience:**
- User clicks "Scan Now" → blue heartbeat activates → scan completes → heartbeat stops
- User clicks "Install" → blue heartbeat activates → install completes → heartbeat stops
- User manually triggers heartbeat → pink icon → user controls duration

**Priority:** High - improves responsiveness without manual intervention

**Dashboard Visualization Enhancement:**
- **Live Commands Dashboard Widget:** Aggregate view of all active operations
- **Color coding extends to commands:**
  - Pink badges: User-initiated commands (manual scan, manual install, etc)
  - Blue badges: System-orchestrated commands (auto-scan, auto-heartbeat, approved workflows)
- **Fleet monitoring at a glance:**
  - Visual breakdown: "X agents with blue (system) operations | Y agents with pink (manual) operations"
  - Quick filtering: "Show only system-orchestrated operations" vs "Show only user-initiated"
  - Live count: "Active system operations triggering heartbeats: 3"
- **Agent list integration:**
  - Small blue/pink indicator dots next to agent names
  - Sort/filter by active heartbeat status and source
  - Dashboard stats showing heartbeat distribution across fleet

**Use Case:** MSP/homelab fleet monitoring - differentiate between automated orchestration (blue) and manual intervention (pink) at a glance. Helps identify which systems need attention vs which are running autonomously.

**Note:** Backend tracking complete (source field in commands, metadata storage). Frontend visualization deferred for post-V1.0.

---

## Strategic Architecture Decisions

### Update Management Philosophy - Pre-V1.0 Discussion Needed

**Core Questions:**
1. **Are we a mirror?** Do we cache/store update packages locally?
2. **Are we a gatekeeper?** Do we proxy updates through our server?
3. **Are we an orchestrator?** Do we just coordinate direct agent→repo downloads?

**Current Implementation:** Orchestrator model
- Agents download directly from upstream repos
- Server coordinates approval/installation
- No package caching or storage

**Alternative Models to Consider:**

**Model A: Package Proxy/Cache**
- Server downloads and caches approved updates
- Agents pull from local server instead of internet
- Pros: Bandwidth savings, offline capability, version pinning
- Cons: Storage requirements, security responsibility, repo sync complexity

**Model B: Approval Database**
- Server stores approval decisions without packages
- Agents check "is package X approved?" before installing from upstream
- Pros: Lightweight, flexible, audit trail
- Cons: No offline capability, no bandwidth savings

**Model C: Hybrid Approach**
- Critical updates: Cache locally (security patches)
- Regular updates: Direct from upstream
- User-configurable per update category

**Windows Enforcement Challenge:**
- Linux: Can control APT/DNF sources easily
- Windows: Windows Update has limited local control
- Winget: Can control sources
- Need unified approach that works cross-platform

**Questions for V1.0:**
- Do users want local update caching?
- Is bandwidth savings worth storage complexity?
- Should "disapprove" mean "block installation" or just "don't auto-install"?
- How do we handle Windows Update's limited control surface?

**Decision Timeline:** Before V1.0 - this affects database schema, agent architecture, storage requirements

---

## High Priority - Security & Authentication

### Cryptographically Signed Agent Binaries

**Problem:** Currently agents can be copied between servers, duplicated, or spoofed. Rate limiting is IP-based which doesn't prevent abuse at the agent level.

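To make the rate-limiting half of this problem concrete, here is a minimal sketch. It assumes a hypothetical fixed-window counter (`FixedWindowLimiter`), made-up agent names, and a documentation-range IP address; none of this is the project's actual limiter. It only shows how the choice of key changes who gets throttled when two agents share one address.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical limiter: counts requests per key within a single window."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key):
        # Reject once the key has used up its quota for this window.
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


# Two agents behind the same NAT address, three check-ins each (illustrative data).
requests = [("203.0.113.7", "agent-a"), ("203.0.113.7", "agent-b")] * 3

by_ip = FixedWindowLimiter(limit=4)
by_agent = FixedWindowLimiter(limit=4)

# Keyed by IP, both agents drain one shared bucket.
ip_rejected = sum(not by_ip.allow(ip) for ip, _ in requests)
# Keyed by agent, each agent has its own quota.
agent_rejected = sum(not by_agent.allow(agent) for _, agent in requests)

print(ip_rejected, agent_rejected)
```

Keying by agent only helps if the agent identity cannot be copied or spoofed, which is why per-agent limits depend on the agent being able to prove who it is.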
**Proposed Solution:**
- Server generates unique cryptographic signature when building/distributing agent binaries
- Each agent binary is bound to the specific server instance via:
  - SSH keys or X.509 certificates
  - Server's public/private key pair
  - Unique server identifier embedded in binary at build time
- Agent presents cryptographic proof of authenticity during registration and check-ins
- Server validates signature before accepting any agent communication

**Benefits:**
1. **Better Rate Limiting:** Track and limit per-agent-binary instead of per-IP
   - Prevents multiple agents from same host sharing rate limit bucket
   - Each unique agent has its own quota
   - Detect and block duplicated/copied agents
2. **Prevents Cross-Server Agent Migration:**
   - Agent built for Server A cannot register with Server B
   - Stops unauthorized agent redistribution
   - Ensures agents only communicate with their originating server
3. **Audit Trail:**
   - Track which specific binary version is running where
   - Identify compromised or rogue agent binaries
   - Revoke specific agent signatures if needed

**Implementation Considerations:**
- Use Ed25519 or RSA for signing (fast, secure)
- Embed server public key in agent binary at build time
- Store server private key securely (not in env file)
- Agent includes signature in Authorization header alongside token
- Server validates: signature + token + agent_id combo
- Migration path for existing unsigned agents

**Timeline:** Sooner than initially thought - foundational security improvement

---

## Medium Priority - UI/UX Improvements

### Rate Limit Settings UI

**Current State:** API endpoints exist, UI skeleton present but non-functional

**Needed:**
- Display current rate limit values for all endpoint types
- Live editing of limits with validation
- Show current usage/remaining per limit type
- Reset to defaults button
- Preview impact before applying changes
- Warning when setting limits too low

**Location:** Settings page → Rate Limits section

### Server Status/Splash During Operations

**Current State:** Dashboard shows "Failed to load" during server restarts/maintenance

**Needed:**
- Detect when server is unreachable vs actual error
- Show friendly "Server restarting..." splash instead of error
- Maybe animated spinner or progress indicator
- Different states:
  - Server starting up
  - Server restarting (config change)
  - Server maintenance
  - Actual error (needs user action)

**Possible Implementation:**
- SetupCompletionChecker could handle this (already polling /health)
- Add status overlay component
- Detect specific error types (network vs 500 vs 401)

### Dashboard Statistics Loading State

**Current:** Hard error when stats unavailable

**Better:**
- Skeleton loaders for stat cards
- Graceful degradation if some stats fail
- Retry button for failed stat fetches
- Cache last-known-good values briefly

---

## Lower Priority - Feature Enhancements

### Agent Auto-Update System

Currently agents must be manually updated. Need:
- Server-initiated agent updates
- Rollback capability
- Staged rollouts (canary deployments)
- Version compatibility checks

### Proxmox Integration

Planned feature for managing VMs/containers:
- Detect Proxmox hosts
- List VMs and containers
- Trigger updates at VM/container level
- Separate update categories for host vs guests

### Mobile-Responsive Dashboard

Works but not optimized:
- Better mobile nav (hamburger menu)
- Touch-friendly buttons
- Responsive tables (card view on mobile)
- PWA support for installing as app

### Notification System

- Email alerts for failed updates
- Webhook integration (Discord, Slack, etc)
- Configurable notification rules
- Quiet hours / alert throttling

### Scheduled Update Windows

- Define maintenance windows per agent
- Auto-approve updates during windows
- Block updates outside windows
- Timezone-aware scheduling

---

## Technical Debt

### Configuration Management

**Current:** Settings scattered between database, .env file, and hardcoded defaults

**Better:**
- Unified settings table in database
- Web UI for all configuration
- Import/export settings
- Settings version history

### Testing Coverage

- Add integration tests for rate limiter
- Test agent registration flow end-to-end
- UI component tests for critical paths
- Load testing for concurrent agents

### Documentation

- API reference needs expansion
- Agent installation guide for edge cases
- Troubleshooting guide
- Architecture diagrams

### Code Organization

- Rate limiter settings should be database-backed (currently in-memory only)
- Agent timeout values hardcoded (need to be configurable)
- Shutdown delay hardcoded at 1 minute (user-adjustable needed)

---

## Notes & Philosophy

- **Less is more:** No enterprise BS, keep it simple
- **FOSS mentality:** All software has bugs, best effort approach
- **Homelab-first:** Build for real use cases, not investor pitches
- **Honest about limitations:** Document what doesn't work
- **Community-driven:** Users know their needs best

---

## Implementation Priority Order

1. **Cryptographic agent signing** - Security foundation, enables better rate limiting
2. **Rate limit UI completion** - Already have API, just need frontend
3. **Server status splash** - UX improvement, quick win
4. **Settings management refactor** - Enables other features
5. **Auto-update system** - Major feature, needs careful design
6. **Everything else** - As time permits

---

Last updated: 2025-10-31