# Future Enhancements & Considerations

## Critical Testing Issues

### Windows Agent Update Persistence Bug
**Status:** Needs Investigation

**Problem:** Microsoft Defender updates reappear after installation
- Updates are marked as installed but show up again in scan results
- Possible Windows Update state caching issue
- May be related to Windows Update Agent (WUA) refresh timing

**Investigation Needed:**
- Verify that the update installation actually completes on the Windows side
- Check Windows Update API state after installation
- Compare package state in the database against the Windows registry
- Test with different update types (Defender vs. other updates)
- May need to force a WUA refresh after installation
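One way to start the investigation is to flag exactly which updates the database marks as installed while the latest scan still offers them. The sketch below is a minimal illustration in Python; the `UpdateRecord` shape, the status strings, and the placeholder IDs are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    """Hypothetical row from the server's package table."""
    update_id: str  # placeholder identifier, e.g. an update GUID
    status: str     # assumed values: "installed", "pending", ...


def find_reappearing(db_records, scan_update_ids):
    """Return IDs the database marks installed but the latest scan still offers."""
    installed = {r.update_id for r in db_records if r.status == "installed"}
    return sorted(installed & set(scan_update_ids))


records = [
    UpdateRecord("update-a", "installed"),
    UpdateRecord("update-b", "installed"),
    UpdateRecord("update-c", "pending"),
]
latest_scan = ["update-b", "update-c"]
suspects = find_reappearing(records, latest_scan)
```

Logging the suspects per update type would show whether the problem is limited to Defender updates.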

**Priority:** High - affects Windows agent reliability

---

## Immediate Priority - Real-Time Operations

### Intelligent Heartbeat System Enhancement

**Current State:**
- Manual heartbeat toggle (pink icon when active)
- User-initiated only
- Fixed duration options

**Proposed Enhancement:**
- **Auto-trigger heartbeat on operations:** Any command sent to an agent triggers a heartbeat automatically
- **Color coding:**
  - Blue: System-initiated heartbeat (scan, install, etc.)
  - Pink: User-initiated manual heartbeat
- **Lifecycle management:** Heartbeat auto-ends when the operation completes
- **Smart detection:** Don't send another heartbeat command if one is already active

**Implementation Strategy:**
- Phase 1: Scan operations auto-trigger heartbeat
- Phase 2: Install/approve operations auto-trigger heartbeat
- Phase 3: Any agent command auto-triggers an appropriate heartbeat duration
- Phase 4: Heartbeat duration scales with operation type (30s scan vs 10m install)
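The proposal above (auto-trigger on commands, skip when a heartbeat is already active, per-operation durations, auto-end on completion) could be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, the `"system"`/`"manual"` source values are assumed, and the `"default"` duration is invented for the example. Only the 30s and 10m figures come from Phase 4.

```python
import time
from dataclasses import dataclass

# 30s scan and 10m install are from Phase 4; "default" is an assumption.
DURATIONS = {"scan": 30, "install": 600, "default": 60}


@dataclass
class Heartbeat:
    source: str        # "system" (blue) or "manual" (pink), assumed values
    expires_at: float


class HeartbeatManager:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._active = {}  # agent_id -> Heartbeat

    def is_active(self, agent_id):
        hb = self._active.get(agent_id)
        return hb is not None and hb.expires_at > self._clock()

    def on_command(self, agent_id, operation):
        """Start a system heartbeat for a command; return True if one was started."""
        if self.is_active(agent_id):
            return False  # smart detection: a heartbeat is already running
        duration = DURATIONS.get(operation, DURATIONS["default"])
        self._active[agent_id] = Heartbeat("system", self._clock() + duration)
        return True

    def on_complete(self, agent_id):
        """End a system heartbeat when its operation finishes; leave manual ones alone."""
        hb = self._active.get(agent_id)
        if hb is not None and hb.source == "system":
            del self._active[agent_id]
```

Injecting the clock keeps the expiry logic testable without sleeping.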

**User Experience:**
- User clicks "Scan Now" → blue heartbeat activates → scan completes → heartbeat stops
- User clicks "Install" → blue heartbeat activates → install completes → heartbeat stops
- User manually triggers heartbeat → pink icon → user controls duration

**Priority:** High - improves responsiveness without manual intervention

**Dashboard Visualization Enhancement:**
- **Live Commands Dashboard Widget:** Aggregate view of all active operations
- **Color coding extends to commands:**
  - Pink badges: User-initiated commands (manual scan, manual install, etc.)
  - Blue badges: System-orchestrated commands (auto-scan, auto-heartbeat, approved workflows)
- **Fleet monitoring at a glance:**
  - Visual breakdown: "X agents with blue (system) operations | Y agents with pink (manual) operations"
  - Quick filtering: "Show only system-orchestrated operations" vs "Show only user-initiated"
  - Live count: "Active system operations triggering heartbeats: 3"
- **Agent list integration:**
  - Small blue/pink indicator dots next to agent names
  - Sort/filter by active heartbeat status and source
  - Dashboard stats showing heartbeat distribution across fleet
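The "visual breakdown" line above boils down to counting distinct agents per command source. A minimal sketch, assuming each active command is a dict with `agent_id` and `source` keys and that `source` takes the values `"system"` and `"manual"` (both the shape and the values are assumptions):

```python
def fleet_breakdown(active_commands):
    """Count distinct agents with active operations, grouped by command source."""
    agents_by_source = {}
    for cmd in active_commands:
        agents_by_source.setdefault(cmd["source"], set()).add(cmd["agent_id"])
    return {source: len(agents) for source, agents in agents_by_source.items()}


commands = [
    {"agent_id": "a1", "source": "system"},
    {"agent_id": "a1", "source": "system"},  # same agent, second operation
    {"agent_id": "a2", "source": "manual"},
    {"agent_id": "a3", "source": "system"},
]
counts = fleet_breakdown(commands)
summary = (
    f'{counts.get("system", 0)} agents with blue (system) operations | '
    f'{counts.get("manual", 0)} agents with pink (manual) operations'
)
```

Counting agents rather than commands keeps the widget from inflating the numbers when one agent runs several operations at once.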

**Use Case:** MSP/homelab fleet monitoring - differentiate between automated orchestration (blue) and manual intervention (pink) at a glance. Helps identify which systems need attention and which are running autonomously.

**Note:** Backend tracking is complete (source field in commands, metadata storage). Frontend visualization is deferred until after V1.0.

---

## Strategic Architecture Decisions

### Update Management Philosophy - Pre-V1.0 Discussion Needed

**Core Questions:**
1. **Are we a mirror?** Do we cache/store update packages locally?
2. **Are we a gatekeeper?** Do we proxy updates through our server?
3. **Are we an orchestrator?** Do we just coordinate direct agent→repo downloads?

**Current Implementation:** Orchestrator model
- Agents download directly from upstream repos
- Server coordinates approval/installation
- No package caching or storage

**Alternative Models to Consider:**

**Model A: Package Proxy/Cache**
- Server downloads and caches approved updates
- Agents pull from the local server instead of the internet
- Pros: Bandwidth savings, offline capability, version pinning
- Cons: Storage requirements, security responsibility, repo sync complexity

**Model B: Approval Database**
- Server stores approval decisions without packages
- Agents check "is package X approved?" before installing from upstream
- Pros: Lightweight, flexible, audit trail
- Cons: No offline capability, no bandwidth savings

**Model C: Hybrid Approach**
- Critical updates: Cache locally (security patches)
- Regular updates: Direct from upstream
- User-configurable per update category
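To make Model C concrete, the per-category routing could be as small as a lookup table with an upstream fallback. The category names, the `Source` enum, and the default policy below are all hypothetical; they only illustrate the shape of the decision.

```python
from enum import Enum


class Source(Enum):
    LOCAL_CACHE = "local_cache"
    UPSTREAM = "upstream"


# Hypothetical default policy; Model C would make this user-configurable.
DEFAULT_POLICY = {"security": Source.LOCAL_CACHE, "regular": Source.UPSTREAM}


def download_source(category, policy=DEFAULT_POLICY):
    """Pick where an agent fetches an update from; unknown categories go upstream."""
    return policy.get(category, Source.UPSTREAM)
```

Falling back to upstream for unknown categories preserves today's orchestrator behavior for anything the user has not configured.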

**Windows Enforcement Challenge:**
- Linux: Can control APT/DNF sources easily
- Windows: Windows Update has limited local control
- Winget: Can control sources
- Need a unified approach that works cross-platform

**Questions for V1.0:**
- Do users want local update caching?
- Are the bandwidth savings worth the storage complexity?
- Should "disapprove" mean "block installation" or just "don't auto-install"?
- How do we handle Windows Update's limited control surface?

**Decision Timeline:** Before V1.0 - this affects database schema, agent architecture, and storage requirements

---

## High Priority - Security & Authentication

### Cryptographically Signed Agent Binaries

**Problem:** Currently, agents can be copied between servers, duplicated, or spoofed. Rate limiting is IP-based, which doesn't prevent abuse at the agent level.

**Proposed Solution:**
- Server generates a unique cryptographic signature when building/distributing agent binaries
- Each agent binary is bound to the specific server instance via:
  - SSH keys or X.509 certificates
  - Server's public/private key pair
  - Unique server identifier embedded in the binary at build time
- Agent presents cryptographic proof of authenticity during registration and check-ins
- Server validates the signature before accepting any agent communication

**Benefits:**
1. **Better Rate Limiting:** Track and limit per-agent-binary instead of per-IP
   - Prevents multiple agents on the same host from sharing one rate limit bucket
   - Each unique agent has its own quota
   - Detect and block duplicated/copied agents

2. **Prevents Cross-Server Agent Migration:**
   - Agent built for Server A cannot register with Server B
   - Stops unauthorized agent redistribution
   - Ensures agents only communicate with their originating server

3. **Audit Trail:**
   - Track which specific binary version is running where
   - Identify compromised or rogue agent binaries
   - Revoke specific agent signatures if needed
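The first benefit, a separate quota per agent instead of per IP, can be illustrated with a small fixed-window limiter keyed by agent ID. This is a sketch under assumptions: the class name, the fixed-window strategy, and the parameters are invented for the example and say nothing about how the existing rate limiter works.

```python
import time
from collections import defaultdict


class PerAgentRateLimiter:
    """Fixed-window limiter keyed by agent ID rather than client IP."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        # agent_id -> [window_start, request_count]
        self._windows = defaultdict(lambda: [0.0, 0])

    def allow(self, agent_id):
        now = self._clock()
        state = self._windows[agent_id]
        if now - state[0] >= self.window:
            state[0], state[1] = now, 0  # start a fresh window
        if state[1] >= self.limit:
            return False
        state[1] += 1
        return True
```

Keying on a verified agent identity only helps once that identity cannot be copied, which is why the signing work comes first.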

**Implementation Considerations:**
- Use Ed25519 or RSA for signing (fast, secure)
- Embed the server public key in the agent binary at build time
- Store the server private key securely (not in the .env file)
- Agent includes the signature in the Authorization header alongside the token
- Server validates the combination of signature, token, and agent_id
- Migration path for existing unsigned agents
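The server-side check of signature, token, and agent_id might follow the flow sketched below. Important caveat: Python's standard library has no Ed25519 or RSA, so this sketch substitutes HMAC-SHA256 purely to show the validation order. HMAC is symmetric, which is not the asymmetric design proposed above, and all function names and the token store are hypothetical.

```python
import hashlib
import hmac


def sign_request(server_key: bytes, agent_id: str, token: str) -> str:
    """Stand-in 'signature' binding agent_id and token to one server's key."""
    message = f"{agent_id}\n{token}".encode()
    return hmac.new(server_key, message, hashlib.sha256).hexdigest()


def validate_request(server_key, agent_id, token, signature, known_tokens):
    """Accept only if the agent is known, the token matches, and the signature verifies."""
    if agent_id not in known_tokens:
        return False
    # compare_digest avoids leaking match position through timing
    if not hmac.compare_digest(known_tokens[agent_id], token):
        return False
    expected = sign_request(server_key, agent_id, token)
    return hmac.compare_digest(expected, signature)
```

With a real asymmetric scheme, `sign_request` would become a private-key signing step and the final comparison would become a public-key verification.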

**Timeline:** Sooner than initially thought - foundational security improvement

---

## Medium Priority - UI/UX Improvements

### Rate Limit Settings UI
**Current State:** API endpoints exist; the UI skeleton is present but non-functional

**Needed:**
- Display current rate limit values for all endpoint types
- Live editing of limits with validation
- Show current usage/remaining per limit type
- Reset to defaults button
- Preview impact before applying changes
- Warning when setting limits too low

**Location:** Settings page → Rate Limits section

### Server Status/Splash During Operations
**Current State:** Dashboard shows "Failed to load" during server restarts/maintenance

**Needed:**
- Detect when the server is unreachable vs. an actual error
- Show a friendly "Server restarting..." splash instead of an error
- Maybe an animated spinner or progress indicator
- Different states:
  - Server starting up
  - Server restarting (config change)
  - Server maintenance
  - Actual error (needs user action)

**Possible Implementation:**
- SetupCompletionChecker could handle this (already polling /health)
- Add a status overlay component
- Detect specific error types (network vs 500 vs 401)
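The "network vs 500 vs 401" distinction above can be expressed as a small classification step on each `/health` poll result. The sketch below shows the logic in Python for consistency with the other examples in this document; the state names and the convention that `None` means "no HTTP response" are assumptions, and the real check would live in the frontend.

```python
from enum import Enum
from typing import Optional


class ServerState(Enum):
    OK = "ok"
    UNREACHABLE = "unreachable"      # show the "Server restarting..." splash
    AUTH_REQUIRED = "auth_required"  # 401: user must sign in again
    ERROR = "error"                  # actual error, needs user action


def classify_health(status_code: Optional[int]) -> ServerState:
    """Map a /health poll result to a UI state.

    `status_code` is None when the request failed at the network level
    (connection refused, timeout), so no HTTP response was received.
    """
    if status_code is None:
        return ServerState.UNREACHABLE
    if 200 <= status_code < 300:
        return ServerState.OK
    if status_code == 401:
        return ServerState.AUTH_REQUIRED
    return ServerState.ERROR
```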

### Dashboard Statistics Loading State
**Current:** Hard error when stats unavailable

**Better:**
- Skeleton loaders for stat cards
- Graceful degradation if some stats fail
- Retry button for failed stat fetches
- Cache last-known-good values briefly
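Caching last-known-good values briefly could look like the wrapper below: on a failed fetch it returns the previous value, flagged as stale, as long as that value is recent enough. The class name, the `(value, is_stale)` return shape, and the age limit are assumptions for illustration.

```python
import time


class LastKnownGood:
    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._stored_at = None

    def fetch(self, fetcher):
        """Return (value, is_stale); fall back to a recent cached value if fetcher raises."""
        try:
            value = fetcher()
        except Exception:
            fresh_enough = (
                self._stored_at is not None
                and self._clock() - self._stored_at <= self.max_age
            )
            if fresh_enough:
                return self._value, True
            raise  # nothing usable cached: surface the original error
        self._value, self._stored_at = value, self._clock()
        return value, False
```

The `is_stale` flag lets a stat card show the old number with a visual hint and a retry button instead of a hard error.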

---

## Lower Priority - Feature Enhancements

### Agent Auto-Update System
Currently, agents must be updated manually. Needed:
- Server-initiated agent updates
- Rollback capability
- Staged rollouts (canary deployments)
- Version compatibility checks
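Two of the items above lend themselves to a short sketch: picking a stable canary subset and comparing versions numerically. Both functions are hypothetical, and the dotted-integer version format is an assumption about how agent versions are written.

```python
import hashlib


def in_rollout(agent_id: str, percent: int) -> bool:
    """Deterministically place an agent in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(agent_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def is_compatible(agent_version: str, min_supported: str) -> bool:
    """Compare dotted versions numerically, so that 0.10.0 counts as newer than 0.9.1."""
    def parse(version):
        return tuple(int(part) for part in version.split("."))
    return parse(agent_version) >= parse(min_supported)
```

Because the bucket depends only on the agent ID, raising the percentage from 10 to 50 keeps every earlier canary in the rollout, which makes staged expansion and rollback predictable.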

### Proxmox Integration
Planned feature for managing VMs/containers:
- Detect Proxmox hosts
- List VMs and containers
- Trigger updates at VM/container level
- Separate update categories for host vs guests

### Mobile-Responsive Dashboard
The dashboard works on mobile but is not optimized. Improvements:
- Better mobile nav (hamburger menu)
- Touch-friendly buttons
- Responsive tables (card view on mobile)
- PWA support for installing as an app

### Notification System
- Email alerts for failed updates
- Webhook integration (Discord, Slack, etc.)
- Configurable notification rules
- Quiet hours / alert throttling
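Quiet hours usually span midnight (for example 22:00 to 07:00), which is the one detail that tends to go wrong. A minimal sketch of the check, with a hypothetical function name and a half-open `[start, end)` interval chosen for the example:

```python
from datetime import time


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); supports windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 -> 07:00
    return now >= start or now < end
```

Under this convention, equal start and end times describe an empty window, so alerts are never suppressed.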

### Scheduled Update Windows
- Define maintenance windows per agent
- Auto-approve updates during windows
- Block updates outside windows
- Timezone-aware scheduling
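Timezone-aware scheduling mostly means converting the server's UTC timestamp to the agent's local time before comparing it with the window. The sketch below assumes a same-day window defined by a weekday and local start/end times; the function name and window shape are hypothetical, and a fixed UTC offset stands in for a full timezone database entry.

```python
from datetime import datetime, time, timedelta, timezone


def in_maintenance_window(now_utc, agent_tz, weekday, start, end):
    """Check an aware UTC timestamp against a same-day window in the agent's local time.

    `weekday` follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    local = now_utc.astimezone(agent_tz)
    return local.weekday() == weekday and start <= local.time() < end


# Sunday 02:00-04:00 local time for an agent at UTC-5 (fixed offset for illustration)
agent_tz = timezone(timedelta(hours=-5))
now_utc = datetime(2025, 11, 2, 7, 30, tzinfo=timezone.utc)  # 02:30 local, a Sunday
allowed = in_maintenance_window(now_utc, agent_tz, 6, time(2, 0), time(4, 0))
```

The same instant evaluated for an agent in UTC falls outside the window, which is the behavior "block updates outside windows" relies on.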

---

## Technical Debt

### Configuration Management
**Current:** Settings scattered between database, .env file, and hardcoded defaults

**Better:**
- Unified settings table in database
- Web UI for all configuration
- Import/export settings
- Settings version history

### Testing Coverage
- Add integration tests for rate limiter
- Test agent registration flow end-to-end
- UI component tests for critical paths
- Load testing for concurrent agents

### Documentation
- API reference needs expansion
- Agent installation guide for edge cases
- Troubleshooting guide
- Architecture diagrams

### Code Organization
- Rate limiter settings should be database-backed (currently in-memory only)
- Agent timeout values are hardcoded and need to be configurable
- Shutdown delay is hardcoded at 1 minute and should be user-adjustable

---

## Notes & Philosophy

- **Less is more:** No enterprise BS, keep it simple
- **FOSS mentality:** All software has bugs, best effort approach
- **Homelab-first:** Build for real use cases, not investor pitches
- **Honest about limitations:** Document what doesn't work
- **Community-driven:** Users know their needs best

---

## Implementation Priority Order

1. **Cryptographic agent signing** - Security foundation, enables better rate limiting
2. **Rate limit UI completion** - Already have API, just need frontend
3. **Server status splash** - UX improvement, quick win
4. **Settings management refactor** - Enables other features
5. **Auto-update system** - Major feature, needs careful design
6. **Everything else** - As time permits

---

Last updated: 2025-10-31