Fimeg/Redflag

Fimeg 484a7f77ce Add docs and project files - force for Culurien

2026-03-28 20:46:24 -04:00

Future Enhancements & Considerations

Critical Testing Issues

Windows Agent Update Persistence Bug

Status: Needs Investigation

Problem: Microsoft Defender updates reappear after installation

Updates are marked as installed but show up again in scan results
Possible Windows Update state caching issue
May be related to Windows Update Agent refresh timing

Investigation Needed:

Verify update installation actually completes on Windows side
Check Windows Update API state after installation
Compare package state in database vs Windows registry
Test with different update types (Defender vs other updates)
May need to force WUA refresh after installation
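
A minimal sketch of the database-vs-scan comparison, assuming each update can be identified by a KB number plus a version string (the `UpdateRef` type and its fields are hypothetical, not the project's schema). One candidate worth ruling out, as an assumption rather than a confirmed cause: Defender security intelligence updates are typically published under the same KB number with a new version each time, so a match on KB alone may flag a genuinely new release as a "reappearing" one.

```python
from typing import Iterable, List, NamedTuple, Tuple


class UpdateRef(NamedTuple):
    """Hypothetical identity of an update as seen by the server or a scan."""
    kb: str
    version: str


def classify_reappearing(
    installed: Iterable[UpdateRef], scanned: Iterable[UpdateRef]
) -> Tuple[List[UpdateRef], List[UpdateRef]]:
    """Split scan results that collide with updates recorded as installed.

    Returns (same_version, different_version):
      same_version      - exact KB + version match, i.e. a true reappearance
      different_version - same KB but another version, likely a new release
    """
    installed_set = set(installed)
    installed_kbs = {u.kb for u in installed_set}
    same_version, different_version = [], []
    for update in scanned:
        if update in installed_set:
            same_version.append(update)
        elif update.kb in installed_kbs:
            different_version.append(update)
    return same_version, different_version
```

If most hits land in `different_version`, the database key is probably too coarse; if they land in `same_version`, the Windows-side installation state is the more likely culprit.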

Priority: High - affects Windows agent reliability

Immediate Priority - Real-Time Operations

Intelligent Heartbeat System Enhancement

Current State:

Manual heartbeat toggle (pink icon when active)
User-initiated only
Fixed duration options

Proposed Enhancement:

Auto-trigger heartbeat on operations: Any command sent to agent triggers heartbeat automatically
Color coding:
- Blue: System-initiated heartbeat (scan, install, etc.)
- Pink: User-initiated manual heartbeat
Lifecycle management: Heartbeat auto-ends when operation completes
Smart detection: Don't spam heartbeat commands if already active

Implementation Strategy:

Phase 1: Scan operations auto-trigger heartbeat
Phase 2: Install/approve operations auto-trigger heartbeat
Phase 3: Any agent command auto-triggers appropriate heartbeat duration
Phase 4: Heartbeat duration scales with operation type (30s scan vs 10m install)

User Experience:

User clicks "Scan Now" → blue heartbeat activates → scan completes → heartbeat stops
User clicks "Install" → blue heartbeat activates → install completes → heartbeat stops
User manually triggers heartbeat → pink icon → user controls duration

Priority: High - improves responsiveness without manual intervention
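
The lifecycle described above could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the class and method names are hypothetical, the `"system"`/`"manual"` labels stand in for blue/pink, and the only durations taken from the plan are the 30s scan and 10m install examples.

```python
import time
from dataclasses import dataclass

# Durations in seconds. Only "scan" and "install" come from the plan;
# DEFAULT_DURATION is an assumed fallback for other commands.
OPERATION_DURATIONS = {"scan": 30, "install": 600}
DEFAULT_DURATION = 60


@dataclass
class Heartbeat:
    source: str        # "system" (blue) or "manual" (pink)
    expires_at: float  # clock value at which the heartbeat lapses


class HeartbeatManager:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._active = {}  # agent_id -> Heartbeat

    def active(self, agent_id):
        """Return the live heartbeat for an agent, dropping it if expired."""
        hb = self._active.get(agent_id)
        if hb is not None and hb.expires_at <= self._clock():
            del self._active[agent_id]
            return None
        return hb

    def on_command(self, agent_id, operation):
        """Auto-trigger a system heartbeat. Returns True only when a new
        heartbeat command actually needs to be sent to the agent."""
        if self.active(agent_id) is not None:
            return False  # smart detection: one is already running
        duration = OPERATION_DURATIONS.get(operation, DEFAULT_DURATION)
        self._active[agent_id] = Heartbeat("system", self._clock() + duration)
        return True

    def on_complete(self, agent_id):
        """End a system heartbeat when its operation finishes.
        Manual heartbeats stay under user control."""
        hb = self.active(agent_id)
        if hb is not None and hb.source == "system":
            del self._active[agent_id]

    def start_manual(self, agent_id, duration):
        """User-initiated heartbeat with a user-chosen duration."""
        self._active[agent_id] = Heartbeat("manual", self._clock() + duration)
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping.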

Dashboard Visualization Enhancement:

Live Commands Dashboard Widget: Aggregate view of all active operations
Color coding extends to commands:
- Pink badges: User-initiated commands (manual scan, manual install, etc.)
- Blue badges: System-orchestrated commands (auto-scan, auto-heartbeat, approved workflows)
Fleet monitoring at a glance:
- Visual breakdown: "X agents with blue (system) operations | Y agents with pink (manual) operations"
- Quick filtering: "Show only system-orchestrated operations" vs "Show only user-initiated"
- Live count: "Active system operations triggering heartbeats: 3"
Agent list integration:
- Small blue/pink indicator dots next to agent names
- Sort/filter by active heartbeat status and source
- Dashboard stats showing heartbeat distribution across fleet

Use Case: MSP/homelab fleet monitoring - differentiate between automated orchestration (blue) and manual intervention (pink) at a glance. Helps identify which systems need attention vs which are running autonomously.

Note: Backend tracking is complete (source field in commands, metadata storage). Frontend visualization is deferred until after V1.0.
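
For the "visual breakdown" counts, the aggregation over the existing source field might look like the sketch below. The record shape (`agent_id`, `source`) and the values `"system"` and `"manual"` are assumptions for illustration; the actual command schema may differ.

```python
from collections import defaultdict


def summarize_fleet(active_commands):
    """Count distinct agents per command source.

    active_commands: iterable of dicts with hypothetical keys
    'agent_id' and 'source' ('system' = blue, 'manual' = pink).
    """
    agents_by_source = defaultdict(set)
    for cmd in active_commands:
        agents_by_source[cmd["source"]].add(cmd["agent_id"])
    return {
        "system": len(agents_by_source["system"]),
        "manual": len(agents_by_source["manual"]),
    }
```

Counting agents rather than commands matches the "X agents with blue operations | Y agents with pink operations" wording; an agent running both kinds is counted once on each side.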

Strategic Architecture Decisions

Update Management Philosophy - Pre-V1.0 Discussion Needed

Core Questions:

Are we a mirror? Do we cache/store update packages locally?
Are we a gatekeeper? Do we proxy updates through our server?
Are we an orchestrator? Do we just coordinate direct agent→repo downloads?

Current Implementation: Orchestrator model

Agents download directly from upstream repos
Server coordinates approval/installation
No package caching or storage

Alternative Models to Consider:

Model A: Package Proxy/Cache

Server downloads and caches approved updates
Agents pull from local server instead of internet
Pros: Bandwidth savings, offline capability, version pinning
Cons: Storage requirements, security responsibility, repo sync complexity

Model B: Approval Database

Server stores approval decisions without packages
Agents check "is package X approved?" before installing from upstream
Pros: Lightweight, flexible, audit trail
Cons: No offline capability, no bandwidth savings
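
A rough sketch of what Model B stores, to make the "lightweight" and "audit trail" points concrete. The `ApprovalDB` class, its fields, and the rule that unknown packages count as not approved are all assumptions for illustration.

```python
from datetime import datetime, timezone


class ApprovalDB:
    """In-memory stand-in for an approval table: decisions only, no packages."""

    def __init__(self):
        self._decisions = {}  # (package, version) -> bool
        self.audit_log = []   # append-only record of every decision

    def record(self, package, version, approved, decided_by):
        self._decisions[(package, version)] = approved
        self.audit_log.append({
            "package": package,
            "version": version,
            "approved": approved,
            "decided_by": decided_by,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def is_approved(self, package, version):
        # Assumption: anything without an explicit decision is not approved.
        return self._decisions.get((package, version), False)
```

An agent would call something like `is_approved("openssl", "3.0.13")` before installing from upstream; the package itself never touches the server.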

Model C: Hybrid Approach

Critical updates: Cache locally (security patches)
Regular updates: Direct from upstream
User-configurable per update category
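
The per-category routing in Model C could be sketched like this. The category names and the `Source` enum are hypothetical; the only behavior taken from the description is that critical (security) updates default to the local cache, everything else goes upstream, and users can override per category.

```python
from enum import Enum


class Source(Enum):
    LOCAL_CACHE = "local_cache"
    UPSTREAM = "upstream"


# Assumption: "critical updates" maps to a category named "security".
DEFAULT_POLICY = {"security": Source.LOCAL_CACHE}


def resolve_source(category, user_overrides=None):
    """Pick where an agent should fetch an update of the given category."""
    policy = {**DEFAULT_POLICY, **(user_overrides or {})}
    return policy.get(category, Source.UPSTREAM)
```

For example, `resolve_source("security", {"security": Source.UPSTREAM})` lets a user opt out of caching entirely.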

Windows Enforcement Challenge:

Linux: Can control APT/DNF sources easily
Windows: Windows Update has limited local control
Winget: Can control sources
Need unified approach that works cross-platform

Questions for V1.0:

Do users want local update caching?
Is bandwidth savings worth storage complexity?
Should "disapprove" mean "block installation" or just "don't auto-install"?
How do we handle Windows Update's limited control surface?

Decision Timeline: Before V1.0 - this affects database schema, agent architecture, storage requirements

High Priority - Security & Authentication

Cryptographically Signed Agent Binaries

Problem: Currently, agents can be copied between servers, duplicated, or spoofed. Rate limiting is IP-based, which doesn't prevent abuse at the agent level.

Proposed Solution:

Server generates unique cryptographic signature when building/distributing agent binaries
Each agent binary is bound to the specific server instance via:
- SSH keys or X.509 certificates
- Server's public/private key pair
- Unique server identifier embedded in binary at build time
Agent presents cryptographic proof of authenticity during registration and check-ins
Server validates signature before accepting any agent communication

Benefits:

Better Rate Limiting: Track and limit per-agent-binary instead of per-IP
- Prevents multiple agents from same host sharing rate limit bucket
- Each unique agent has its own quota
- Detect and block duplicated/copied agents
Prevents Cross-Server Agent Migration:
- Agent built for Server A cannot register with Server B
- Stops unauthorized agent redistribution
- Ensures agents only communicate with their originating server
Audit Trail:
- Track which specific binary version is running where
- Identify compromised or rogue agent binaries
- Revoke specific agent signatures if needed
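
To illustrate the per-agent quota idea, here is a sketch that keys a token bucket on a verified agent identity instead of the client IP. The token bucket is a stand-in chosen for illustration; nothing here is known about how the existing limiter works, and `agent_key` is assumed to be whatever identifier survives signature validation.

```python
import time


class TokenBucket:
    """Classic token bucket: up to `capacity` burst, refilled continuously."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class PerAgentLimiter:
    """One bucket per agent key, so agents behind one IP don't share quota."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self._bucket_args = (capacity, refill_per_sec, clock)
        self._buckets = {}

    def allow(self, agent_key):
        bucket = self._buckets.get(agent_key)
        if bucket is None:
            bucket = self._buckets[agent_key] = TokenBucket(*self._bucket_args)
        return bucket.allow()
```

A duplicated binary would present the same key from several hosts and exhaust one shared bucket, which is one cheap signal for the "detect copied agents" goal.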

Implementation Considerations:

Use Ed25519 or RSA for signing (Ed25519 is faster and uses much smaller keys and signatures)
Embed server public key in agent binary at build time
Store server private key securely (not in env file)
Agent includes signature in Authorization header alongside token
Server validates: signature + token + agent_id combo
Migration path for existing unsigned agents

Timeline: Sooner than initially thought - foundational security improvement

Medium Priority - UI/UX Improvements

Rate Limit Settings UI

Current State: API endpoints exist, UI skeleton present but non-functional

Needed:

Display current rate limit values for all endpoint types
Live editing of limits with validation
Show current usage/remaining per limit type
Reset to defaults button
Preview impact before applying changes
Warning when setting limits too low

Location: Settings page → Rate Limits section

Server Status/Splash During Operations

Current State: Dashboard shows "Failed to load" during server restarts/maintenance

Needed:

Detect when server is unreachable vs actual error
Show friendly "Server restarting..." splash instead of error
Possibly an animated spinner or progress indicator
Different states:
- Server starting up
- Server restarting (config change)
- Server maintenance
- Actual error (needs user action)

Possible Implementation:

SetupCompletionChecker could handle this (already polling /health)
Add status overlay component
Detect specific error types (network vs 500 vs 401)
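
The error-type detection could start from a small classifier like the one below. It is written in Python for brevity, and the same mapping would carry over to the frontend. The state names are hypothetical, and treating HTTP 503 as "maintenance" is an assumption about how the server would signal that state.

```python
from enum import Enum


class ServerState(Enum):
    OK = "ok"
    UNREACHABLE = "unreachable"      # no response: starting up or restarting
    MAINTENANCE = "maintenance"
    AUTH_REQUIRED = "auth_required"  # needs user action (log in again)
    ERROR = "error"                  # needs user action


def classify_response(status_code):
    """Map a /health poll result to an overlay state.

    status_code is None when the request failed at the network level
    (connection refused, timeout), i.e. no HTTP response at all.
    """
    if status_code is None:
        return ServerState.UNREACHABLE
    if status_code == 503:
        return ServerState.MAINTENANCE  # assumed convention, see lead-in
    if status_code == 401:
        return ServerState.AUTH_REQUIRED
    if 200 <= status_code < 400:
        return ServerState.OK
    return ServerState.ERROR
```

Only `UNREACHABLE` and `MAINTENANCE` would show the friendly splash; the other non-OK states keep a visible error because the user has to act.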

Dashboard Statistics Loading State

Current: Hard error when stats unavailable

Better:

Skeleton loaders for stat cards
Graceful degradation if some stats fail
Retry button for failed stat fetches
Cache last-known-good values briefly
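
A sketch of the "cache last-known-good values briefly" idea, assuming each stat card has its own loader function. The class name, the `(value, is_stale)` return shape, and the age limit are illustrative choices, not existing code.

```python
import time


class LastKnownGood:
    """Serve the previous successful value for a short time when a fetch fails."""

    def __init__(self, max_age_sec, clock=time.monotonic):
        self._max_age = max_age_sec
        self._clock = clock
        self._values = {}  # key -> (value, fetched_at)

    def fetch(self, key, loader):
        """Return (value, is_stale).

        Re-raises the loader's exception when there is no cached value
        or the cached value is older than max_age_sec.
        """
        try:
            value = loader()
        except Exception:
            cached = self._values.get(key)
            if cached is not None and self._clock() - cached[1] <= self._max_age:
                return cached[0], True
            raise
        self._values[key] = (value, self._clock())
        return value, False
```

The `is_stale` flag lets a stat card render the old number in a muted style next to a retry button, so one failing stat degrades gracefully instead of breaking the page.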

Lower Priority - Feature Enhancements

Agent Auto-Update System

Currently agents must be manually updated. Need:

Server-initiated agent updates
Rollback capability
Staged rollouts (canary deployments)
Version compatibility checks
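
One common way to pick canary agents for a staged rollout is deterministic hashing, sketched below. This is a generic technique offered as an assumption, not a decided design; the function names and the idea of salting with a release string are illustrative.

```python
import hashlib


def rollout_bucket(agent_id, release):
    """Map an agent to a stable bucket in [0, 100) for a given release."""
    digest = hashlib.sha256(f"{release}:{agent_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_rollout(agent_id, release, percent):
    """True if the agent falls inside the first `percent` buckets."""
    return rollout_bucket(agent_id, release) < percent
```

Because the bucket is stable, raising the percentage from 10 to 50 only adds agents and never drops ones that already updated. Salting with the release string means a different subset goes first each time.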

Proxmox Integration

Planned feature for managing VMs/containers:

Detect Proxmox hosts
List VMs and containers
Trigger updates at VM/container level
Separate update categories for host vs guests

Mobile-Responsive Dashboard

Works but not optimized:

Better mobile nav (hamburger menu)
Touch-friendly buttons
Responsive tables (card view on mobile)
PWA support for installing as app

Notification System

Email alerts for failed updates
Webhook integration (Discord, Slack, etc.)
Configurable notification rules
Quiet hours / alert throttling

Scheduled Update Windows

Define maintenance windows per agent
Auto-approve updates during windows
Block updates outside windows
Timezone-aware scheduling
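
The window check itself is small once times are compared in the agent's local zone. The sketch below assumes a window is a daily start/end pair plus a timezone; that shape is hypothetical, and real windows might also need weekdays or dates.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def in_maintenance_window(now_utc: datetime, start: time, end: time, tz: tzinfo) -> bool:
    """Check whether `now_utc` falls inside a daily [start, end) window
    expressed in the agent's local timezone `tz`.

    Windows whose end is earlier than their start wrap past midnight,
    e.g. 22:00-02:00.
    """
    local = now_utc.astimezone(tz).time()
    if start <= end:
        return start <= local < end
    return local >= start or local < end
```

Passing a `zoneinfo.ZoneInfo` for `tz` would make the check follow daylight saving changes automatically; a fixed offset such as `timezone(timedelta(hours=-4))` also works.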

Technical Debt

Configuration Management

Current: Settings scattered between database, .env file, and hardcoded defaults