Agent Subsystem Scanning - Implementation Plan
Current State (Problems)
- Monolithic Scanning: Everything runs in one scan_updates command:
  - Storage metrics
  - Update scanning (APT/DNF/Winget/Windows Update/Docker)
  - System info collection
  - Process info
- No Granular Control: Can't disable individual subsystems
- Poor Logging: History shows "System Operation" instead of specific subsystem names
- No Schedule Tracking: Subsystems claim 15m intervals but don't actually follow them
- No stdout/stderr Reporting: Refresh commands don't report detailed output
Proposed Architecture
New Command Types
Current: scan_updates (does everything)
New:
- scan_updates # Just package updates
- scan_storage # Disk usage only
- scan_system # CPU, memory, processes, uptime
- scan_docker # Docker containers/images
- heartbeat # Rapid polling check-in
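Most of the new command types follow a `scan_<subsystem>` pattern, but `heartbeat` does not. The sketch below is a minimal Go lookup from subsystem name to command type. It assumes the subsystem names used in the `agent_subsystems` table ('updates', 'storage', 'system', 'docker') plus 'heartbeat'. The `commandTypes` map and `CommandTypeFor` function are hypothetical names, not existing code.

```go
package main

import "fmt"

// commandTypes maps subsystem names (as stored in agent_subsystems.subsystem)
// to the agent command type that runs them. Hypothetical; names follow the plan.
var commandTypes = map[string]string{
	"updates":   "scan_updates",
	"storage":   "scan_storage",
	"system":    "scan_system",
	"docker":    "scan_docker",
	"heartbeat": "heartbeat", // not a scan, so no "scan_" prefix
}

// CommandTypeFor returns the command type for a subsystem. Unknown names
// return an error, so a typo never becomes a command the agent cannot handle.
func CommandTypeFor(subsystem string) (string, error) {
	cmd, ok := commandTypes[subsystem]
	if !ok {
		return "", fmt.Errorf("unknown subsystem %q", subsystem)
	}
	return cmd, nil
}
```

An explicit table costs a few lines. In exchange, the server never generates a `scan_heartbeat` command that no agent handles.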
Agent Subsystem Config
type SubsystemConfig struct {
	Enabled  bool
	Interval time.Duration // How often to auto-run
	LastRun  time.Time
	AutoRun  bool // true: server-scheduled every Interval; false: on-demand only (matches auto_run in the schema)
}
type AgentSubsystems struct {
Updates SubsystemConfig // scan_updates
Storage SubsystemConfig // scan_storage
SystemInfo SubsystemConfig // scan_system
Docker SubsystemConfig // scan_docker
Heartbeat SubsystemConfig // heartbeat
}
Server-Side Subsystem Tracking
Database Schema Addition:
CREATE TABLE agent_subsystems (
agent_id UUID REFERENCES agents(id),
subsystem VARCHAR(50), -- 'storage', 'updates', 'system', 'docker'
enabled BOOLEAN DEFAULT true,
interval_minutes INTEGER DEFAULT 15,
auto_run BOOLEAN DEFAULT false, -- Server-scheduled vs on-demand
last_run_at TIMESTAMP,
next_run_at TIMESTAMP,
PRIMARY KEY (agent_id, subsystem)
);
UI Toggle Structure (Agent Health Tab)
Agent Health Subsystems
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
□ Package Updates
Scans for available package updates
[scan_updates] [completed] [ON] [15m] [2 min ago] [Auto]
□ Disk Usage Reporter
Reports disk usage metrics to server
[storage] [completed] [ON] [15m] [10 min ago] [Auto]
□ System Metrics
CPU, memory, process count, uptime
[system] [completed] [ON] [30m] [5 min ago] [Manual]
□ Docker Monitoring
Container and image update tracking
[docker] [idle] [OFF] [-] [never] [-]
□ Heartbeat
Rapid status check-in (5s polling)
[heartbeat] [active] [ON] [Permanent] [2s ago] [Manual]
Implementation Steps
Phase 1: Backend - New Command Types
File: aggregator-agent/cmd/agent/main.go
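// Add new command handlers; log failures instead of silently dropping the returned error
case "scan_storage":
	if err := handleScanStorage(apiClient, cfg, cmd.ID); err != nil {
		log.Printf("scan_storage failed: %v", err)
	}
case "scan_system":
	if err := handleScanSystem(apiClient, cfg, cmd.ID); err != nil {
		log.Printf("scan_system failed: %v", err)
	}
case "scan_docker":
	if err := handleScanDocker(apiClient, cfg, dockerScanner, cmd.ID); err != nil {
		log.Printf("scan_docker failed: %v", err)
	}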
New Handlers:
func handleScanStorage(client *client.Client, cfg *config.Config, commandID string) error {
	// Collect system info; only the disk section is reported
	systemInfo, err := system.GetSystemInfo(AgentVersion)
	if err != nil {
		// Report the failure so the history entry shows stderr and a non-zero exit code
		return client.ReportCommandResult(commandID, "failed", "", err.Error(), 1)
	}
	stdout := "Disk scan completed\n"
	stdout += fmt.Sprintf("Found %d mount points\n", len(systemInfo.DiskInfo))
	for _, disk := range systemInfo.DiskInfo {
		stdout += fmt.Sprintf("- %s: %.1f%% used (%s / %s)\n",
			disk.Mountpoint, disk.UsedPercent,
			formatBytes(disk.Used), formatBytes(disk.Total))
	}
	return client.ReportCommandResult(commandID, "completed", stdout, "", 0)
}
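The handler calls `formatBytes`, which the plan does not define. One possible Go sketch follows, using binary (1024-based) units. It assumes the disk `Used` and `Total` fields are `uint64` byte counts. The helper is hypothetical, not an existing function in the agent.

```go
package main

import "fmt"

// formatBytes renders a byte count with binary units, e.g. 1536 -> "1.5 KiB".
// Assumes disk sizes are uint64 byte counts.
func formatBytes(b uint64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	// Find the largest unit that keeps the value at or above 1
	div, exp := uint64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
}
```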
Phase 2: Server - Subsystem API
New Endpoints:
POST /api/v1/agents/:id/subsystems/:name/enable
POST /api/v1/agents/:id/subsystems/:name/disable
POST /api/v1/agents/:id/subsystems/:name/trigger
GET /api/v1/agents/:id/subsystems
PATCH /api/v1/agents/:id/subsystems/:name
Example Request:
PATCH /api/v1/agents/uuid/subsystems/storage
{
"enabled": true,
"interval_minutes": 15,
"auto_run": false
}
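A PATCH body is a partial update, so the handler has to tell "field not sent" apart from an explicit `false` or `0`. Below is a minimal Go sketch using pointer fields. `SubsystemPatch` and `ParseSubsystemPatch` are hypothetical names. The interval rule is an assumption: it restricts values to the UI dropdown options in Phase 4 (5m, 15m, 30m, 1h).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SubsystemPatch is a hypothetical body for PATCH /api/v1/agents/:id/subsystems/:name.
// A nil pointer means the client did not send that field, so it is left unchanged.
type SubsystemPatch struct {
	Enabled         *bool `json:"enabled"`
	IntervalMinutes *int  `json:"interval_minutes"`
	AutoRun         *bool `json:"auto_run"`
}

// allowedIntervals mirrors the UI dropdown options (5m, 15m, 30m, 1h).
var allowedIntervals = map[int]bool{5: true, 15: true, 30: true, 60: true}

// ParseSubsystemPatch decodes and validates a PATCH body.
func ParseSubsystemPatch(body []byte) (SubsystemPatch, error) {
	var p SubsystemPatch
	if err := json.Unmarshal(body, &p); err != nil {
		return p, fmt.Errorf("invalid JSON: %w", err)
	}
	if p.IntervalMinutes != nil && !allowedIntervals[*p.IntervalMinutes] {
		return p, fmt.Errorf("interval_minutes must be 5, 15, 30 or 60, got %d", *p.IntervalMinutes)
	}
	return p, nil
}
```

With pointer fields, a body such as `{"auto_run": true}` flips only the auto-run flag. The stored `enabled` and `interval_minutes` values stay as they are.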
Phase 3: Database Migration
File: aggregator-server/internal/database/migrations/013_agent_subsystems.up.sql
CREATE TABLE IF NOT EXISTS agent_subsystems (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
agent_id UUID NOT NULL REFERENCES agents(id) ON DELETE CASCADE,
subsystem VARCHAR(50) NOT NULL,
enabled BOOLEAN DEFAULT true,
interval_minutes INTEGER DEFAULT 15,
auto_run BOOLEAN DEFAULT false,
last_run_at TIMESTAMP,
next_run_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(agent_id, subsystem)
);
CREATE INDEX IF NOT EXISTS idx_agent_subsystems_agent ON agent_subsystems(agent_id);
CREATE INDEX IF NOT EXISTS idx_agent_subsystems_next_run ON agent_subsystems(next_run_at)
WHERE enabled = true AND auto_run = true;
-- Default subsystems for existing agents (ON CONFLICT keeps the migration re-runnable)
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
SELECT id, 'updates', true, 15, false FROM agents
UNION ALL
SELECT id, 'storage', true, 15, false FROM agents
UNION ALL
SELECT id, 'system', true, 30, false FROM agents
UNION ALL
SELECT id, 'docker', false, 15, false FROM agents
ON CONFLICT (agent_id, subsystem) DO NOTHING;
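The migration seeds rows only for agents that already exist when it runs. Agents that register later would need the same defaults. Below is a minimal Go sketch of seeding at registration time. It assumes PostgreSQL-style `$n` placeholders and passes the agent ID as a string. `SubsystemDefault`, `DefaultSubsystems` and `SeedSubsystems` are hypothetical names.

```go
package main

import (
	"database/sql"
	"fmt"
)

// SubsystemDefault holds the per-subsystem values seeded for an agent.
type SubsystemDefault struct {
	Subsystem       string
	Enabled         bool
	IntervalMinutes int
	AutoRun         bool
}

// DefaultSubsystems mirrors the values the migration inserts for existing agents.
func DefaultSubsystems() []SubsystemDefault {
	return []SubsystemDefault{
		{Subsystem: "updates", Enabled: true, IntervalMinutes: 15},
		{Subsystem: "storage", Enabled: true, IntervalMinutes: 15},
		{Subsystem: "system", Enabled: true, IntervalMinutes: 30},
		{Subsystem: "docker", Enabled: false, IntervalMinutes: 15},
	}
}

// execer is satisfied by both *sql.DB and *sql.Tx.
type execer interface {
	Exec(query string, args ...interface{}) (sql.Result, error)
}

const insertDefaultSubsystem = `INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (agent_id, subsystem) DO NOTHING`

// SeedSubsystems inserts the default rows for a newly registered agent.
// ON CONFLICT makes a repeated registration harmless.
func SeedSubsystems(db execer, agentID string) error {
	for _, d := range DefaultSubsystems() {
		if _, err := db.Exec(insertDefaultSubsystem, agentID, d.Subsystem, d.Enabled, d.IntervalMinutes, d.AutoRun); err != nil {
			return fmt.Errorf("seeding %s subsystem: %w", d.Subsystem, err)
		}
	}
	return nil
}
```

Keeping the defaults in one Go function means the registration path and any later migrations can share them. The SQL literals and the code then cannot drift apart.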
Phase 4: UI - Agent Health Tab
Component: AgentScanners.tsx (already exists, needs enhancement)
Features needed:
- Toggle switches for enable/disable
- Interval dropdowns (5m, 15m, 30m, 1h)
- Auto-run toggle
- Last run timestamp
- "Scan Now" button per subsystem
- Status badges (idle, pending, running, completed, failed)
Phase 5: Scheduler
Server-side cron job:
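// Every minute, check for subsystems due to run
func (s *Scheduler) CheckSubsystems() error {
	now := time.Now() // one timestamp for the whole pass
	subsystems, err := db.GetDueSubsystems(now)
	if err != nil {
		return fmt.Errorf("loading due subsystems: %w", err)
	}
	for _, sub := range subsystems {
		cmd := &Command{
			AgentID: sub.AgentID,
			// Table names are 'updates', 'storage', 'system', 'docker' -> scan_updates, etc.
			Type:   fmt.Sprintf("scan_%s", sub.Subsystem),
			Status: "pending",
		}
		if err := db.CreateCommand(cmd); err != nil {
			// Leave next_run_at untouched so this subsystem is retried on the next pass
			log.Printf("creating %s command for agent %v: %v", cmd.Type, sub.AgentID, err)
			continue
		}
		// Update next_run_at
		sub.NextRunAt = now.Add(time.Duration(sub.IntervalMinutes) * time.Minute)
		if err := db.UpdateSubsystem(sub); err != nil {
			log.Printf("updating schedule for %v/%s: %v", sub.AgentID, sub.Subsystem, err)
		}
	}
	return nil
}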
Timeline Display Fix
Problem: History shows "System Operation" instead of "Disk Usage Reporter"
Solution: Update command result reporting to include subsystem metadata
// When reporting command results (ReportCommandResult gains a metadata argument)
client.ReportCommandResult(commandID, "completed", stdout, stderr, exitCode, map[string]string{
	"subsystem":       "storage",
	"subsystem_label": "Disk Usage Reporter",
	"scan_type":       "storage",
})
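Hard-coding the label at each call site invites drift between handlers. One option is a single lookup that builds the metadata map. Below is a minimal Go sketch. The labels are taken from the Agent Health mockup above. `subsystemLabels` and `ResultMetadata` are hypothetical names, and the fallback to the raw name is an assumption.

```go
package main

// subsystemLabels maps subsystem names to the labels shown in the history timeline.
var subsystemLabels = map[string]string{
	"updates":   "Package Updates",
	"storage":   "Disk Usage Reporter",
	"system":    "System Metrics",
	"docker":    "Docker Monitoring",
	"heartbeat": "Heartbeat",
}

// ResultMetadata builds the metadata attached to a command result.
// Unknown subsystems fall back to their raw name, so the timeline never
// shows an empty label.
func ResultMetadata(subsystem string) map[string]string {
	label, ok := subsystemLabels[subsystem]
	if !ok {
		label = subsystem
	}
	return map[string]string{
		"subsystem":       subsystem,
		"subsystem_label": label,
		"scan_type":       subsystem,
	}
}
```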
ChatTimeline update:
// In getNarrativeSummary()
if (entry.metadata?.subsystem_label) {
subject = entry.metadata.subsystem_label;
} else if (entry.action === 'scan_updates') {
subject = 'Package Updates';
} else if (entry.action === 'scan_storage') {
subject = 'Disk Usage';
}
Windows Considerations
All subsystems must work on:
- ✅ Linux (APT, DNF, Docker)
- ✅ Windows (Windows Update, Winget, Docker)
Windows-specific subsystems:
- scan_windows_services: Service monitoring
- scan_windows_features: Optional Windows features
- scan_event_logs: Security/Application logs (future)
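An agent could advertise which commands it supports based on the OS it runs on. Below is a minimal Go sketch keyed on `runtime.GOOS`. It assumes the cross-platform commands from this plan plus the two non-future Windows-specific subsystems proposed in this section. `SupportedSubsystems` is a hypothetical name.

```go
package main

import "runtime"

// SupportedSubsystems lists the commands an agent would advertise for a GOOS value.
// scan_event_logs is omitted because the plan marks it as future work.
func SupportedSubsystems(goos string) []string {
	common := []string{"scan_updates", "scan_storage", "scan_system", "scan_docker", "heartbeat"}
	if goos == "windows" {
		return append(common, "scan_windows_services", "scan_windows_features")
	}
	return common
}

// currentSubsystems is what this binary would report at runtime.
func currentSubsystems() []string {
	return SupportedSubsystems(runtime.GOOS)
}
```

Taking `goos` as a parameter keeps the Windows branch testable on a Linux build machine. The runtime wrapper then stays a one-liner.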
Migration Path
- Backward Compatibility: Keep scan_updates working as-is
- Gradual Rollout: New agents use subsystems; old agents continue working
- Migration Command: Server can trigger a migrate_to_subsystems command
- UI Toggle: "Use Legacy Scanning" checkbox in advanced settings
Testing Checklist
- Storage scan returns proper stdout with mount points
- System scan reports CPU/memory/processes
- Docker scan fails gracefully when Docker is not installed
- Subsystem toggles persist across agent restarts
- Auto-run schedules fire correctly
- Manual "Scan Now" button works
- History timeline shows correct subsystem labels
- Windows agent supports all subsystems
- Linux agent supports all subsystems
File Changes Required
Agent:
- cmd/agent/main.go - Add new command handlers
- internal/client/client.go - Add metadata support to ReportCommandResult
- New: internal/subsystems/storage.go
- New: internal/subsystems/system.go
- New: internal/subsystems/docker.go
Server:
- New: internal/database/migrations/013_agent_subsystems.up.sql
- New: internal/models/subsystem.go
- New: internal/database/queries/subsystems.go
- New: internal/api/handlers/subsystems.go
- New: internal/scheduler/subsystems.go
Web:
- src/components/AgentScanners.tsx - Major enhancement
- src/hooks/useSubsystems.ts - New API hooks
- src/lib/api.ts - Subsystem API methods
- src/components/ChatTimeline.tsx - Subsystem label display
Priority
v0.2.0 Must-Have:
- Separate storage scanning command
- Proper stdout/stderr reporting
- History timeline labels fixed
v0.3.0 Nice-to-Have:
- Full subsystem toggle UI
- Auto-run scheduler
- Per-subsystem intervals
Future:
- Windows-specific subsystems
- Custom subsystem plugins
- Subsystem dependencies (e.g., Docker requires system scan)