Agent Subsystem Scanning - Implementation Plan

Current State (Problems)

  1. Monolithic Scanning: Everything runs in one scan_updates command

    • Storage metrics
    • Update scanning (APT/DNF/Winget/Windows Update/Docker)
    • System info collection
    • Process info
  2. No Granular Control: Can't disable individual subsystems

  3. Poor Logging: History shows "System Operation" instead of specific subsystem names

  4. No Schedule Tracking: Subsystems claim 15m intervals but don't actually follow them

  5. No stdout/stderr Reporting: Refresh commands don't report detailed output


Proposed Architecture

New Command Types

Current:  scan_updates (does everything)

New:
- scan_updates          # Just package updates
- scan_storage          # Disk usage only
- scan_system           # CPU, memory, processes, uptime
- scan_docker           # Docker containers/images
- heartbeat             # Rapid polling check-in
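The command names above could be kept in one place so the agent and server agree on spelling. This is a minimal sketch; the constant names and the `commandForSubsystem` helper are hypothetical, and the short subsystem names ("updates", "storage", ...) are assumed to match those used in the database schema later in this plan.

```go
package main

import "fmt"

// Command type strings, copied from the list above.
const (
	CmdScanUpdates = "scan_updates"
	CmdScanStorage = "scan_storage"
	CmdScanSystem  = "scan_system"
	CmdScanDocker  = "scan_docker"
	CmdHeartbeat   = "heartbeat"
)

// commandForSubsystem is a hypothetical helper that maps a short
// subsystem name to the command type the agent should receive.
func commandForSubsystem(name string) (string, error) {
	switch name {
	case "updates":
		return CmdScanUpdates, nil
	case "storage":
		return CmdScanStorage, nil
	case "system":
		return CmdScanSystem, nil
	case "docker":
		return CmdScanDocker, nil
	case "heartbeat":
		// heartbeat has no scan_ prefix, so it needs an explicit case
		return CmdHeartbeat, nil
	default:
		return "", fmt.Errorf("unknown subsystem %q", name)
	}
}
```

An explicit switch avoids producing a non-existent `scan_heartbeat` command, which plain string concatenation would do.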

Agent Subsystem Config

type SubsystemConfig struct {
    Enabled      bool
    Interval     time.Duration  // How often to auto-run
    LastRun      time.Time
    AutoRun      bool           // true: runs automatically on Interval; false: manual/on-demand only
}

type AgentSubsystems struct {
    Updates      SubsystemConfig  // scan_updates
    Storage      SubsystemConfig  // scan_storage
    SystemInfo   SubsystemConfig  // scan_system
    Docker       SubsystemConfig  // scan_docker
    Heartbeat    SubsystemConfig  // heartbeat
}
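To address the "No Schedule Tracking" problem, the config could expose a due check that the agent loop consults before running a subsystem. A minimal sketch, assuming `AutoRun` means "run automatically on `Interval`"; the `IsDue` method is hypothetical and not part of the existing agent.

```go
package main

import "time"

// SubsystemConfig mirrors the struct proposed above.
type SubsystemConfig struct {
	Enabled  bool
	Interval time.Duration // How often to auto-run
	LastRun  time.Time
	AutoRun  bool
}

// IsDue reports whether the subsystem should auto-run at the given time.
// Hypothetical helper: disabled, manual-only, or interval-less subsystems
// are never due; a subsystem that has never run is due immediately.
func (c SubsystemConfig) IsDue(now time.Time) bool {
	if !c.Enabled || !c.AutoRun || c.Interval <= 0 {
		return false
	}
	if c.LastRun.IsZero() {
		return true
	}
	// Due once a full interval has elapsed since the last run.
	return !now.Before(c.LastRun.Add(c.Interval))
}
```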

Server-Side Subsystem Tracking

Database Schema Addition:

CREATE TABLE agent_subsystems (
    agent_id UUID REFERENCES agents(id),
    subsystem VARCHAR(50),  -- 'storage', 'updates', 'system', 'docker'
    enabled BOOLEAN DEFAULT true,
    interval_minutes INTEGER DEFAULT 15,
    auto_run BOOLEAN DEFAULT false,  -- Server-scheduled vs on-demand
    last_run_at TIMESTAMP,
    next_run_at TIMESTAMP,
    PRIMARY KEY (agent_id, subsystem)
);

UI Toggle Structure (Agent Health Tab)

Agent Health Subsystems
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

□ Package Updates
  Scans for available package updates
  [updates] [completed] [ON] [15m] [2 min ago] [Auto]

□ Disk Usage Reporter
  Reports disk usage metrics to server
  [storage] [completed] [ON] [15m] [10 min ago] [Auto]

□ System Metrics
  CPU, memory, process count, uptime
  [system] [completed] [ON] [30m] [5 min ago] [Manual]

□ Docker Monitoring
  Container and image update tracking
  [docker] [idle] [OFF] [-] [never] [-]

□ Heartbeat
  Rapid status check-in (5s polling)
  [heartbeat] [active] [ON] [Permanent] [2s ago] [Manual]

Implementation Steps

Phase 1: Backend - New Command Types

File: aggregator-agent/cmd/agent/main.go

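// Add new command handlers (inside the existing command switch).
// The handlers return an error, so log it rather than dropping it.
case "scan_storage":
    if err := handleScanStorage(apiClient, cfg, cmd.ID); err != nil {
        log.Printf("scan_storage failed: %v", err)
    }

case "scan_system":
    if err := handleScanSystem(apiClient, cfg, cmd.ID); err != nil {
        log.Printf("scan_system failed: %v", err)
    }

case "scan_docker":
    if err := handleScanDocker(apiClient, cfg, dockerScanner, cmd.ID); err != nil {
        log.Printf("scan_docker failed: %v", err)
    }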

New Handlers:

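// Requires imports: "fmt", "strings"
func handleScanStorage(client *client.Client, cfg *config.Config, commandID string) error {
    // Collect system info, then report only the disk portion
    systemInfo, err := system.GetSystemInfo(AgentVersion)
    if err != nil {
        // Report the failure so the history shows why the scan failed
        return client.ReportCommandResult(commandID, "failed", "", err.Error(), 1)
    }

    // Build the stdout report: one summary line, then one line per mount point
    var stdout strings.Builder
    stdout.WriteString("Disk scan completed\n")
    fmt.Fprintf(&stdout, "Found %d mount points\n", len(systemInfo.DiskInfo))
    for _, disk := range systemInfo.DiskInfo {
        fmt.Fprintf(&stdout, "- %s: %.1f%% used (%s / %s)\n",
            disk.Mountpoint, disk.UsedPercent,
            formatBytes(disk.Used), formatBytes(disk.Total))
    }

    return client.ReportCommandResult(commandID, "completed", stdout.String(), "", 0)
}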
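The handler above calls `formatBytes`, which this plan does not define. One possible implementation is sketched here, assuming `disk.Used` and `disk.Total` are `uint64` byte counts and that binary units (KiB, MiB, ...) are acceptable in the output.

```go
package main

import "fmt"

// formatBytes renders a byte count using binary units.
// Assumption: the real helper may already exist in the agent under a
// different signature; this is an illustrative version only.
func formatBytes(b uint64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	// Find the largest power of 1024 that does not exceed b.
	div, exp := uint64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
}
```

For example, `formatBytes(1536)` returns `"1.5 KiB"` and `formatBytes(512)` returns `"512 B"`.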

Phase 2: Server - Subsystem API

New Endpoints:

POST   /api/v1/agents/:id/subsystems/:name/enable
POST   /api/v1/agents/:id/subsystems/:name/disable
POST   /api/v1/agents/:id/subsystems/:name/trigger
GET    /api/v1/agents/:id/subsystems
PATCH  /api/v1/agents/:id/subsystems/:name

Example Request:

PATCH /api/v1/agents/uuid/subsystems/storage
{
  "enabled": true,
  "interval_minutes": 15,
  "auto_run": false
}
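On the server, the PATCH body could be decoded into a struct with pointer fields so that omitted fields are left unchanged. This is a sketch under assumptions: the `SubsystemPatch` type and `parseSubsystemPatch` function are hypothetical, and the allowed intervals (5, 15, 30, 60 minutes) are borrowed from the UI dropdown proposed in Phase 4.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SubsystemPatch models the PATCH body. A nil pointer means
// "field not supplied, keep the stored value".
type SubsystemPatch struct {
	Enabled         *bool `json:"enabled,omitempty"`
	IntervalMinutes *int  `json:"interval_minutes,omitempty"`
	AutoRun         *bool `json:"auto_run,omitempty"`
}

// parseSubsystemPatch decodes and validates a PATCH body.
// Assumption: the server accepts only the intervals the UI offers.
func parseSubsystemPatch(body []byte) (SubsystemPatch, error) {
	var p SubsystemPatch
	if err := json.Unmarshal(body, &p); err != nil {
		return p, fmt.Errorf("invalid JSON: %w", err)
	}
	if p.IntervalMinutes != nil {
		switch *p.IntervalMinutes {
		case 5, 15, 30, 60:
			// supported interval
		default:
			return p, fmt.Errorf("unsupported interval_minutes: %d", *p.IntervalMinutes)
		}
	}
	return p, nil
}
```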

Phase 3: Database Migration

File: aggregator-server/internal/database/migrations/013_agent_subsystems.up.sql

CREATE TABLE IF NOT EXISTS agent_subsystems (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID NOT NULL REFERENCES agents(id) ON DELETE CASCADE,
    subsystem VARCHAR(50) NOT NULL,
    enabled BOOLEAN DEFAULT true,
    interval_minutes INTEGER DEFAULT 15,
    auto_run BOOLEAN DEFAULT false,
    last_run_at TIMESTAMP,
    next_run_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(agent_id, subsystem)
);

CREATE INDEX idx_agent_subsystems_agent ON agent_subsystems(agent_id);
CREATE INDEX idx_agent_subsystems_next_run ON agent_subsystems(next_run_at)
    WHERE enabled = true AND auto_run = true;

-- Default subsystems for existing agents
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
SELECT id, 'updates', true, 15, false FROM agents
UNION ALL
SELECT id, 'storage', true, 15, false FROM agents
UNION ALL
SELECT id, 'system', true, 30, false FROM agents
UNION ALL
SELECT id, 'docker', false, 15, false FROM agents;
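The INSERT above only seeds agents that exist when the migration runs. Agents that register later would need the same rows created at registration time. A minimal sketch of those defaults in Go, using the values from the migration; the `SubsystemDefault` type and `defaultSubsystems` function are hypothetical names.

```go
package main

// SubsystemDefault describes one row to insert into agent_subsystems
// for a newly registered agent.
type SubsystemDefault struct {
	Subsystem       string
	Enabled         bool
	IntervalMinutes int
	AutoRun         bool
}

// defaultSubsystems returns the same defaults the migration applies
// to existing agents (hypothetical helper for the registration path).
func defaultSubsystems() []SubsystemDefault {
	return []SubsystemDefault{
		{Subsystem: "updates", Enabled: true, IntervalMinutes: 15, AutoRun: false},
		{Subsystem: "storage", Enabled: true, IntervalMinutes: 15, AutoRun: false},
		{Subsystem: "system", Enabled: true, IntervalMinutes: 30, AutoRun: false},
		{Subsystem: "docker", Enabled: false, IntervalMinutes: 15, AutoRun: false},
	}
}
```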

Phase 4: UI - Agent Health Tab

Component: AgentScanners.tsx (already exists, needs enhancement)

Features needed:

  • Toggle switches for enable/disable
  • Interval dropdowns (5m, 15m, 30m, 1h)
  • Auto-run toggle
  • Last run timestamp
  • "Scan Now" button per subsystem
  • Status badges (idle, pending, running, completed, failed)

Phase 5: Scheduler

Server-side cron job:

// Every minute, check for subsystems due to run
func (s *Scheduler) CheckSubsystems() {
    // Take one timestamp so the due query and the reschedule agree
    now := time.Now()
    subsystems := db.GetDueSubsystems(now)

    for _, sub := range subsystems {
        // Queue a pending scan command for the agent, e.g. "scan_storage"
        cmd := &Command{
            AgentID: sub.AgentID,
            Type:    fmt.Sprintf("scan_%s", sub.Subsystem),
            Status:  "pending",
        }
        db.CreateCommand(cmd)

        // Update next_run_at relative to this check
        sub.NextRunAt = now.Add(time.Duration(sub.IntervalMinutes) * time.Minute)
        db.UpdateSubsystem(sub)
    }
}
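The scheduling decision can be separated from database access so it is easy to unit test. The sketch below is one way to do that; the `Subsystem` and `Command` types are simplified stand-ins for the real models, and `planDue` is a hypothetical function that applies the same `enabled = true AND auto_run = true` filter as the partial index in the migration.

```go
package main

import (
	"fmt"
	"time"
)

// Subsystem is a simplified stand-in for an agent_subsystems row.
type Subsystem struct {
	AgentID         string
	Subsystem       string
	Enabled         bool
	AutoRun         bool
	IntervalMinutes int
	NextRunAt       time.Time
}

// Command is a simplified stand-in for the server's command model.
type Command struct {
	AgentID string
	Type    string
	Status  string
}

// planDue returns the commands to queue at time now, plus copies of the
// due subsystems with NextRunAt advanced by their interval.
// Rows that are disabled, manual-only, or not yet due are skipped.
func planDue(subs []Subsystem, now time.Time) ([]Command, []Subsystem) {
	var cmds []Command
	var updated []Subsystem
	for _, sub := range subs {
		if !sub.Enabled || !sub.AutoRun || sub.IntervalMinutes <= 0 {
			continue
		}
		if sub.NextRunAt.After(now) {
			continue // not due yet
		}
		cmds = append(cmds, Command{
			AgentID: sub.AgentID,
			Type:    fmt.Sprintf("scan_%s", sub.Subsystem),
			Status:  "pending",
		})
		// sub is a copy, so the caller's slice is left untouched.
		sub.NextRunAt = now.Add(time.Duration(sub.IntervalMinutes) * time.Minute)
		updated = append(updated, sub)
	}
	return cmds, updated
}
```

The caller would then persist both slices, ideally in one transaction so a command is never queued without its `next_run_at` being advanced.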

Timeline Display Fix

Problem: History shows "System Operation" instead of "Disk Usage Reporter"

Solution: Update command result reporting to include subsystem metadata

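// When reporting command results, pass subsystem metadata as an extra
// argument (requires the planned metadata support in ReportCommandResult)
client.ReportCommandResult(commandID, "completed", stdout, stderr, exitCode, map[string]string{
    "subsystem":       "storage",
    "subsystem_label": "Disk Usage Reporter",
    "scan_type":       "storage",
})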
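Rather than writing the metadata map by hand in every handler, the agent could build it from one label table. A minimal sketch: the labels are taken from the UI mockup earlier in this plan, while `subsystemLabels` and `resultMetadata` are hypothetical names, and falling back to "System Operation" for unknown subsystems is an assumption that preserves today's generic label.

```go
package main

// subsystemLabels maps short subsystem names to the display labels
// used in the Agent Health tab mockup.
var subsystemLabels = map[string]string{
	"updates":   "Package Updates",
	"storage":   "Disk Usage Reporter",
	"system":    "System Metrics",
	"docker":    "Docker Monitoring",
	"heartbeat": "Heartbeat",
}

// resultMetadata builds the metadata attached to a command result.
// Unknown subsystems fall back to the generic label (assumption).
func resultMetadata(subsystem string) map[string]string {
	label, ok := subsystemLabels[subsystem]
	if !ok {
		label = "System Operation"
	}
	return map[string]string{
		"subsystem":       subsystem,
		"subsystem_label": label,
		"scan_type":       subsystem,
	}
}
```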

ChatTimeline update:

// In getNarrativeSummary()
if (entry.metadata?.subsystem_label) {
    subject = entry.metadata.subsystem_label;
} else if (entry.action === 'scan_updates') {
    subject = 'Package Updates';
} else if (entry.action === 'scan_storage') {
    subject = 'Disk Usage Reporter';
}

Windows Considerations

All subsystems must work on:

  • Linux (APT, DNF, Docker)
  • Windows (Windows Update, Winget, Docker)

Windows-specific subsystems:

  • scan_windows_services - Service monitoring
  • scan_windows_features - Optional Windows features
  • scan_event_logs - Security/Application logs (future)

Migration Path

  1. Backward Compatibility: Keep scan_updates working as-is
  2. Gradual Rollout: New agents use subsystems, old agents continue working
  3. Migration Command: Server can trigger migrate_to_subsystems command
  4. UI Toggle: "Use Legacy Scanning" checkbox in advanced settings
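During the rollout the server has to decide which commands to send to a given agent. One way to express the fallback is sketched below; the `scanCommandsFor` function and its `legacy` flag are hypothetical, and how the server learns that an agent is legacy (agent version, or the "Use Legacy Scanning" checkbox) is left open.

```go
package main

import "fmt"

// scanCommandsFor returns the command types to queue for one scan cycle.
// Legacy agents get the single monolithic scan_updates command; agents
// that support subsystems get one scan_<name> command per enabled subsystem.
// Assumption: enabled contains only scan subsystems, not "heartbeat".
func scanCommandsFor(legacy bool, enabled []string) []string {
	if legacy {
		return []string{"scan_updates"}
	}
	cmds := make([]string, 0, len(enabled))
	for _, name := range enabled {
		cmds = append(cmds, fmt.Sprintf("scan_%s", name))
	}
	return cmds
}
```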

Testing Checklist

  • Storage scan returns proper stdout with mount points
  • System scan reports CPU/memory/processes
  • Docker scan fails gracefully when Docker is not installed
  • Subsystem toggles persist across agent restarts
  • Auto-run schedules fire correctly
  • Manual "Scan Now" button works
  • History timeline shows correct subsystem labels
  • Windows agent supports all subsystems
  • Linux agent supports all subsystems

File Changes Required

Agent:

  • cmd/agent/main.go - Add new command handlers
  • internal/client/client.go - Add metadata support to ReportCommandResult
  • New: internal/subsystems/storage.go
  • New: internal/subsystems/system.go
  • New: internal/subsystems/docker.go

Server:

  • internal/database/migrations/013_agent_subsystems.up.sql
  • New: internal/models/subsystem.go
  • New: internal/database/queries/subsystems.go
  • New: internal/api/handlers/subsystems.go
  • New: internal/scheduler/subsystems.go

Web:

  • src/components/AgentScanners.tsx - Major enhancement
  • src/hooks/useSubsystems.ts - New API hooks
  • src/lib/api.ts - Subsystem API methods
  • src/components/ChatTimeline.tsx - Subsystem label display

Priority

v0.2.0 Must-Have:

  • Separate storage scanning command
  • Proper stdout/stderr reporting
  • History timeline labels fixed

v0.3.0 Nice-to-Have:

  • Full subsystem toggle UI
  • Auto-run scheduler
  • Per-subsystem intervals

Future:

  • Windows-specific subsystems
  • Custom subsystem plugins
  • Subsystem dependencies (e.g., Docker requires system scan)