Fimeg/Redflag

Fimeg 484a7f77ce Add docs and project files - force for Culurien

2026-03-28 20:46:24 -04:00

Agent Subsystem Scanning - Implementation Plan

Current State (Problems)

- Monolithic Scanning: Everything runs in one scan_updates command
  - Storage metrics
  - Update scanning (APT/DNF/Winget/Windows Update/Docker)
  - System info collection
  - Process info
- No Granular Control: Can't disable individual subsystems
- Poor Logging: History shows "System Operation" instead of specific subsystem names
- No Schedule Tracking: Subsystems claim 15m intervals but don't actually follow them
- No stdout/stderr Reporting: Refresh commands don't report detailed output

Proposed Architecture

New Command Types

Current:  scan_updates (does everything)

New:
- scan_updates          # Just package updates
- scan_storage          # Disk usage only
- scan_system           # CPU, memory, processes, uptime
- scan_docker           # Docker containers/images
- heartbeat             # Rapid polling check-in
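
The command names above can be kept in one place so that the agent and server agree on them. The following is a minimal sketch, not the project's actual code: the constant names, the knownCommands set, and the SubsystemFor helper are hypothetical, and the "scan_" prefix convention is inferred from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// Command types taken from the list in this plan.
const (
	CmdScanUpdates = "scan_updates"
	CmdScanStorage = "scan_storage"
	CmdScanSystem  = "scan_system"
	CmdScanDocker  = "scan_docker"
	CmdHeartbeat   = "heartbeat"
)

// knownCommands is a hypothetical lookup set used to reject unknown types.
var knownCommands = map[string]bool{
	CmdScanUpdates: true,
	CmdScanStorage: true,
	CmdScanSystem:  true,
	CmdScanDocker:  true,
	CmdHeartbeat:   true,
}

// SubsystemFor returns the subsystem name for a command type, for example
// "scan_storage" -> "storage". The second value is false for unknown commands.
func SubsystemFor(cmdType string) (string, bool) {
	if !knownCommands[cmdType] {
		return "", false
	}
	return strings.TrimPrefix(cmdType, "scan_"), true
}

func main() {
	for _, c := range []string{CmdScanStorage, CmdHeartbeat, "scan_everything"} {
		name, ok := SubsystemFor(c)
		fmt.Printf("%-16s subsystem=%q known=%v\n", c, name, ok)
	}
}
```

Rejecting unknown command types early keeps a typo in a command name from silently falling through to the legacy scan path.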

Agent Subsystem Config

type SubsystemConfig struct {
    Enabled      bool
    Interval     time.Duration  // How often to auto-run
    LastRun      time.Time
    AutoRun      bool           // true: run automatically on Interval; false: on-demand only
}

type AgentSubsystems struct {
    Updates      SubsystemConfig  // scan_updates
    Storage      SubsystemConfig  // scan_storage
    SystemInfo   SubsystemConfig  // scan_system
    Docker       SubsystemConfig  // scan_docker
    Heartbeat    SubsystemConfig  // heartbeat
}
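
To show how these fields could drive the "No Schedule Tracking" fix, here is a small sketch of a due-check. The IsDue method is hypothetical, and it assumes that AutoRun set to true means the subsystem runs on its interval and that a zero LastRun means the subsystem has never run.

```go
package main

import (
	"fmt"
	"time"
)

// SubsystemConfig mirrors the struct in this plan.
type SubsystemConfig struct {
	Enabled  bool
	Interval time.Duration
	LastRun  time.Time
	AutoRun  bool
}

// IsDue reports whether the subsystem should run at time now.
// Assumption: a zero LastRun means "never run", so the subsystem is due at once.
func (c SubsystemConfig) IsDue(now time.Time) bool {
	if !c.Enabled || !c.AutoRun || c.Interval <= 0 {
		return false
	}
	if c.LastRun.IsZero() {
		return true
	}
	// Due once a full interval has elapsed since the last run.
	return !now.Before(c.LastRun.Add(c.Interval))
}

func main() {
	now := time.Now()
	storage := SubsystemConfig{
		Enabled:  true,
		AutoRun:  true,
		Interval: 15 * time.Minute,
		LastRun:  now.Add(-20 * time.Minute),
	}
	docker := SubsystemConfig{Enabled: false, AutoRun: true, Interval: 15 * time.Minute}

	fmt.Println("storage due:", storage.IsDue(now))
	fmt.Println("docker due:", docker.IsDue(now))
}
```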

Server-Side Subsystem Tracking

Database Schema Addition:

CREATE TABLE agent_subsystems (
    agent_id UUID REFERENCES agents(id),
    subsystem VARCHAR(50),  -- 'storage', 'updates', 'system', 'docker'
    enabled BOOLEAN DEFAULT true,
    interval_minutes INTEGER DEFAULT 15,
    auto_run BOOLEAN DEFAULT false,  -- Server-scheduled vs on-demand
    last_run_at TIMESTAMP,
    next_run_at TIMESTAMP,
    PRIMARY KEY (agent_id, subsystem)
);

UI Toggle Structure (Agent Health Tab)

Agent Health Subsystems
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

□ Package Updates
  Scans for available package updates
  [updates] [completed] [ON] [15m] [2 min ago] [Auto]

□ Disk Usage Reporter
  Reports disk usage metrics to server
  [storage] [completed] [ON] [15m] [10 min ago] [Auto]

□ System Metrics
  CPU, memory, process count, uptime
  [system] [completed] [ON] [30m] [5 min ago] [Manual]

□ Docker Monitoring
  Container and image update tracking
  [docker] [idle] [OFF] [-] [never] [-]

□ Heartbeat
  Rapid status check-in (5s polling)
  [heartbeat] [active] [ON] [Permanent] [2s ago] [Manual]

Implementation Steps

Phase 1: Backend - New Command Types

File: aggregator-agent/cmd/agent/main.go

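// Add new command handlers (each is assumed to return an error, like handleScanStorage)
case "scan_storage":
    if err := handleScanStorage(apiClient, cfg, cmd.ID); err != nil {
        log.Printf("scan_storage failed: %v", err)
    }

case "scan_system":
    if err := handleScanSystem(apiClient, cfg, cmd.ID); err != nil {
        log.Printf("scan_system failed: %v", err)
    }

case "scan_docker":
    if err := handleScanDocker(apiClient, cfg, dockerScanner, cmd.ID); err != nil {
        log.Printf("scan_docker failed: %v", err)
    }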

New Handlers:

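func handleScanStorage(client *client.Client, cfg *config.Config, commandID string) error {
    // Collect disk info only
    systemInfo, err := system.GetSystemInfo(AgentVersion)
    if err != nil {
        // Report the failure so the command does not stay pending on the server.
        // Exit code 1 is a placeholder for "scan failed".
        stderr := fmt.Sprintf("Disk scan failed: %v\n", err)
        return client.ReportCommandResult(commandID, "failed", "", stderr, 1)
    }

    // Build a human-readable summary, one line per mount point
    stdout := "Disk scan completed\n"
    stdout += fmt.Sprintf("Found %d mount points\n", len(systemInfo.DiskInfo))
    for _, disk := range systemInfo.DiskInfo {
        stdout += fmt.Sprintf("- %s: %.1f%% used (%s / %s)\n",
            disk.Mountpoint, disk.UsedPercent,
            formatBytes(disk.Used), formatBytes(disk.Total))
    }

    return client.ReportCommandResult(commandID, "completed", stdout, "", 0)
}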
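
The handler relies on a formatBytes helper that this plan does not define. One possible version is sketched below; it assumes the disk fields are uint64 byte counts and that binary (1024-based) units are wanted, neither of which the plan states.

```go
package main

import "fmt"

// formatBytes renders a byte count with binary (1024-based) units.
// Assumption: the input is an unsigned 64-bit byte count.
func formatBytes(b uint64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	// Find the largest unit that keeps the value at or above 1.
	div, exp := uint64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
}

func main() {
	for _, size := range []uint64{512, 1536, 1 << 20, 250 << 30} {
		fmt.Printf("%d bytes = %s\n", size, formatBytes(size))
	}
}
```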

Phase 2: Server - Subsystem API

New Endpoints:

POST   /api/v1/agents/:id/subsystems/:name/enable
POST   /api/v1/agents/:id/subsystems/:name/disable
POST   /api/v1/agents/:id/subsystems/:name/trigger
GET    /api/v1/agents/:id/subsystems
PATCH  /api/v1/agents/:id/subsystems/:name

Example Request:

PATCH /api/v1/agents/uuid/subsystems/storage
{
  "enabled": true,
  "interval_minutes": 15,
  "auto_run": false
}
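
On the server, a PATCH body like the one above is usually decoded into a struct whose fields are pointers, so that an omitted field can be told apart from an explicit false or 0. The sketch below illustrates that idea only. The SubsystemPatch type and ParseSubsystemPatch function are hypothetical, and restricting intervals to the dropdown values listed for the UI (5m, 15m, 30m, 1h) is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// SubsystemPatch models the PATCH body. Pointer fields distinguish
// "omitted" (nil) from an explicit zero value.
type SubsystemPatch struct {
	Enabled         *bool `json:"enabled"`
	IntervalMinutes *int  `json:"interval_minutes"`
	AutoRun         *bool `json:"auto_run"`
}

// allowedIntervals follows the UI dropdown values in this plan (assumption:
// the API enforces the same set).
var allowedIntervals = map[int]bool{5: true, 15: true, 30: true, 60: true}

// ParseSubsystemPatch decodes and validates a PATCH request body.
func ParseSubsystemPatch(body []byte) (SubsystemPatch, error) {
	var p SubsystemPatch
	if err := json.Unmarshal(body, &p); err != nil {
		return p, fmt.Errorf("invalid JSON: %w", err)
	}
	if p.Enabled == nil && p.IntervalMinutes == nil && p.AutoRun == nil {
		return p, errors.New("no fields to update")
	}
	if p.IntervalMinutes != nil && !allowedIntervals[*p.IntervalMinutes] {
		return p, fmt.Errorf("unsupported interval_minutes: %d", *p.IntervalMinutes)
	}
	return p, nil
}

func main() {
	body := []byte(`{"enabled": true, "interval_minutes": 15, "auto_run": false}`)
	p, err := ParseSubsystemPatch(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(*p.Enabled, *p.IntervalMinutes, *p.AutoRun)
}
```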

Phase 3: Database Migration

File: aggregator-server/internal/database/migrations/013_agent_subsystems.up.sql

CREATE TABLE IF NOT EXISTS agent_subsystems (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id UUID NOT NULL REFERENCES agents(id) ON DELETE CASCADE,
    subsystem VARCHAR(50) NOT NULL,
    enabled BOOLEAN DEFAULT true,
    interval_minutes INTEGER DEFAULT 15,
    auto_run BOOLEAN DEFAULT false,
    last_run_at TIMESTAMP,
    next_run_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(agent_id, subsystem)
);

CREATE INDEX IF NOT EXISTS idx_agent_subsystems_agent ON agent_subsystems(agent_id);
CREATE INDEX IF NOT EXISTS idx_agent_subsystems_next_run ON agent_subsystems(next_run_at)
    WHERE enabled = true AND auto_run = true;

-- Default subsystems for existing agents (safe to re-run, like the CREATE statements)
INSERT INTO agent_subsystems (agent_id, subsystem, enabled, interval_minutes, auto_run)
SELECT id, 'updates', true, 15, false FROM agents
UNION ALL
SELECT id, 'storage', true, 15, false FROM agents
UNION ALL
SELECT id, 'system', true, 30, false FROM agents
UNION ALL
SELECT id, 'docker', false, 15, false FROM agents
ON CONFLICT (agent_id, subsystem) DO NOTHING;
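
The INSERT above only covers agents that exist when the migration runs. Agents that register later would need the same rows. One way to keep the two in sync is a shared defaults table in Go, sketched here. The SubsystemDefault type and DefaultSubsystems function are hypothetical; the values are copied from the migration.

```go
package main

import "fmt"

// SubsystemDefault describes one default row for a newly registered agent.
type SubsystemDefault struct {
	Subsystem       string
	Enabled         bool
	IntervalMinutes int
	AutoRun         bool
}

// DefaultSubsystems returns the same defaults the migration inserts for
// existing agents. A fresh slice is returned so callers can modify it safely.
func DefaultSubsystems() []SubsystemDefault {
	return []SubsystemDefault{
		{Subsystem: "updates", Enabled: true, IntervalMinutes: 15, AutoRun: false},
		{Subsystem: "storage", Enabled: true, IntervalMinutes: 15, AutoRun: false},
		{Subsystem: "system", Enabled: true, IntervalMinutes: 30, AutoRun: false},
		{Subsystem: "docker", Enabled: false, IntervalMinutes: 15, AutoRun: false},
	}
}

func main() {
	for _, d := range DefaultSubsystems() {
		fmt.Printf("%-8s enabled=%-5v interval=%dm auto_run=%v\n",
			d.Subsystem, d.Enabled, d.IntervalMinutes, d.AutoRun)
	}
}
```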

Phase 4: UI - Agent Health Tab

Component: AgentScanners.tsx (already exists, needs enhancement)

Features needed:

Toggle switches for enable/disable
Interval dropdowns (5m, 15m, 30m, 1h)
Auto-run toggle
Last run timestamp
"Scan Now" button per subsystem
Status badges (idle, pending, running, completed, failed)

Phase 5: Scheduler

Server-side cron job:

// Every minute, check for subsystems due to run
func (s *Scheduler) CheckSubsystems() {
    // Take one timestamp so the due query and next_run_at agree
    now := time.Now()
    subsystems := db.GetDueSubsystems(now)

    for _, sub := range subsystems {
        // Subsystem names map to command types, e.g. "storage" -> "scan_storage"
        cmd := &Command{
            AgentID: sub.AgentID,
            Type:    fmt.Sprintf("scan_%s", sub.Subsystem),
            Status:  "pending",
        }
        db.CreateCommand(cmd)

        // Update next_run_at
        sub.NextRunAt = now.Add(time.Duration(sub.IntervalMinutes) * time.Minute)
        db.UpdateSubsystem(sub)
    }
}
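
The scheduler loop above does not show error handling. A fuller sketch follows, assuming the database calls return errors. The Store interface, the memStore type, and the standalone CheckSubsystems signature are all hypothetical stand-ins for the real db layer. The point it illustrates: next_run_at is only advanced after the command was queued, so a failed insert is retried on the next tick. It uses errors.Join, which needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type Subsystem struct {
	AgentID         string
	Subsystem       string
	IntervalMinutes int
	NextRunAt       time.Time
}

type Command struct {
	AgentID string
	Type    string
	Status  string
}

// Store is a hypothetical interface standing in for the db calls in the plan.
type Store interface {
	GetDueSubsystems(now time.Time) ([]*Subsystem, error)
	CreateCommand(cmd *Command) error
	UpdateSubsystem(sub *Subsystem) error
}

// CheckSubsystems queues one pending command per due subsystem and returns
// how many were queued, plus any per-subsystem errors joined together.
func CheckSubsystems(store Store, now time.Time) (int, error) {
	due, err := store.GetDueSubsystems(now)
	if err != nil {
		return 0, fmt.Errorf("load due subsystems: %w", err)
	}
	queued := 0
	var errs []error
	for _, sub := range due {
		cmd := &Command{AgentID: sub.AgentID, Type: "scan_" + sub.Subsystem, Status: "pending"}
		if err := store.CreateCommand(cmd); err != nil {
			// Leave NextRunAt untouched so the next tick retries this subsystem.
			errs = append(errs, err)
			continue
		}
		sub.NextRunAt = now.Add(time.Duration(sub.IntervalMinutes) * time.Minute)
		if err := store.UpdateSubsystem(sub); err != nil {
			errs = append(errs, err)
			continue
		}
		queued++
	}
	return queued, errors.Join(errs...)
}

// memStore is an in-memory Store used only for this demonstration.
type memStore struct {
	subs     []*Subsystem
	commands []*Command
}

func (m *memStore) GetDueSubsystems(now time.Time) ([]*Subsystem, error) {
	var due []*Subsystem
	for _, s := range m.subs {
		if !s.NextRunAt.After(now) {
			due = append(due, s)
		}
	}
	return due, nil
}

func (m *memStore) CreateCommand(cmd *Command) error {
	m.commands = append(m.commands, cmd)
	return nil
}

// UpdateSubsystem is a no-op here because callers mutate the stored pointer.
func (m *memStore) UpdateSubsystem(sub *Subsystem) error { return nil }

func main() {
	now := time.Now()
	store := &memStore{subs: []*Subsystem{
		{AgentID: "agent-1", Subsystem: "storage", IntervalMinutes: 15, NextRunAt: now.Add(-time.Minute)},
		{AgentID: "agent-1", Subsystem: "system", IntervalMinutes: 30, NextRunAt: now.Add(10 * time.Minute)},
	}}
	n, err := CheckSubsystems(store, now)
	fmt.Println("queued:", n, "err:", err)
}
```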

Timeline Display Fix

Problem: History shows "System Operation" instead of "Disk Usage Reporter"

Solution: Update command result reporting to include subsystem metadata

// When reporting command results (metadata is a new trailing parameter)
client.ReportCommandResult(commandID, "completed", stdout, stderr, exitCode, map[string]string{
    "subsystem": "storage",
    "subsystem_label": "Disk Usage Reporter",
    "scan_type": "storage",
})
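
To avoid repeating the label strings in every handler, the metadata map could be built by one helper. This is a sketch under assumptions: the ResultMetadata function and the subsystemLabels table are hypothetical, the labels are taken from the UI mockup in this plan, and unknown subsystems fall back to the generic "System Operation" label that the history shows today.

```go
package main

import "fmt"

// subsystemLabels maps subsystem names to the display names from the UI mockup.
var subsystemLabels = map[string]string{
	"updates": "Package Updates",
	"storage": "Disk Usage Reporter",
	"system":  "System Metrics",
	"docker":  "Docker Monitoring",
}

// ResultMetadata builds the metadata map attached to a command result.
func ResultMetadata(subsystem string) map[string]string {
	label, ok := subsystemLabels[subsystem]
	if !ok {
		label = "System Operation" // generic fallback for unknown subsystems
	}
	return map[string]string{
		"subsystem":       subsystem,
		"subsystem_label": label,
		"scan_type":       subsystem,
	}
}

func main() {
	fmt.Println(ResultMetadata("storage"))
	fmt.Println(ResultMetadata("unknown"))
}
```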

ChatTimeline update:

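// In getNarrativeSummary()
// Prefer the label sent by the agent; fall back to the command type for older results
if (entry.metadata?.subsystem_label) {
    subject = entry.metadata.subsystem_label;
} else if (entry.action === 'scan_updates') {
    subject = 'Package Updates';
} else if (entry.action === 'scan_storage') {
    subject = 'Disk Usage';
} else if (entry.action === 'scan_system') {
    subject = 'System Metrics';
} else if (entry.action === 'scan_docker') {
    subject = 'Docker Monitoring';
}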

Windows Considerations

All subsystems must work on:

✅ Linux (APT, DNF, Docker)
✅ Windows (Windows Update, Winget, Docker)

Windows-specific subsystems:

scan_windows_services - Service monitoring
scan_windows_features - Optional Windows features
scan_event_logs - Security/Application logs (future)
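
One way the agent could advertise which commands it supports on each platform is sketched below. The SupportedCommands function and both slices are hypothetical. scan_event_logs is left out because the plan marks it as future work.

```go
package main

import (
	"fmt"
	"runtime"
)

// commonCommands are the cross-platform command types from this plan.
var commonCommands = []string{
	"scan_updates", "scan_storage", "scan_system", "scan_docker", "heartbeat",
}

// windowsCommands are the Windows-specific command types from this plan.
var windowsCommands = []string{"scan_windows_services", "scan_windows_features"}

// SupportedCommands returns the command types for the given GOOS value.
// It copies the shared slice so callers cannot modify the package-level lists.
func SupportedCommands(goos string) []string {
	cmds := append([]string{}, commonCommands...)
	if goos == "windows" {
		cmds = append(cmds, windowsCommands...)
	}
	return cmds
}

func main() {
	fmt.Println(runtime.GOOS, SupportedCommands(runtime.GOOS))
}
```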

Migration Path

Backward Compatibility: Keep scan_updates working as-is
Gradual Rollout: New agents use subsystems, old agents continue working
Migration Command: Server can trigger migrate_to_subsystems command
UI Toggle: "Use Legacy Scanning" checkbox in advanced settings

Testing Checklist

Storage scan returns proper stdout with mount points
System scan reports CPU/memory/processes
Docker scan works when Docker not installed (graceful failure)
Subsystem toggles persist across agent restarts
Auto-run schedules fire correctly
Manual "Scan Now" button works
History timeline shows correct subsystem labels
Windows agent supports all subsystems
Linux agent supports all subsystems
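
For the "Docker not installed" item, a graceful path might look like the sketch below. It checks for the binary with exec.LookPath from the standard library. The ScanResult type and scanDocker function are hypothetical, and reporting a skipped scan as "completed" rather than "failed" is an assumption the plan does not settle. A present binary does not guarantee a running daemon, so a real handler would still need to handle connection errors.

```go
package main

import (
	"fmt"
	"os/exec"
)

// ScanResult is a hypothetical shape for what a handler reports to the server.
type ScanResult struct {
	Status   string
	Stdout   string
	ExitCode int
}

// binaryAvailable reports whether name resolves to an executable on PATH.
func binaryAvailable(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

// scanDocker returns a skipped-but-successful result when the binary is
// missing, instead of failing the command.
func scanDocker(binary string) ScanResult {
	if !binaryAvailable(binary) {
		return ScanResult{
			Status:   "completed",
			Stdout:   "Docker not installed, scan skipped\n",
			ExitCode: 0,
		}
	}
	// A real scan would query the Docker daemon here; omitted in this sketch.
	return ScanResult{Status: "completed", Stdout: "Docker found, scan would run here\n"}
}

func main() {
	result := scanDocker("docker")
	fmt.Printf("%s: %s", result.Status, result.Stdout)
}
```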

File Changes Required

Agent:

cmd/agent/main.go - Add new command handlers
internal/client/client.go - Add metadata support to ReportCommandResult
New: internal/subsystems/storage.go
New: internal/subsystems/system.go
New: internal/subsystems/docker.go

Server:

internal/database/migrations/013_agent_subsystems.up.sql
New: internal/models/subsystem.go
New: internal/database/queries/subsystems.go
New: internal/api/handlers/subsystems.go
New: internal/scheduler/subsystems.go

Web:

src/components/AgentScanners.tsx - Major enhancement
src/hooks/useSubsystems.ts - New API hooks
src/lib/api.ts - Subsystem API methods
src/components/ChatTimeline.tsx - Subsystem label display

Priority

v0.2.0 Must-Have:

Separate storage scanning command
Proper stdout/stderr reporting
History timeline labels fixed

v0.3.0 Nice-to-Have:

Full subsystem toggle UI
Auto-run scheduler
Per-subsystem intervals

Future:

Windows-specific subsystems
Custom subsystem plugins
Subsystem dependencies (e.g., Docker requires system scan)
