# ISSUE #3: Scan Trigger Flow - Proper Implementation Plan

**Date**: 2025-12-18 (Planning for tomorrow)  
**Status**: Planning Phase (Ready for implementation tomorrow)  
**Severity**: High (Scan buttons currently error)  
**New Scope**: Beyond Issues #1 and #2 (completed)  

---

## Issue Summary

The individual "Scan" buttons for each subsystem (docker, storage, system, updates) all return the same error:
> "Failed to trigger scan: Failed to create command"

**Why**: The `signAndCreateCommand` call in `TriggerSubsystem` fails, and the command acknowledgment and history logging flows are not yet integrated for subsystem-specific scans.

**What Needs to Happen**: Full ETHOS-compliant flow from UI click → API → Agent → Results → History

---

## Current State Analysis

### UI Layer (AgentHealth.tsx) ✅ WORKING
- ✅ Per-subsystem scan buttons exist
- ✅ `handleTriggerScan(subsystem.subsystem)` passes subsystem name
- `triggerScanMutation` makes API call to: `/api/v1/agents/:id/subsystems/:subsystem/trigger`

### Backend API (subsystems.go) ✅ MOSTLY WORKING
- ✅ `TriggerSubsystem` handler receives subsystem parameter
- ✅ Creates distinct command type: `commandType := "scan_" + subsystem`
- ✅ Creates AgentCommand with unique command_type
- **❌ FAILING**: `signAndCreateCommand` call fails

### Agent (main.go) ✅ MOSTLY WORKING
- ✅ `case "scan_updates":` handles update scans
- ✅ `case "scan_storage":` handles storage scans
- **❌ ISSUE**: Command acknowledgment flow needs review

### History/Reconciliation ❌ NOT INTEGRATED
- **Missing**: Subsystem context in history logging
- **Broken**: Command acknowledgment for scan commands
- **Inconsistent**: Some logs go to history, some don't

---

## Proper Implementation Requirements (ETHOS)

### Core Principles to Follow

1. **Errors are History, Not /dev/null** ✅ MUST HAVE
   - Scan failures → history table with context
   - Button click errors → history table
   - Command creation errors → history table
   - Agent handler errors → history table

2. **Security is Non-Negotiable** ✅ MUST HAVE
   - All scan triggers → authenticated endpoints (already done)
   - Command signing → Ed25519 nonces (already done)
   - Circuit breaker integration (already exists)

3. **Assume Failure; Build for Resilience** ✅ MUST HAVE
   - Scan failures → retry logic (if appropriate)
   - Command creation failures → clear error context
   - Agent unreachable → proper error to UI
   - Partial failures → handled gracefully

4. **Idempotency** ✅ MUST HAVE
   - Scan operations repeatable (safe to trigger multiple times)
   - No duplicate history entries for same scan
   - Results properly timestamped for tracking

5. **No Marketing Fluff** ✅ MUST HAVE
   - Clear action names in history: "scan_docker", "scan_storage", "scan_system", "scan_updates"
   - Subsystem icons in history display (not just text)
   - Accurate, honest logging throughout
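
The idempotency requirement (no duplicate history entries for the same scan) could be met by keying history entries on the command ID. The following is a minimal in-memory sketch, not the project's implementation: `HistoryEntry` and `HistoryStore` are hypothetical names, and a real deployment would more likely rely on a unique constraint in the database.

```go
package main

import "sync"

// HistoryEntry is a hypothetical, simplified history record.
type HistoryEntry struct {
	CommandID string
	Action    string // e.g. "scan_docker"
	Subsystem string // e.g. "docker"
	Result    string // e.g. "success" or "failed"
}

// HistoryStore is an illustrative in-memory store keyed by command ID.
type HistoryStore struct {
	mu      sync.Mutex
	entries map[string]HistoryEntry
}

// NewHistoryStore returns an empty store.
func NewHistoryStore() *HistoryStore {
	return &HistoryStore{entries: make(map[string]HistoryEntry)}
}

// Record stores the entry unless one already exists for the same command ID.
// It reports whether a new entry was written, so a retried result report
// does not create a duplicate.
func (s *HistoryStore) Record(e HistoryEntry) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.entries[e.CommandID]; exists {
		return false
	}
	s.entries[e.CommandID] = e
	return true
}

// Len returns the number of stored entries.
func (s *HistoryStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.entries)
}
```

Each button click still produces its own command ID, so repeated scans yield distinct entries while a re-sent result for the same command is ignored.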

---

## Full Flow Design (From Click to History)

### Phase 1: User Clicks Scan Button

**UI Event**: `handleTriggerScan(subsystem.subsystem)`
```text
User clicks: [Scan] button on Docker row
  → handleTriggerScan("docker")
  → triggerScanMutation.mutate("docker")
  → POST /api/v1/agents/:id/subsystems/docker/trigger
```

**Ethos Requirements**:
- Button disabled during pending state
- Loading indicator
- Success/error toast (already doing this)

### Phase 2: Backend Receives Trigger POST

**Handler**: `subsystems.go:TriggerSubsystem`
```text
URL: POST /api/v1/agents/:id/subsystems/:subsystem/trigger
  → Authenticate (already done)
  → Validate agent exists
  → Validate subsystem is enabled
  → Get current config
  → Generate command_id
```

**Command Creation**:
```go
command := &models.AgentCommand{
  AgentID:     agentID,
  CommandType: "scan_" + subsystem,  // "scan_docker", "scan_storage", etc.
  Status:      "pending",
  Source:      "web_ui",
  // ADD: Subsystem field for filtering/querying
  Subsystem:   subsystem,
}

// Add [HISTORY] logging
log.Printf("[HISTORY] [server] [scan] command_created agent_id=%s subsystem=%s command_id=%s timestamp=%s",
  agentID, subsystem, command.ID, time.Now().Format(time.RFC3339))

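err = h.signAndCreateCommand(command)
if err != nil {
  // Log the failure with subsystem context before returning (errors are history)
  log.Printf("[HISTORY] [server] [scan] command_create_failed agent_id=%s subsystem=%s error=%q timestamp=%s",
    agentID, subsystem, err.Error(), time.Now().Format(time.RFC3339))
  // Return the error to the caller
}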
```

**Ethos Requirements**:
- ✅ All errors logged before returning
- ✅ History entry created for command creation attempts
- ✅ Subsystem context preserved in logs
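
One way to satisfy "all errors logged before returning" is to validate the subsystem and derive the command type in a single helper. This is a sketch under assumptions: `scanCommandType` and `knownSubsystems` are hypothetical names, and the real list of enabled subsystems presumably comes from the agent's configuration rather than a hard-coded map.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// knownSubsystems lists the subsystems named in this plan. This is an
// assumption; the real source of truth is expected to be agent config.
var knownSubsystems = map[string]bool{
	"docker":  true,
	"storage": true,
	"system":  true,
	"updates": true,
}

// scanCommandType validates the subsystem and returns the command type
// ("scan_" + subsystem). Rejections are logged with subsystem context
// before the error is returned.
func scanCommandType(agentID, subsystem string) (string, error) {
	if !knownSubsystems[subsystem] {
		err := fmt.Errorf("unknown subsystem %q", subsystem)
		log.Printf("[HISTORY] [server] [scan] command_rejected agent_id=%s subsystem=%q error=%q timestamp=%s",
			agentID, subsystem, err.Error(), time.Now().Format(time.RFC3339))
		return "", err
	}
	return "scan_" + subsystem, nil
}
```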

### Phase 3: Command Acknowledgment System

The scan command must flow through the standard acknowledgment system:

```go
// Already exists: pending_acks.json tracking
ackTracker.Create(command.ID, time.Now())
// → Agent checks in: receives command
// → Agent starts scan: reports status? (open question)
// → Agent completes: reports results
// → Server updates history
// → Acknowledgment removed
```

**Current Missing Pieces**:
- Command results not being saved properly
- Subsystem context not flowing through ack system
- Scan results not creating history entries
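
To carry subsystem context through the acknowledgment step, the pending record could store the subsystem next to the command ID. The sketch below is illustrative only: `AckTracker` and `PendingAck` here are hypothetical types, not the project's existing tracker, and the `Create` signature differs from the `ackTracker.Create(command.ID, time.Now())` call shown above because it stamps the time internally.

```go
package main

import (
	"sync"
	"time"
)

// PendingAck is a hypothetical record of a command awaiting acknowledgment.
type PendingAck struct {
	CommandID string
	Subsystem string
	CreatedAt time.Time
}

// AckTracker is an illustrative in-memory tracker.
type AckTracker struct {
	mu      sync.Mutex
	pending map[string]PendingAck
}

// NewAckTracker returns an empty tracker.
func NewAckTracker() *AckTracker {
	return &AckTracker{pending: make(map[string]PendingAck)}
}

// Create registers a command as awaiting acknowledgment.
func (t *AckTracker) Create(commandID, subsystem string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[commandID] = PendingAck{
		CommandID: commandID,
		Subsystem: subsystem,
		CreatedAt: time.Now(),
	}
}

// Acknowledge removes the pending record and returns it, so the caller can
// write a history entry that includes the subsystem.
func (t *AckTracker) Acknowledge(commandID string) (PendingAck, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	ack, ok := t.pending[commandID]
	if ok {
		delete(t.pending, commandID)
	}
	return ack, ok
}
```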

### Phase 4: Agent Receives Scan Command

**Agent Handling**: `main.go:handleCommand`
```go
case "scan_docker":
  log.Printf("[HISTORY] [agent] [scan_docker] command_received agent_id=%s command_id=%s timestamp=%s",
    cfg.AgentID, cmd.ID, time.Now().Format(time.RFC3339))

  results, err := handleScanDocker(apiClient, cfg, ackTracker, scanOrchestrator, cmd.ID)

  if err != nil {
    log.Printf("[ERROR] [agent] [scan_docker] scan_failed error=%v timestamp=%s",
      err, time.Now().Format(time.RFC3339))
    log.Printf("[HISTORY] [agent] [scan_docker] scan_failed error=%q timestamp=%s",
      err.Error(), time.Now().Format(time.RFC3339))
    // Update command status: failed
    // Report back via API
    // Return error
  }

  // Assumes results is a slice of found items
  log.Printf("[SUCCESS] [agent] [scan_docker] scan_completed items=%d timestamp=%s",
    len(results), time.Now().Format(time.RFC3339))
  log.Printf("[HISTORY] [agent] [scan_docker] scan_completed items=%d timestamp=%s",
    len(results), time.Now().Format(time.RFC3339))
  // Update command status: success
  // Report results via API
```

**Existing Handlers**:
- `handleScanUpdatesV2` - needs review
- `handleScanStorage` - needs review
- `handleScanSystem` - needs review
- `handleScanDocker` - needs review
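
While reviewing the handlers, it may help to route every `scan_*` command through one lookup so that an unhandled command type produces an explicit error instead of being silently dropped. A minimal sketch, assuming a hypothetical common `ScanHandler` signature (the existing handlers take more arguments than this):

```go
package main

import "fmt"

// ScanHandler is a hypothetical common signature for scan handlers.
// It returns the number of items found.
type ScanHandler func(commandID string) (int, error)

// dispatchScan looks up the handler for a command type such as
// "scan_docker" and runs it. Unknown command types return an error
// that the caller can log to history.
func dispatchScan(handlers map[string]ScanHandler, commandType, commandID string) (int, error) {
	h, ok := handlers[commandType]
	if !ok {
		return 0, fmt.Errorf("no handler registered for command type %q", commandType)
	}
	return h(commandID)
}
```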

### Phase 5: Results Reported Back

**API Endpoint**: Agent reports scan results via `POST /api/v1/agents/:id/commands/:command_id/result`
```json
{
  "command_id": "...",
  "result": "success",
  "items_found": 4,
  "stdout": "...",
  "subsystem": "docker"
}
```

**Server Handler**: Updates history table
```go
// Insert into history table
const insertHistory = `
INSERT INTO history (agent_id, command_id, action, result, subsystem, stdout, stderr, executed_at)
VALUES (?, ?, 'scan_docker', ?, 'docker', ?, ?, NOW())`

// Add [HISTORY] logging
log.Printf("[HISTORY] [server] [scan_docker] result_logged agent_id=%s command_id=%s timestamp=%s",
  agentID, commandID, time.Now().Format(time.RFC3339))
```
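
On the server side, the result payload could be decoded into a typed struct and checked before anything is written to history. This is a sketch under assumptions: `ScanResult`, `parseScanResult`, and `historyAction` are hypothetical names, and the JSON field names simply mirror the payload shown above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ScanResult mirrors the result payload fields from this plan.
type ScanResult struct {
	CommandID  string `json:"command_id"`
	Result     string `json:"result"`
	ItemsFound int    `json:"items_found"`
	Stdout     string `json:"stdout"`
	Subsystem  string `json:"subsystem"`
}

// parseScanResult decodes the request body and rejects payloads that lack
// the fields needed to write a history entry with subsystem context.
func parseScanResult(body []byte) (ScanResult, error) {
	var r ScanResult
	if err := json.Unmarshal(body, &r); err != nil {
		return ScanResult{}, fmt.Errorf("decode scan result: %w", err)
	}
	if r.CommandID == "" || r.Subsystem == "" {
		return ScanResult{}, errors.New("scan result missing command_id or subsystem")
	}
	return r, nil
}

// historyAction derives the history action name, e.g. "scan_docker".
func historyAction(r ScanResult) string {
	return "scan_" + r.Subsystem
}
```

Deriving the action from the reported subsystem avoids hard-coding `'scan_docker'` and `'docker'` in the insert statement for each subsystem.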

### Phase 6: History Display

**UI Component**: `HistoryTimeline.tsx`
```typescript
// Retrieve history entries:
//   GET /api/v1/history?agent_id=...&subsystem=docker

// Display with subsystem context
<span className="capitalize flex items-center">
  {getActionIcon(entry.action, entry.subsystem)}
  <span>{getSubsystemDisplayName(entry.subsystem)} Scan</span>
</span>

// Icons based on subsystem (entry.action is the stored action name):
//   getActionIcon("scan_docker", "docker")   → Docker icon
//   getActionIcon("scan_storage", "storage") → Storage icon
//   getActionIcon("scan_system", "system")   → System icon
```

---

## Database Changes Required

### Table: `history` (or logs)

**Add column**:
```sql
ALTER TABLE history ADD COLUMN subsystem VARCHAR(50);
CREATE INDEX idx_history_agent_action_subsystem ON history(agent_id, action, subsystem);
```

**Populate for existing scan entries**:
- Parse stdout for clues to determine subsystem
- Or set to NULL for existing entries
- UI must handle NULL (display as "Unknown Scan")
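
If the API layer resolves the label before sending entries to the UI, the NULL case can be handled in one place. A minimal sketch, assuming the nullable column has been read into a `*string` (nil for NULL); `subsystemLabel` and `subsystemNames` are hypothetical names, and the display names follow the ones listed in this plan.

```go
package main

// subsystemNames maps stored subsystem values to display names.
var subsystemNames = map[string]string{
	"docker":  "Docker",
	"storage": "Storage",
	"system":  "System",
	"updates": "Package Updates",
}

// subsystemLabel returns the history label for a possibly-NULL subsystem.
// Entries created before the migration (nil) are shown as "Unknown Scan".
func subsystemLabel(subsystem *string) string {
	if subsystem == nil || *subsystem == "" {
		return "Unknown Scan"
	}
	if name, ok := subsystemNames[*subsystem]; ok {
		return name + " Scan"
	}
	// Fall back to the raw value for subsystems without a display name.
	return *subsystem + " Scan"
}
```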

---

## Code Changes Required

### Backend (aggregator-server)

**Files to Modify**:
1. `internal/models/command.go` - Add Subsystem field
2. `internal/database/queries/commands.go` - Update for subsystem
3. `internal/api/handlers/subsystems.go` - Update TriggerSubsystem logging
4. `internal/api/handlers/commands.go` - Update command result handler
5. `internal/database/migrations/` - Add subsystem column migration

**New Queries Needed**:
```sql
-- Insert history with subsystem
INSERT INTO history (...) VALUES (..., subsystem)

-- Query history by subsystem
SELECT * FROM history WHERE agent_id = ? AND subsystem = ?
```
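
Because the subsystem filter is optional in the history UI, the query can be assembled with the filter added only when it is set. The sketch below is an assumption about how that might look: `historyQuery` is a hypothetical helper, the `?` placeholders follow the style used in this plan, and the `executed_at` ordering is taken from the column named in the insert statement.

```go
package main

// historyQuery builds the history SELECT for an agent. The subsystem filter
// is appended only when subsystem is non-empty, and values are always passed
// as arguments rather than concatenated into the SQL string.
func historyQuery(agentID, subsystem string) (string, []any) {
	query := "SELECT * FROM history WHERE agent_id = ?"
	args := []any{agentID}
	if subsystem != "" {
		query += " AND subsystem = ?"
		args = append(args, subsystem)
	}
	return query + " ORDER BY executed_at DESC", args
}
```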

### Agent (aggregator-agent)

**Files to Modify**:
1. `cmd/agent/main.go` - Update all `handleScan*` functions with [HISTORY] logging
2. `internal/orchestrator/scanner.go` - Ensure wrappers pass subsystem context
3. `internal/scanner/` - Add subsystem identification to results

**Add to all scan handlers**:
```go
// Each handleScan* function needs:
// 1. [HISTORY] log when starting
// 2. [HISTORY] log on completion
// 3. [HISTORY] log on error
// 4. Subsystem context in all log messages
```
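
Rather than repeating the three log calls in every handler, the four points above could be covered by one wrapper. This is a sketch, not the agent's actual code: `withHistoryLogging` is a hypothetical helper, and the `scan` callback stands in for the real `handleScan*` functions, whose signatures differ.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// withHistoryLogging runs scan and emits [HISTORY] lines on start,
// completion, and error, always including the subsystem in the action tag.
// The scan callback returns the number of items found.
func withHistoryLogging(agentID, subsystem, commandID string, scan func() (int, error)) (int, error) {
	if subsystem == "" {
		return 0, errors.New("subsystem is required for history logging")
	}
	action := "scan_" + subsystem
	now := func() string { return time.Now().Format(time.RFC3339) }

	log.Printf("[HISTORY] [agent] [%s] scan_started agent_id=%s command_id=%s timestamp=%s",
		action, agentID, commandID, now())

	items, err := scan()
	if err != nil {
		log.Printf("[HISTORY] [agent] [%s] scan_failed agent_id=%s command_id=%s error=%q timestamp=%s",
			action, agentID, commandID, err.Error(), now())
		return 0, err
	}

	log.Printf("[HISTORY] [agent] [%s] scan_completed agent_id=%s command_id=%s items=%d timestamp=%s",
		action, agentID, commandID, items, now())
	return items, nil
}
```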

### Frontend (aggregator-web)

**Files to Modify**:
1. `src/types/index.ts` - Add subsystem to HistoryEntry interface
2. `src/components/HistoryTimeline.tsx` - Update display logic
3. `src/lib/api.ts` - Update API call to include subsystem parameter
4. `src/components/AgentHealth.tsx` - Add subsystem icons map

**Display Logic**:
```typescript
const subsystemIcon = {
  docker: <Container className="h-4 w-4" />,
  storage: <HardDrive className="h-4 w-4" />,
  system: <Cpu className="h-4 w-4" />,
  updates: <Package className="h-4 w-4" />,
  dnf: <Box className="h-4 w-4" />,
  winget: <Windows className="h-4 w-4" />,
  apt: <Linux className="h-4 w-4" />,
};

const displayName = {
  docker: 'Docker',
  storage: 'Storage',
  system: 'System',
  updates: 'Package Updates',
  // ... etc
};
```

---

## Testing Requirements

### Unit Tests
```go
// Test command creation with subsystem
TestCreateCommand_WithSubsystem()
TestCreateCommand_WithoutSubsystem()

// Test history insertion with subsystem
TestCreateHistory_WithSubsystem()
TestQueryHistory_BySubsystem()

// Test agent scan handlers
TestHandleScanDocker_LogsHistory()
TestHandleScanDocker_Failure() // Error logs to history
```

### Integration Tests
```go
// Test full flow
TestScanTrigger_FullFlow_Docker()
TestScanTrigger_FullFlow_Storage()
TestScanTrigger_FullFlow_System()
TestScanTrigger_FullFlow_Updates()

// Verify each step:
// 1. UI trigger → 2. Command created → 3. Agent receives → 4. Scan runs → 
// 5. Results reported → 6. History logged → 7. History UI displays correctly
```

### Manual Testing Checklist
- [ ] Click each subsystem scan button
- [ ] Verify scan runs and results appear
- [ ] Verify history entry created for each
- [ ] Verify history shows subsystem-specific icons and names
- [ ] Verify failed scans create history entries
- [ ] Verify command ack system tracks scan commands
- [ ] Verify circuit breakers show scan activity

---

## ETHOS Compliance Checklist

### Errors are History, Not /dev/null
- [ ] All scan errors → history table
- [ ] All scan completions → history table  
- [ ] Button click failures → history table
- [ ] Command creation failures → history table
- [ ] Agent unreachable errors → history table
- [ ] Subsystem context in all history entries

### Security is Non-Negotiable
- [ ] All scan endpoints → AuthMiddleware() (already done)
- [ ] Command signing → Ed25519 nonces (already done)
- [ ] No scan credentials in logs

### Assume Failure; Build for Resilience
- [ ] Agent unavailable → clear error to UI
- [ ] Scan timeout → properly handled
- [ ] Partial failures → reported to history
- [ ] Retry logic considered (not automatic for manual scans)

### Idempotency
- [ ] Safe to click scan multiple times
- [ ] Each scan creates distinct history entry
- [ ] No duplicate state from repeated scans

### No Marketing Fluff
- [ ] Action names: "scan_docker", "scan_storage", "scan_system", "scan_updates"
- [ ] History display: "Docker Scan", "Storage Scan" etc.
- [ ] Subsystem-specific icons (not generic play button)
- [ ] Clear, honest logging throughout

---

## Implementation Phases

### Phase 1: Database Migration (30 min)
- Add `subsystem` column to history table
- Run migration
- Update ORM models/queries

### Phase 2: Backend API Updates (1 hour)
- Update TriggerSubsystem to log with subsystem context
- Update command result handler to include subsystem
- Update queries to handle subsystem filtering

### Phase 3: Agent Updates (1 hour)
- Add [HISTORY] logging to all scan handlers
- Ensure subsystem context flows through
- Verify error handling logs to history

### Phase 4: Frontend Updates (1 hour)
- Add subsystem to HistoryEntry type
- Add subsystem icons map
- Update display logic to show subsystem context
- Add subsystem filtering to history UI

### Phase 5: Testing (1 hour)
- Unit tests for backend changes
- Integration tests for full flow
- Manual testing of each subsystem scan

**Total Estimated Time**: 4.5 hours

---

## Risks and Considerations

**Risk 1**: Database migration on production data
- Mitigation: Test migration on backup
- Plan: Run during low-activity window

**Risk 2**: Performance impact of additional column
- Likelihood: Low (indexed, small varchar)
- Mitigation: Add index during migration

**Risk 3**: UI breaks for old entries without subsystem
- Mitigation: Handle NULL gracefully ("Unknown Scan")

---

## Planning Documents Status

This is **NEW** Issue #3 - separate from completed Issues #1 and #2.

**New Planning Documents Created**:
- `ISSUE_003_SCAN_TRIGGER_FIX.md` - This file
- `UX_ISSUE_ANALYSIS_scan_history.md` - Related UX issue (documented already)

**Update Existing**:
- `STATE_PRESERVATION.md` - Add Issue #3 tracking
- `session_2025-12-18-completion.md` - Add note that Issue #3 was discovered

---

## Next Steps for Tomorrow

1. **Start of Day**: Review this plan
2. **Database**: Run migration
3. **Backend**: Update handlers and queries
4. **Agent**: Add [HISTORY] logging
5. **Frontend**: Update UI components
6. **Testing**: Verify all scan flows work
7. **Documentation**: Update completion status

---

## Sign-off

**Planning By**: Ani Tunturi (for Casey)  
**Review Status**: Ready for implementation  
**Complexity**: Medium-High (touching multiple layers)  
**Confidence**: High (follows patterns established in Issues #1-2)  

**Blood, Sweat, and Tears Commitment**: Yes - proper implementation only