# ISSUE #3: Scan Trigger Flow - Proper Implementation Plan
**Date**: 2025-12-18 (Planning for tomorrow)

**Status**: Planning Phase (Ready for implementation tomorrow)

**Severity**: High (Scan buttons currently return errors)

**New Scope**: Beyond Issues #1 and #2 (completed)

---
## Issue Summary
Individual "Scan" buttons for each subsystem (docker, storage, system, updates) all fail with the same error:

> "Failed to trigger scan: Failed to create command"

**Why**: The immediate failure is the `signAndCreateCommand` call in `TriggerSubsystem`. More broadly, the command acknowledgment and history logging flows are not integrated for subsystem-specific scans.

**What Needs to Happen**: A full ETHOS-compliant flow from UI click → API → Agent → Results → History

---
## Current State Analysis
### UI Layer (AgentHealth.tsx) ✅ WORKING
- ✅ Per-subsystem scan buttons exist
- ✅ `handleTriggerScan(subsystem.subsystem)` passes subsystem name
- ✅ `triggerScanMutation` sends a POST to `/api/v1/agents/:id/subsystems/:subsystem/trigger`
### Backend API (subsystems.go) ✅ MOSTLY WORKING
- ✅ `TriggerSubsystem` handler receives subsystem parameter
- ✅ Creates distinct command type: `commandType := "scan_" + subsystem`
- ✅ Creates AgentCommand with unique command_type
- **❌ FAILING**: `signAndCreateCommand` call fails
### Agent (main.go) ✅ MOSTLY WORKING
- ✅ `case "scan_updates":` handles update scans
- ✅ `case "scan_storage":` handles storage scans
- **❌ ISSUE**: Command acknowledgment flow needs review
### History/Reconciliation ❌ NOT INTEGRATED
- **Missing**: Subsystem context in history logging
- **Broken**: Command acknowledgment for scan commands
- **Inconsistent**: Some logs go to history, some don't
---
## Proper Implementation Requirements (ETHOS)
### Core Principles to Follow
1. **Errors are History, Not /dev/null** ✅ MUST HAVE
- Scan failures → history table with context
- Button click errors → history table
- Command creation errors → history table
- Agent handler errors → history table
2. **Security is Non-Negotiable** ✅ MUST HAVE
- All scan triggers → authenticated endpoints (already done)
- Command signing → Ed25519 nonces (already done)
- Circuit breaker integration (already exists)
3. **Assume Failure; Build for Resilience** ✅ MUST HAVE
- Scan failures → retry logic (if appropriate)
- Command creation failures → clear error context
- Agent unreachable → proper error to UI
- Partial failures → handled gracefully
4. **Idempotency** ✅ MUST HAVE
- Scan operations repeatable (safe to trigger multiple times)
- No duplicate history entries for same scan
- Results properly timestamped for tracking
5. **No Marketing Fluff** ✅ MUST HAVE
- Clear action names in history: "scan_docker", "scan_storage", "scan_system"
- Subsystem icons in history display (not just text)
- Accurate, honest logging throughout
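
Principle 4 separates two cases: repeated clicks should each create a new command (and therefore a new history row), while a result that is re-delivered for the same command should not create a second row. A minimal sketch of the second case, assuming one history row per `command_id`; `historyStore`, `Record`, and the trimmed `HistoryEntry` are hypothetical names, with an in-memory map standing in for the history table (in the database, a unique constraint on `command_id` would give the same guarantee):

```go
package main

import (
	"sync"
	"time"
)

// HistoryEntry is a hypothetical, trimmed-down history row.
type HistoryEntry struct {
	CommandID  string
	Action     string // e.g. "scan_docker"
	Subsystem  string // e.g. "docker"
	Result     string // "success" or "failed"
	ExecutedAt time.Time
}

// historyStore is an in-memory stand-in for the history table that
// treats command_id as a uniqueness key.
type historyStore struct {
	mu      sync.Mutex
	entries map[string]HistoryEntry
}

func newHistoryStore() *historyStore {
	return &historyStore{entries: make(map[string]HistoryEntry)}
}

// Record inserts the entry unless one with the same CommandID exists.
// It reports whether a row was written, so a re-delivered result is a no-op.
func (s *historyStore) Record(e HistoryEntry) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.entries[e.CommandID]; exists {
		return false
	}
	if e.ExecutedAt.IsZero() {
		e.ExecutedAt = time.Now().UTC()
	}
	s.entries[e.CommandID] = e
	return true
}

// Len returns the number of stored entries.
func (s *historyStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.entries)
}
```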
---
## Full Flow Design (From Click to History)
### Phase 1: User Clicks Scan Button
**UI Event**: `handleTriggerScan(subsystem.subsystem)`
```typescript
// User clicks the [Scan] button on the Docker row:
handleTriggerScan("docker");
// → triggerScanMutation.mutate("docker")
// → POST /api/v1/agents/:id/subsystems/docker/trigger
```
**Ethos Requirements**:
- Disable the button while the request is pending
- Show a loading indicator
- Success/error toast (already doing this)
### Phase 2: Backend Receives Trigger POST
**Handler**: `subsystems.go:TriggerSubsystem`
```go
// POST /api/v1/agents/:id/subsystems/:subsystem/trigger
// → Authenticate (already done)
// → Validate agent exists
// → Validate subsystem is enabled
// → Get current config
// → Generate command_id
```
**Command Creation**:
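```go
command := &models.AgentCommand{
	AgentID:     agentID,
	CommandType: "scan_" + subsystem, // "scan_docker", "scan_storage", etc.
	Status:      "pending",
	Source:      "web_ui",
	// ADD: Subsystem field for filtering/querying
	Subsystem:   subsystem,
}

if err := h.signAndCreateCommand(command); err != nil {
	// Log the failure to history before returning, with subsystem context
	log.Printf("[HISTORY] [server] [scan] command_create_failed agent_id=%s subsystem=%s error=%q timestamp=%s",
		agentID, subsystem, err.Error(), time.Now().Format(time.RFC3339))
	// Respond with an error to the UI, then stop
	return
}

// Log creation only after signAndCreateCommand succeeds, so history never
// records a command that was not actually created
log.Printf("[HISTORY] [server] [scan] command_created agent_id=%s subsystem=%s command_id=%s timestamp=%s",
	agentID, subsystem, command.ID, time.Now().Format(time.RFC3339))
```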
**Ethos Requirements**:
- ✅ All errors logged before returning
- ✅ History entry created for command creation attempts
- ✅ Subsystem context preserved in logs
### Phase 3: Command Acknowledgment System
The scan command must flow through the standard acknowledgment system:
```go
// Already exists: pending_acks.json tracking
ackTracker.Create(command.ID, time.Now())
// → Agent checks in: receives command
// → Agent starts scan: reports status? (open question)
// → Agent completes: reports results
// → Server updates history
// → Acknowledgment removed
```
**Current Missing Pieces**:
- Command results not being saved properly
- Subsystem context not flowing through ack system
- Scan results not creating history entries
### Phase 4: Agent Receives Scan Command
**Agent Handling**: `main.go:handleCommand`
```go
case "scan_docker":
	log.Printf("[HISTORY] [agent] [scan_docker] command_received agent_id=%s command_id=%s timestamp=%s",
		cfg.AgentID, cmd.ID, time.Now().Format(time.RFC3339))
	results, err := handleScanDocker(apiClient, cfg, ackTracker, scanOrchestrator, cmd.ID)
	if err != nil {
		log.Printf("[ERROR] [agent] [scan_docker] scan_failed command_id=%s error=%v timestamp=%s",
			cmd.ID, err, time.Now().Format(time.RFC3339))
		log.Printf("[HISTORY] [agent] [scan_docker] scan_failed command_id=%s error=%q timestamp=%s",
			cmd.ID, err.Error(), time.Now().Format(time.RFC3339))
		// Update command status: failed
		// Report back via API
		break // leave the switch so the success path below does not run
	}
	// Assumes results is a slice of scanned items
	log.Printf("[SUCCESS] [agent] [scan_docker] scan_completed command_id=%s items=%d timestamp=%s",
		cmd.ID, len(results), time.Now().Format(time.RFC3339))
	log.Printf("[HISTORY] [agent] [scan_docker] scan_completed command_id=%s items=%d timestamp=%s",
		cmd.ID, len(results), time.Now().Format(time.RFC3339))
	// Update command status: success
	// Report results via API
```
**Existing Handlers**:
- `handleScanUpdatesV2` - needs review
- `handleScanStorage` - needs review
- `handleScanSystem` - needs review
- `handleScanDocker` - needs review
### Phase 5: Results Reported Back
**API Endpoint**: Agent reports scan results via `POST /api/v1/agents/:id/commands/:command_id/result`
```json
{
  "command_id": "...",
  "result": "success",
  "items_found": 4,
  "stdout": "...",
  "subsystem": "docker"
}
```
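
On the server, decoding this payload into a typed struct and rejecting entries without `command_id` or `subsystem` keeps context-free rows out of history. A minimal sketch, assuming the field names shown in the payload plus an optional `stderr` field (to match the history table's `stderr` column), and assuming `"failed"` is the only non-success `result` value; `ScanResult` and `parseScanResult` are hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScanResult mirrors the result payload (field names assumed from the example).
type ScanResult struct {
	CommandID  string `json:"command_id"`
	Result     string `json:"result"` // assumed: "success" or "failed"
	ItemsFound int    `json:"items_found"`
	Stdout     string `json:"stdout"`
	Stderr     string `json:"stderr,omitempty"` // assumed extra field
	Subsystem  string `json:"subsystem"`
}

// parseScanResult decodes the body and rejects payloads that would produce
// a history row without command or subsystem context.
func parseScanResult(body []byte) (ScanResult, error) {
	var r ScanResult
	if err := json.Unmarshal(body, &r); err != nil {
		return ScanResult{}, fmt.Errorf("decode scan result: %w", err)
	}
	if r.CommandID == "" {
		return ScanResult{}, fmt.Errorf("scan result missing command_id")
	}
	if r.Subsystem == "" {
		return ScanResult{}, fmt.Errorf("scan result for %s missing subsystem", r.CommandID)
	}
	if r.Result != "success" && r.Result != "failed" {
		return ScanResult{}, fmt.Errorf("scan result for %s has unexpected result %q", r.CommandID, r.Result)
	}
	return r, nil
}

// Action derives the history action name, e.g. "scan_docker".
func (r ScanResult) Action() string { return "scan_" + r.Subsystem }
```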
**Server Handler**: Updates history table
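```go
// Insert into history table (db and res are placeholder names; works for any subsystem)
_, err := db.Exec(`
	INSERT INTO history (agent_id, command_id, action, result, subsystem, stdout, stderr, executed_at)
	VALUES (?, ?, ?, ?, ?, ?, ?, NOW())`,
	agentID, res.CommandID, "scan_"+res.Subsystem, res.Result, res.Subsystem, res.Stdout, res.Stderr)
if err != nil {
	log.Printf("[ERROR] [server] [scan_%s] history_insert_failed agent_id=%s command_id=%s error=%v",
		res.Subsystem, agentID, res.CommandID, err)
	return
}
// Add [HISTORY] logging
log.Printf("[HISTORY] [server] [scan_%s] result_logged agent_id=%s command_id=%s timestamp=%s",
	res.Subsystem, agentID, res.CommandID, time.Now().Format(time.RFC3339))
```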
### Phase 6: History Display
**UI Component**: `HistoryTimeline.tsx`
```typescript
// Retrieve history entries
// GET /api/v1/history?agent_id=...&subsystem=docker
// Display with subsystem context
{getActionIcon(entry.action, entry.subsystem)}
{getSubsystemDisplayName(entry.subsystem)} Scan
// Icons based on subsystem
getActionIcon("scan", "docker")  // → Docker icon
getActionIcon("scan", "storage") // → Storage icon
getActionIcon("scan", "system")  // → System icon
```
---
## Database Changes Required
### Table: `history` (or logs)
**Add column**:
```sql
ALTER TABLE history ADD COLUMN subsystem VARCHAR(50);
CREATE INDEX idx_history_agent_action_subsystem ON history(agent_id, action, subsystem);
```
**Populate for existing scan entries**:
- Parse stdout for clues to determine subsystem
- Or set to NULL for existing entries
- UI must handle NULL (display as "Unknown Scan")
---
## Code Changes Required
### Backend (aggregator-server)
**Files to Modify**:
1. `internal/models/command.go` - Add Subsystem field
2. `internal/database/queries/commands.go` - Update for subsystem
3. `internal/api/handlers/subsystems.go` - Update TriggerSubsystem logging
4. `internal/api/handlers/commands.go` - Update command result handler
5. `internal/database/migrations/` - Add subsystem column migration

**New Queries Needed**:
```sql
-- Insert history with subsystem
INSERT INTO history (...) VALUES (..., subsystem)
-- Query history by subsystem
SELECT * FROM history WHERE agent_id = ? AND subsystem = ?
```
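
The two queries above differ only by the optional subsystem filter, so one builder can serve both the unfiltered timeline and the per-subsystem view. A minimal sketch, assuming `?` placeholders as in the queries above and newest-first ordering (the `ORDER BY executed_at DESC` clause is an assumption); `historyQuery` is a hypothetical name. One design note: `idx_history_agent_action_subsystem` leads with `(agent_id, action)`, so a filter on `agent_id` and `subsystem` alone can narrow the index scan only by its leading column; an index on `(agent_id, subsystem)` may suit this query better.

```go
package main

import "strings"

// historyQuery builds the history lookup for one agent, adding the
// subsystem filter only when a subsystem is given. It returns the SQL
// text and the positional arguments in placeholder order.
func historyQuery(agentID, subsystem string) (string, []any) {
	var b strings.Builder
	args := []any{agentID}
	b.WriteString("SELECT * FROM history WHERE agent_id = ?")
	if subsystem != "" {
		b.WriteString(" AND subsystem = ?")
		args = append(args, subsystem)
	}
	b.WriteString(" ORDER BY executed_at DESC")
	return b.String(), args
}
```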
### Agent (aggregator-agent)
**Files to Modify**:
1. `cmd/agent/main.go` - Update all `handleScan*` functions with [HISTORY] logging
2. `internal/orchestrator/scanner.go` - Ensure wrappers pass subsystem context
3. `internal/scanner/` - Add subsystem identification to results

**Add to all scan handlers**:
```go
// Each handleScan* function needs:
// 1. [HISTORY] log when starting
// 2. [HISTORY] log on completion
// 3. [HISTORY] log on error
// 4. Subsystem context in all log messages
```
### Frontend (aggregator-web)
**Files to Modify**:
1. `src/types/index.ts` - Add subsystem to HistoryEntry interface
2. `src/components/HistoryTimeline.tsx` - Update display logic
3. `src/lib/api.ts` - Update API call to include subsystem parameter
4. `src/components/AgentHealth.tsx` - Add subsystem icons map

**Display Logic**:
```typescript
const subsystemIcon = {
docker: ,
storage: ,
system: ,
updates: ,
dnf: ,
winget: ,
apt: ,
};
const displayName = {
docker: 'Docker',
storage: 'Storage',
system: 'System',
updates: 'Package Updates',
// ... etc
};
```
---
## Testing Requirements
### Unit Tests
```go
// Test command creation with subsystem
TestCreateCommand_WithSubsystem()
TestCreateCommand_WithoutSubsystem()
// Test history insertion with subsystem
TestCreateHistory_WithSubsystem()
TestQueryHistory_BySubsystem()
// Test agent scan handlers
TestHandleScanDocker_LogsHistory()
TestHandleScanDocker_Failure() // Error logs to history
```
### Integration Tests
```go
// Test full flow
TestScanTrigger_FullFlow_Docker()
TestScanTrigger_FullFlow_Storage()
TestScanTrigger_FullFlow_System()
TestScanTrigger_FullFlow_Updates()
// Verify each step:
// 1. UI trigger → 2. Command created → 3. Agent receives → 4. Scan runs →
// 5. Results reported → 6. History logged → 7. History UI displays correctly
```
### Manual Testing Checklist
- [ ] Click each subsystem scan button
- [ ] Verify scan runs and results appear
- [ ] Verify history entry created for each
- [ ] Verify history shows subsystem-specific icons and names
- [ ] Verify failed scans create history entries
- [ ] Verify command ack system tracks scan commands
- [ ] Verify circuit breakers show scan activity
---
## ETHOS Compliance Checklist
### Errors are History, Not /dev/null
- [ ] All scan errors → history table
- [ ] All scan completions → history table
- [ ] Button click failures → history table
- [ ] Command creation failures → history table
- [ ] Agent unreachable errors → history table
- [ ] Subsystem context in all history entries
### Security is Non-Negotiable
- [ ] All scan endpoints → AuthMiddleware() (already done)
- [ ] Command signing → Ed25519 nonces (already done)
- [ ] No scan credentials in logs
### Assume Failure; Build for Resilience
- [ ] Agent unavailable → clear error to UI
- [ ] Scan timeout → properly handled
- [ ] Partial failures → reported to history
- [ ] Retry logic considered (not automatic for manual scans)
### Idempotency
- [ ] Safe to click scan multiple times
- [ ] Each scan creates distinct history entry
- [ ] No duplicate state from repeated scans
### No Marketing Fluff
- [ ] Action names: "scan_docker", "scan_storage", "scan_system"
- [ ] History display: "Docker Scan", "Storage Scan" etc.
- [ ] Subsystem-specific icons (not generic play button)
- [ ] Clear, honest logging throughout
---
## Implementation Phases
### Phase 1: Database Migration (30 min)
- Add `subsystem` column to history table
- Run migration
- Update ORM models/queries
### Phase 2: Backend API Updates (1 hour)
- Update TriggerSubsystem to log with subsystem context
- Update command result handler to include subsystem
- Update queries to handle subsystem filtering
### Phase 3: Agent Updates (1 hour)
- Add [HISTORY] logging to all scan handlers
- Ensure subsystem context flows through
- Verify error handling logs to history
### Phase 4: Frontend Updates (1 hour)
- Add subsystem to HistoryEntry type
- Add subsystem icons map
- Update display logic to show subsystem context
- Add subsystem filtering to history UI
### Phase 5: Testing (1 hour)
- Unit tests for backend changes
- Integration tests for full flow
- Manual testing of each subsystem scan

**Total Estimated Time**: 4.5 hours

---
## Risks and Considerations
**Risk 1**: Database migration on production data
- Mitigation: Test migration on backup
- Plan: Run during low-activity window

**Risk 2**: Performance impact of additional column
- Likelihood: Low (indexed, small varchar)
- Mitigation: Add index during migration

**Risk 3**: UI breaks for old entries without subsystem
- Mitigation: Handle NULL gracefully ("Unknown Scan")
---
## Planning Documents Status
This is **NEW** Issue #3, separate from completed Issues #1 and #2.

**New Planning Documents Created**:
- `ISSUE_003_SCAN_TRIGGER_FIX.md` - This file
- `UX_ISSUE_ANALYSIS_scan_history.md` - Related UX issue (documented already)

**Update Existing**:
- `STATE_PRESERVATION.md` - Add Issue #3 tracking
- `session_2025-12-18-completion.md` - Add note about Issue #3 discovered
---
## Next Steps for Tomorrow
1. **Start of Day**: Review this plan
2. **Database**: Run migration
3. **Backend**: Update handlers and queries
4. **Agent**: Add [HISTORY] logging
5. **Frontend**: Update UI components
6. **Testing**: Verify all scan flows work
7. **Documentation**: Update completion status
---
## Sign-off
**Planning By**: Ani Tunturi (for Casey)

**Review Status**: Ready for implementation

**Complexity**: Medium-High (touching multiple layers)

**Confidence**: High (follows patterns established in Issues #1-2)

**Blood, Sweat, and Tears Commitment**: Yes - proper implementation only