RedFlag UI and Critical Fixes - Implementation Plan

Date: 2025-11-10
Version: v0.1.23.4 → v0.1.23.5
Status: Investigation Complete, Implementation Ready


Executive Summary

Investigation of the three critical issues is complete. This document breaks down what is happening in each case and what needs to be fixed.


Issue #1: Scan Updates Quirk - INVESTIGATION COMPLETE

Symptoms

  • Disk/boot metrics (44% used) appearing as "approve/reject" updates in UI
  • Old monolithic logic intercepting new subsystem scanners

Investigation Results

Agent-Side: CORRECT

  • Orchestrator scanners call the correct endpoints:
    • Storage Scanner → ReportMetrics() (Correct)
    • System Scanner → ReportMetrics() (Correct)
    • Update Scanners (APT, DNF, Docker, etc.) → ReportUpdates() (Correct)

Server-Side Handlers: CORRECT

  • ReportUpdates handler (updates.go:67) stores in update_events table
  • ReportMetrics handler (metrics.go:31) stores in metrics table
  • Both handlers properly separated and functioning
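The separation described above amounts to a fixed mapping from scanner to destination table. A minimal Go sketch, assuming scanner names match the package_type values used in the queries later in this document; Destination and destinationFor are hypothetical names for illustration, not identifiers from the codebase:

```go
package main

import "fmt"

// Destination names the table a scanner's results belong in (hypothetical type).
type Destination string

const (
	DestMetrics Destination = "metrics"
	DestUpdates Destination = "update_events"
)

// destinationFor maps a scanner name to its expected table.
// Unknown scanners return an error instead of defaulting to either table.
func destinationFor(scanner string) (Destination, error) {
	switch scanner {
	case "storage", "system":
		return DestMetrics, nil
	case "apt", "dnf", "docker":
		return DestUpdates, nil
	default:
		return "", fmt.Errorf("unknown scanner %q", scanner)
	}
}

func main() {
	for _, s := range []string{"storage", "system", "apt", "docker"} {
		dest, err := destinationFor(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", s, dest)
	}
}
```

A mapping like this could back a table-driven test asserting that storage and system results never reach update_events.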

Root Cause Candidates: The old monolithic handleScanUpdates function (main.go:985-1153) still exists in the codebase, but it is not registered in the command switch statement, which correctly uses handleScanUpdatesV2. That leaves two possible sources for the misclassified entries:

  1. Old data in the database from before the subsystem refactor
  2. Windows service code (service/windows.go) still uses an old version constant (0.1.16) and may follow different logic

Fix Required

Option A - Database Cleanup (Quick Fix):

-- Check for misclassified data
SELECT package_type, COUNT(*) as count
FROM update_events
WHERE package_type IN ('storage', 'system')
GROUP BY package_type;

-- If found, move to metrics table or delete old data

Option B - Code Cleanup (Recommended):

  1. Delete the old handleScanUpdates function (lines 985-1153 in main.go)
  2. Update Windows service version constant to match (0.1.23)
  3. Verify no other references to old function

Priority: Medium (data issue, not functional bug)
Risk: Low (cleanup operation)


Issue #2: UI Version Display Missing

Current State

The WebUI shows only the three-part version (0.1.23), not the full four-part version including the final octet (0.1.23.4).

Implementation Needed

File: aggregator-web/src/pages/Dashboard.tsx

Agent Card View - Add version display:

// Add to agent card display
<AgentCard>
  ...
  <div className="agent-version">
    <span className="label">Version:</span>
    <span className="value">{agent.current_version || 'Unknown'}</span>
  </div>
</AgentCard>

Agent Details View - Add full version string:

// Add to details panel
<AgentDetails>
  ...
  <DetailRow>
    <Label>Agent Version</Label>
    <Value>{agent.current_version || agent.config_version || 'Unknown'}</Value>
  </DetailRow>
</AgentDetails>

API Data Available:

  • The backend already populates the current_version field in the API response
  • Still to confirm: the full version string (with octet) is stored and returned

Tasks

  1. Verify backend returns full version string with octet
  2. Update Agent Card to display version
  3. Update Agent Details page to display version prominently
  4. Consider adding version to agent list table view

Priority: Low (cosmetic, but important for debugging)
Risk: Very Low (UI only)


Issue #3: Same-Version Installation Logic

Current Logic

// In update handler (pseudo-code; assumes versions compare numerically)
if version < current {
    return errors.New("downgrade not allowed")
}
// What about version == current? It falls through unhandled. ❓

Use Cases

Scenario A: Agent Reinstall

  • Agent needs to reinstall same version (config corruption, binary issues)
  • Should allow: version == current

Scenario B: Accidental Update Click

  • User clicks update but agent already on that version
  • Should we allow, block, or warn?

Decision Options

Option A: Allow Same-Version (Recommended)

  • Supports reinstall scenario
  • No security risk (same version)
  • Simple implementation: keep the rejection check strict (version < current) so that version == current proceeds; changing it to version <= current would block reinstalls
  • Prevents unnecessary support tickets

Option B: Block Same-Version

  • Prevents no-op updates
  • May frustrate users trying to reinstall
  • Requires workaround documentation

Option C: Warning + Allow

// Same version: log a warning, then continue with the reinstall
if version == current {
    log.Printf("Warning: Agent %s already on version %s, proceeding with reinstall", agentID, version)
}
// Older version: reject
if version < current {
    return errors.New("downgrade not allowed")
}

Implementation Location

Agent-Side: File: aggregator-agent/cmd/agent/subsystem_handlers.go Function: handleUpdateAgent() (lines 346-536)

Proposed version check (not yet in the handler):

// To be added to the update logic
currentVersion := cfg.AgentVersion
targetVersion := params["version"]

switch cmp := compareVersions(targetVersion, currentVersion); {
case cmp < 0:
    // Downgrade: reject
case cmp == 0:
    // Same version: allow, block, or warn per the decision above
}

Server-Side: File: aggregator-server/internal/api/handlers/agent_build.go

Check version constraints before sending update command.

Recommendation

Option A - Allow same-version installations

Reasons:

  1. Reinstall is a valid use case
  2. No security implications
  3. Easiest to implement and document
  4. User expectation: the "Update" button should work even if the agent is already on that version

Tasks

  1. Define version comparison logic
  2. Add check in agent update handler (allow ==, block <)
  3. Add logging for same-version reinstalls
  4. Update UI to show appropriate messages
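Task 1 matters because plain string comparison misorders multi-digit components ("0.1.23.10" sorts before "0.1.23.9"). Below is a minimal sketch of a numeric comparison for dotted versions such as 0.1.23.4. It is an assumption about how compareVersions could look, not the project's implementation; unlike the call shape shown earlier, this variant also returns an error for malformed input, and parseVersion is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a dotted version ("0.1.23.4", optional leading "v")
// into its numeric components.
func parseVersion(v string) ([]int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	out := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("invalid version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// compareVersions returns -1, 0, or 1 when a is older than, equal to,
// or newer than b. Missing trailing components count as 0.
func compareVersions(a, b string) (int, error) {
	pa, err := parseVersion(a)
	if err != nil {
		return 0, err
	}
	pb, err := parseVersion(b)
	if err != nil {
		return 0, err
	}
	n := len(pa)
	if len(pb) > n {
		n = len(pb)
	}
	for i := 0; i < n; i++ {
		x, y := 0, 0
		if i < len(pa) {
			x = pa[i]
		}
		if i < len(pb) {
			y = pb[i]
		}
		if x < y {
			return -1, nil
		}
		if x > y {
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	for _, target := range []string{"0.1.23.3", "0.1.23.4", "0.1.23.5", "0.1.23.10"} {
		cmp, err := compareVersions(target, "0.1.23.4")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s vs 0.1.23.4: %d\n", target, cmp)
	}
}
```

With this shape, "allow ==, block <" reduces to rejecting only when the result is negative.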

Priority: Low (edge case)
Risk: Very Low (no security impact)


Phase 2: Middleware Version Upgrade Fix

Current Status

  • Phase 1 (Build Orchestrator): 90% complete
  • Phase 2 (Middleware): Starting

Known Issues

  1. Version Upgrade Catch-22: Middleware blocks updates due to version check
  2. Update-Aware Middleware: Need to detect upgrading agents and relax constraints
  3. Command Processing: Need complete implementation

Implementation Plan

1. Update-Aware Middleware

  • Detect when agent is in update process
  • Relax machine ID binding during upgrade
  • Restore binding after completion

2. Same-Version Logic

  • Implement decision from Issue #3 above
  • Update agent and server validation

3. End-to-End Testing

  • Test flow: 0.1.23.4 → 0.1.23.5
  • Verify signature verification
  • Validate subsystem persistence
  • Confirm agent continues operations post-update

Tasks

  1. Implement middleware version upgrade detection
  2. Add nonce validation for replay protection
  3. Implement same-version installation logic
  4. Test complete update cycle
  5. Verify signature verification

Priority: High (blocks Phase 2 completion)
Risk: Medium (need to ensure security not compromised)


Build Orchestrator Status (Phase 1 - 90% Complete)

Completed

  1. Signed binary generation (build_orchestrator.go)
  2. Ed25519 signing integration (SignFile())
  3. Generic binary signing (Option 2 approach)
  4. Download handler serves signed binaries
  5. Config separation (config.json not embedded)

Remaining

  1. Agent update flow testing (0.1.23.4 → 0.1.23.5)
  2. End-to-end verification
  3. Signature verification on agent side (placeholder in place)

Ready for Cleanup

The following dead code should be removed:

  • TLSConfig struct in config.go (lines 23-29)
  • Docker artifact generation in agent_builder.go
  • Old config fields: CertFile, KeyFile, CAFile

Phase 3: Security Hardening

Tasks

  1. Remove JWT secret logging (debug mode only)
  2. Implement per-server JWT secrets (not shared)
  3. Clean dead code (TLSConfig, Docker fields)
  4. Consider kernel keyring config protection

Token Security Decision

Status: Sliding window refresh tokens are adequate

  • Machine ID binding prevents cross-machine token reuse
  • Token theft requires filesystem access (already compromised)
  • True rotation deferred to v0.3.0

Priority: Medium
Risk: Low (current implementation adequate)


Testing Checklist

Agent Update Flow Test

  • Bump version to 0.1.23.5
  • Build signed binary for 0.1.23.5
  • Test update from 0.1.23.4 → 0.1.23.5
  • Verify signature verification works
  • Confirm agent restarts successfully
  • Validate subsystems still enabled post-update
  • Verify metrics still reporting correctly
  • Check update_events table for corruption

UI Display Test

  • Version shows on agent card
  • Version shows on agent details page
  • Version updates after agent update

Subsystem Tests

  • Storage scan reports to metrics table
  • System scan reports to metrics table
  • APT scan reports to update_events table
  • Docker scan reports to update_events table

Database Queries for Investigation

Check for Misclassified Data

-- Query 1: Check for storage/system data in update_events
SELECT package_type, COUNT(*) as count
FROM update_events
WHERE package_type IN ('storage', 'system', 'disk', 'boot')
GROUP BY package_type;

-- Query 2: Check metrics table for package update data
SELECT package_type, COUNT(*) as count
FROM metrics
WHERE package_type IN ('apt', 'dnf', 'docker', 'windows', 'winget')
GROUP BY package_type;

-- Query 3: Check agent_subsystems configuration
SELECT name, enabled, auto_run
FROM agent_subsystems
WHERE name IN ('storage', 'system', 'updates');

Cleanup Queries (If Needed)

-- Move or delete misclassified data
-- BACKUP FIRST!

-- Check how many records the delete below would affect
SELECT COUNT(*) FROM update_events
WHERE package_type IN ('storage', 'system')
AND created_at < NOW() - INTERVAL '7 days';

-- Delete (or move to metrics table)
DELETE FROM update_events
WHERE package_type IN ('storage', 'system')
AND created_at < NOW() - INTERVAL '7 days';

Code Locations Reference

Agent-Side

  • aggregator-agent/cmd/agent/main.go - Command routing (lines 864-882)
  • aggregator-agent/cmd/agent/subsystem_handlers.go - Scan handlers
  • aggregator-agent/cmd/agent/main.go:985 - OLD handleScanUpdates (delete)
  • aggregator-agent/internal/service/windows.go:32 - Old version constant (update)

API Handlers

  • aggregator-server/internal/api/handlers/updates.go:67 - ReportUpdates
  • aggregator-server/internal/api/handlers/metrics.go:31 - ReportMetrics
  • aggregator-server/internal/api/handlers/agent_build.go - Update logic

WebUI

  • aggregator-web/src/pages/Dashboard.tsx - Agent card and details
  • aggregator-web/src/pages/settings/AgentManagement.tsx - Version display

Database Tables

  • update_events - Package updates (apt, dnf, docker, etc.)
  • metrics - System metrics (storage, system, cpu, memory)
  • agent_subsystems - Subsystem configuration

Week 1 (Critical Fixes)

  1. Database Investigation - Run queries to check for misclassified data
  2. UI Version Display - Add version to agent cards and details (easy win)
  3. Same-Version Logic Decision - Make decision and implement
  4. Test Update Flow - 0.1.23.4 → 0.1.23.5

Week 2 (Phase 2 Completion)

  1. Middleware Version Upgrade - Implement detection logic
  2. Security Hardening - JWT logging, per-server secrets
  3. Code Cleanup - Remove old handleScanUpdates function
  4. Documentation - Update all docs for v0.2.0

Week 3 (Polish)

  1. Token Rotation (Nice-to-have) - Implement true rotation
  2. Enhanced UI - Improve metrics display
  3. Testing - Full integration test suite

Risk Assessment

Issue                Priority  Risk      Effort
Scan Updates Quirk   Medium    Low       2 hours
UI Version Display   Low       Very Low  1 hour
Same-Version Logic   Low       Very Low  1 hour
Middleware Upgrade   High      Medium    4 hours
Agent Update Test    High      Medium    3 hours
Security Hardening   Medium    Low       4 hours

Decision Log

Decision 1: Same-Version Installations

Status: Pending
Options: Allow / Block / Warn
Recommendation: Allow (supports reinstall use case)

Decision 2: Token Rotation Priority

Status: Defer to v0.3.0
Rationale: Machine ID binding provides adequate security
Decision: Defer - sliding window sufficient

Decision 3: UI Version Display Location

Status: Pending
Options: Card only / Details only / Both
Recommendation: Both for maximum visibility

Decision 4: Scan Updates Fix Approach

Status: Pending
Options: Database cleanup / Code cleanup
Recommendation: Both - clean up old data AND remove dead code


Next Steps

Immediate (Today)

  1. ☐ Check database for misclassified data using queries above
  2. ☐ Decide on Same-Version logic (Allow / Block / Warn)
  3. ☐ Confirm token rotation deferral to v0.3.0 (see Decision 2)
  4. ☐ Run test update flow

This Week

  1. ☐ Implement UI version display
  2. ☐ Implement same-version installation logic
  3. ☐ Complete middleware version upgrade
  4. ☐ Remove JWT secret logging

Next Week

  1. ☐ Full integration testing
  2. ☐ Update documentation
  3. ☐ Prepare v0.2.0 release

Notes

Build Orchestrator Misalignment - RESOLVED

  • Originally generating Docker configs, installer expecting native binaries
  • Fixed: Now generates signed native binaries per version/platform
  • Signed packages stored in database
  • Download endpoint serves correct binaries

Version Upgrade Catch-22 - IN PROGRESS ⚠️

  • Middleware blocks updates due to machine ID binding
  • Need update-aware middleware to detect upgrading agents
  • Nonce validation needed for replay protection

Token Security - ADEQUATE

  • Sliding window refresh tokens sufficient
  • Machine ID binding prevents cross-machine token reuse
  • True rotation nice-to-have but not critical for v0.2.0

Document Version: 1.0
Last Updated: 2025-11-10
Next Review: After critical fixes completed
Owner: @Fimeg
Collaborator: Kimi-k2 (Infrastructure Analysis)