Agent Version Management Investigation & Fix

Context

We've discovered critical issues with agent version tracking and display across the system. The versions shown in the UI, stored in the database, and reported by agents are disconnected from one another and inconsistent.

Current Broken State

Observed Symptoms:

  1. UI shows: Varying versions (e.g. 0.1.7, possibly read from the wrong field)
  2. Database agent_version column: Stuck at 0.1.2 (never updates)
  3. Database current_version column: Shows 0.1.3 (default, unclear purpose)
  4. Agent actually runs: v0.1.8 (confirmed via binary)
  5. Server logs show: "version 0.1.7 is up to date" (wrong baseline)
  6. Server config default: Hardcoded to 0.1.4 in internal/config/config.go:37

Known Issues:

  1. Conditional bug in handlers/agents.go:135: The version is updated only if agent.Metadata != nil, so an agent with nil metadata never has its version persisted
  2. Version stored in multiple places: Two database columns AND the metadata JSON
  3. Config hardcoded default: Should be 0.1.8, is 0.1.4
  4. No version detection: Server doesn't detect when an installed agent binary has a different version than the one it expects

Investigation Tasks

1. Trace Version Data Flow

Map the complete flow:

  • Agent binary → reports version in metrics → server receives → WHERE does it go?
  • UI displays version → WHERE does it read from? (database column? metadata? API response?)
  • Database has TWO version columns (agent_version, current_version) → which is used? why both?

Questions to answer:

- What updates `agent_version` column? (Should be check-in, is broken)
- What updates `current_version` column? (Unknown)
- What does UI actually query/display?
- What is `agent.Metadata["reported_version"]` for? Redundant?

2. Identify Single Source of Truth

Design decision needed:

  • Should we have ONE version column in database, or is there a reason for two?
  • Should version be in both database column AND metadata JSON, or just one?
  • What should happen when agent version > server's known "latest version"?
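Answering the last question requires ordering versions numerically rather than as strings (as strings, "0.1.10" sorts before "0.1.9"). A minimal sketch, assuming versions are plain dotted integers with an optional leading "v"; `compareVersions` and `component` are hypothetical helpers, not functions from the existing codebase:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions compares two dotted numeric versions such as "0.1.8".
// It returns -1 if a < b, 0 if they are equal, and 1 if a > b.
// Missing components count as zero, so "0.2" equals "0.2.0".
func compareVersions(a, b string) (int, error) {
	as := strings.Split(strings.TrimPrefix(a, "v"), ".")
	bs := strings.Split(strings.TrimPrefix(b, "v"), ".")
	n := len(as)
	if len(bs) > n {
		n = len(bs)
	}
	for i := 0; i < n; i++ {
		x, err := component(as, i)
		if err != nil {
			return 0, fmt.Errorf("invalid version %q: %w", a, err)
		}
		y, err := component(bs, i)
		if err != nil {
			return 0, fmt.Errorf("invalid version %q: %w", b, err)
		}
		if x < y {
			return -1, nil
		}
		if x > y {
			return 1, nil
		}
	}
	return 0, nil
}

// component returns the i-th numeric part, or 0 when the version is shorter.
func component(parts []string, i int) (int, error) {
	if i >= len(parts) {
		return 0, nil
	}
	return strconv.Atoi(parts[i])
}

func main() {
	pairs := [][2]string{{"0.1.8", "0.1.4"}, {"0.1.9", "0.1.10"}, {"0.2", "0.2.0"}}
	for _, p := range pairs {
		c, err := compareVersions(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s vs %s -> %d\n", p[0], p[1], c)
	}
}
```

A result of 1 for (agent version, configured latest) is the "agent is newer than the server knows about" case that the design decision has to cover.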

3. Fix Update Mechanism

Current broken code locations:

  • internal/api/handlers/agents.go:132-164 - GetCommands handler with broken conditional
  • internal/database/queries/agents.go:53-57 - UpdateAgentVersion function (exists but not called properly)
  • internal/config/config.go:37 - Hardcoded latest version

Required fixes:

  1. Remove the `&& agent.Metadata != nil` condition so the version is always updated
  2. Decide: update agent_version column, current_version column, or both?
  3. Update config default to 0.1.8 (or better: auto-detect from filesystem)
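The first fix can be sketched as follows. This is an illustration under stated assumptions, not the actual handler: `Agent`, `VersionStore`, `memoryStore`, and `recordReportedVersion` are hypothetical stand-ins, and the real `UpdateAgentVersion` signature may differ. The point is only that the update depends on the reported version and not on `agent.Metadata` being non-nil.

```go
package main

import "fmt"

// Agent is a hypothetical, trimmed-down stand-in for the server's agent model.
type Agent struct {
	ID           string
	AgentVersion string
	Metadata     map[string]interface{}
}

// VersionStore is a hypothetical interface; in the real code this role is
// presumably played by the UpdateAgentVersion query.
type VersionStore interface {
	UpdateAgentVersion(agentID, version string) error
}

// memoryStore is an in-memory VersionStore used only for this example.
type memoryStore struct{ versions map[string]string }

func (m *memoryStore) UpdateAgentVersion(agentID, version string) error {
	m.versions[agentID] = version
	return nil
}

// recordReportedVersion persists the reported version on every check-in.
// It never inspects agent.Metadata, so a nil metadata map cannot block it.
func recordReportedVersion(store VersionStore, agent *Agent, reported string) error {
	if reported == "" {
		return nil // nothing reported; keep the stored value
	}
	if err := store.UpdateAgentVersion(agent.ID, reported); err != nil {
		return fmt.Errorf("update version for agent %s: %w", agent.ID, err)
	}
	agent.AgentVersion = reported
	return nil
}

func main() {
	store := &memoryStore{versions: map[string]string{}}
	agent := &Agent{ID: "fedora", AgentVersion: "0.1.2"} // Metadata is nil
	if err := recordReportedVersion(store, agent, "0.1.8"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stored version:", store.versions["fedora"])
}
```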

4. Add Server Version Awareness (Nice-to-Have)

Enhancement: Server should detect agents running a version other than the one it is configured to expect

  • Scan /usr/local/bin/redflag-agent on startup (if local)
  • Detect version from binary or agent check-ins
  • Show notification in UI: "Agent v0.1.8 detected, but server expects v0.1.4 - update server config?"
  • Could be under Settings page or as a notification banner
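One way the check-in side of this could look, as a rough sketch: collect the distinct versions reported by agents that differ from the configured one and turn each into a notice. `versionNotices` is a hypothetical helper, and the hostnames other than "fedora" are made up for the example; how the notices reach the UI is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// versionNotices returns one message per distinct reported version that
// differs from the version the server is configured to expect.
// reported maps hostname -> version from the most recent check-in.
func versionNotices(expected string, reported map[string]string) []string {
	seen := map[string]bool{}
	for _, v := range reported {
		if v != "" && v != expected {
			seen[v] = true
		}
	}
	// Sort so the output order is stable across runs.
	versions := make([]string, 0, len(seen))
	for v := range seen {
		versions = append(versions, v)
	}
	sort.Strings(versions)

	notices := make([]string, 0, len(versions))
	for _, v := range versions {
		notices = append(notices, fmt.Sprintf(
			"Agent v%s detected, but server expects v%s - update server config?", v, expected))
	}
	return notices
}

func main() {
	reported := map[string]string{
		"fedora": "0.1.8",
		"debian": "0.1.4", // hypothetical host already on the expected version
		"arch":   "0.1.8", // hypothetical host; same version yields one notice
	}
	for _, n := range versionNotices("0.1.4", reported) {
		fmt.Println(n)
	}
}
```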

5. Version History (Future)

Lower priority: Track version history per agent

  • Log when agent upgrades happen
  • Show timeline of versions in agent history
  • Useful for debugging but not critical for now

Files to Investigate

Backend:

  1. aggregator-server/internal/api/handlers/agents.go (lines 130-165) - GetCommands version handling
  2. aggregator-server/internal/database/queries/agents.go - UpdateAgentVersion implementation
  3. aggregator-server/internal/config/config.go (line 37) - LatestAgentVersion default
  4. aggregator-server/internal/database/migrations/*.sql - Check agents table schema

Frontend:

  1. aggregator-web/src/pages/Agents.tsx - Where version is displayed
  2. aggregator-web/src/hooks/useAgents.ts - API calls for agent data
  3. aggregator-web/src/lib/api.ts - API endpoint definitions

Database:

```sql
-- Check schema
\d agents

-- Check current data
SELECT hostname, agent_version, current_version, metadata->'reported_version'
FROM agents;
```

Expected Outcome

After investigation, we should have:

  1. Clear understanding of which fields are used and why
  2. Single source of truth for agent version (ideally one database column)
  3. Fixed update mechanism that persists version on every check-in
  4. Correct server config pointing to actual latest version
  5. Optional: Server awareness of agents running a version other than the configured latest

Success Criteria:

  • Agent v0.1.8 checks in → database immediately shows 0.1.8
  • UI displays 0.1.8 correctly
  • Server logs "Agent fedora version 0.1.8 is up to date"
  • System works for future version bumps (0.1.9, 0.2.0, etc.)

Commands to Start Investigation

```bash
# Check database schema
docker exec redflag-postgres psql -U aggregator -d aggregator -c "\d agents"

# Check current version data
docker exec redflag-postgres psql -U aggregator -d aggregator -c "SELECT hostname, agent_version, current_version, metadata FROM agents WHERE hostname = 'fedora';"

# Check server logs for version processing
grep -E "Received metrics.*Version|UpdateAgentVersion" /tmp/redflag-server.log | tail -20

# Trace UI component rendering version
# (Will need to search codebase)
```

Notes

  • Server is running and receiving check-ins every ~5 minutes
  • Agent v0.1.8 is installed at /usr/local/bin/redflag-agent
  • Built binary is at /home/memory/Desktop/Projects/RedFlag/aggregator-agent/aggregator-agent
  • Database migration for retry tracking (009) was already applied
  • Auto-refresh issues were FIXED (staleTime conflict resolved)
  • Retry tracking features were IMPLEMENTED (works on backend, frontend needs testing)