Agent Version Management Investigation & Fix
Context
We've discovered critical issues with agent version tracking and display across the system. The version shown in the UI, stored in the database, and reported by agents are all disconnected and inconsistent.
Current Broken State
Observed Symptoms:
- UI shows: Various versions (0.1.7, maybe pulling from wrong field)
- Database `agent_version` column: Stuck at 0.1.2 (never updates)
- Database `current_version` column: Shows 0.1.3 (default, unclear purpose)
- Agent actually runs: v0.1.8 (confirmed via binary)
- Server logs show: "version 0.1.7 is up to date" (wrong baseline)
- Server config default: Hardcoded to 0.1.4 in `config/config.go:37`
Known Issues:
- Conditional bug in `handlers/agents.go:135`: Only updates version if `agent.Metadata != nil`
- Version stored in wrong places: Both database columns AND metadata JSON
- Config hardcoded default: Should be 0.1.8, is 0.1.4
- No version detection: Server doesn't detect when agent binary exists with different version
Investigation Tasks
1. Trace Version Data Flow
Map the complete flow:
- Agent binary → reports version in metrics → server receives → WHERE does it go?
- UI displays version → WHERE does it read from? (database column? metadata? API response?)
- Database has TWO version columns (`agent_version`, `current_version`) → which is used? why both?
Questions to answer:
- What updates `agent_version` column? (Should be check-in, is broken)
- What updates `current_version` column? (Unknown)
- What does UI actually query/display?
- What is `agent.Metadata["reported_version"]` for? Redundant?
2. Identify Single Source of Truth
Design decision needed:
- Should we have ONE version column in database, or is there a reason for two?
- Should version be in both database column AND metadata JSON, or just one?
- What should happen when agent version > server's known "latest version"?
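Answering the last question requires ordering versions numerically, since a plain string comparison would rank 0.1.10 below 0.1.9. The following is a minimal sketch, assuming versions are dotted integers such as 0.1.8 with an optional leading `v`. `parseVersion` and `compareVersions` are hypothetical helpers, not existing server code, and pre-release suffixes are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a dotted version such as "0.1.8" or "v0.1.8" into
// integer segments. Empty, negative, or non-numeric segments are rejected.
// Hypothetical helper for illustration only.
func parseVersion(v string) ([]int, error) {
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	if v == "" {
		return nil, fmt.Errorf("empty version")
	}
	parts := strings.Split(v, ".")
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("invalid version segment %q in %q", p, v)
		}
		nums[i] = n
	}
	return nums, nil
}

// compareVersions returns -1, 0, or 1 when a is older than, equal to, or
// newer than b. Missing trailing segments count as zero ("0.2" == "0.2.0").
func compareVersions(a, b string) (int, error) {
	av, err := parseVersion(a)
	if err != nil {
		return 0, err
	}
	bv, err := parseVersion(b)
	if err != nil {
		return 0, err
	}
	n := len(av)
	if len(bv) > n {
		n = len(bv)
	}
	for i := 0; i < n; i++ {
		var x, y int
		if i < len(av) {
			x = av[i]
		}
		if i < len(bv) {
			y = bv[i]
		}
		if x < y {
			return -1, nil
		}
		if x > y {
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	// Agent reports 0.1.8 while the server's configured latest is 0.1.4.
	cmp, err := compareVersions("0.1.8", "0.1.4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if cmp > 0 {
		fmt.Println("agent is newer than the server's known latest version")
	}
}
```

With a comparison like this, the server could treat "agent newer than known latest" as its own state rather than logging it as up to date or out of date.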
3. Fix Update Mechanism
Current broken code locations:
- `internal/api/handlers/agents.go:132-164` - GetCommands handler with broken conditional
- `internal/database/queries/agents.go:53-57` - UpdateAgentVersion function (exists but not called properly)
- `internal/config/config.go:37` - Hardcoded latest version
Required fixes:
- Remove `&& agent.Metadata != nil` condition (always update version)
- Decide: update `agent_version` column, `current_version` column, or both?
- Update config default to 0.1.8 (or better: auto-detect from filesystem)
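The first fix can be sketched as a check-in step that persists the reported version without consulting `agent.Metadata` at all. This is an illustration under assumptions, not the actual handler. The `Agent` struct, the `VersionStore` interface, the in-memory store, and `recordReportedVersion` are all hypothetical, and the real `UpdateAgentVersion` signature may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Agent is a hypothetical, trimmed-down stand-in for the server's agent model.
type Agent struct {
	ID           string
	AgentVersion string
	Metadata     map[string]interface{} // may be nil for some agents
}

// VersionStore is an assumed interface; the real query layer may differ.
type VersionStore interface {
	UpdateAgentVersion(agentID, version string) error
}

// memoryStore is an in-memory VersionStore used only for this illustration.
type memoryStore struct {
	versions map[string]string
}

func (m *memoryStore) UpdateAgentVersion(agentID, version string) error {
	if agentID == "" {
		return errors.New("agent ID is required")
	}
	m.versions[agentID] = version
	return nil
}

// recordReportedVersion persists the version reported at check-in.
// It never inspects agent.Metadata, so a nil map cannot block the update.
func recordReportedVersion(store VersionStore, agent *Agent, reported string) error {
	if reported == "" {
		return nil // nothing reported; keep the stored value
	}
	if err := store.UpdateAgentVersion(agent.ID, reported); err != nil {
		return fmt.Errorf("update version for agent %s: %w", agent.ID, err)
	}
	agent.AgentVersion = reported
	return nil
}

func main() {
	store := &memoryStore{versions: map[string]string{}}
	agent := &Agent{ID: "fedora", AgentVersion: "0.1.2"} // Metadata left nil
	if err := recordReportedVersion(store, agent, "0.1.8"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stored version:", store.versions["fedora"])
}
```

Which column the store writes to is deliberately left open here, since that is the design decision listed above.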
4. Add Server Version Awareness (Nice-to-Have)
Enhancement: Server should detect when agents exist outside its version scope
- Scan `/usr/local/bin/redflag-agent` on startup (if local)
- Detect version from binary or agent check-ins
- Show notification in UI: "Agent v0.1.8 detected, but server expects v0.1.4 - update server config?"
- Could be under Settings page or as a notification banner
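The check-in half of this enhancement might look like the sketch below, which turns versions reported by agents into the notification text quoted above. It assumes the server already has the list of reported versions in hand. `versionNotices` is a hypothetical helper, and it only flags mismatches rather than deciding which side is newer.

```go
package main

import (
	"fmt"
	"sort"
)

// versionNotices returns one message per distinct reported version that
// differs from the version the server expects. Empty reports are ignored.
// Hypothetical helper for illustration only.
func versionNotices(expected string, reported []string) []string {
	seen := map[string]bool{}
	var distinct []string
	for _, v := range reported {
		if v == "" || v == expected || seen[v] {
			continue
		}
		seen[v] = true
		distinct = append(distinct, v)
	}
	sort.Strings(distinct) // stable output order for display

	notices := make([]string, 0, len(distinct))
	for _, v := range distinct {
		notices = append(notices, fmt.Sprintf(
			"Agent v%s detected, but server expects v%s - update server config?",
			v, expected))
	}
	return notices
}

func main() {
	// Two agents report 0.1.8, one matches the configured 0.1.4.
	for _, n := range versionNotices("0.1.4", []string{"0.1.8", "0.1.4", "0.1.8"}) {
		fmt.Println(n)
	}
}
```

De-duplicating by version keeps the banner to one line per unexpected version, however many agents report it.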
5. Version History (Future)
Lower priority: Track version history per agent
- Log when agent upgrades happen
- Show timeline of versions in agent history
- Useful for debugging but not critical for now
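A rough sketch of per-agent history tracking, assuming an in-memory structure for illustration; a real implementation would presumably persist these events in the database. The `VersionChange` and `VersionHistory` types are hypothetical. The first version seen for an agent is treated as a baseline and not logged as an upgrade.

```go
package main

import (
	"fmt"
	"time"
)

// VersionChange records one observed version transition for an agent.
type VersionChange struct {
	AgentID string
	From    string
	To      string
	At      time.Time
}

// VersionHistory keeps the last seen version per agent plus a change log.
// Hypothetical in-memory structure for illustration only.
type VersionHistory struct {
	last    map[string]string
	changes []VersionChange
}

func NewVersionHistory() *VersionHistory {
	return &VersionHistory{last: map[string]string{}}
}

// Observe notes the version seen at a check-in and reports whether it was
// logged as a change. The first sighting of an agent only sets the baseline.
func (h *VersionHistory) Observe(agentID, version string, at time.Time) bool {
	prev, known := h.last[agentID]
	h.last[agentID] = version
	if !known || prev == version {
		return false
	}
	h.changes = append(h.changes, VersionChange{
		AgentID: agentID, From: prev, To: version, At: at,
	})
	return true
}

// Timeline returns the logged changes for one agent in observation order.
func (h *VersionHistory) Timeline(agentID string) []VersionChange {
	var out []VersionChange
	for _, c := range h.changes {
		if c.AgentID == agentID {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	h := NewVersionHistory()
	start := time.Now()
	h.Observe("fedora", "0.1.7", start)
	h.Observe("fedora", "0.1.7", start.Add(5*time.Minute)) // unchanged
	h.Observe("fedora", "0.1.8", start.Add(10*time.Minute))
	for _, c := range h.Timeline("fedora") {
		fmt.Printf("%s: %s -> %s\n", c.AgentID, c.From, c.To)
	}
}
```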
Files to Investigate
Backend:
- `aggregator-server/internal/api/handlers/agents.go` (lines 130-165) - GetCommands version handling
- `aggregator-server/internal/database/queries/agents.go` - UpdateAgentVersion implementation
- `aggregator-server/internal/config/config.go` (line 37) - LatestAgentVersion default
- `aggregator-server/internal/database/migrations/*.sql` - Check agents table schema
Frontend:
- `aggregator-web/src/pages/Agents.tsx` - Where version is displayed
- `aggregator-web/src/hooks/useAgents.ts` - API calls for agent data
- `aggregator-web/src/lib/api.ts` - API endpoint definitions
Database:
-- Check schema
\d agents
-- Check current data
SELECT hostname, agent_version, current_version, metadata->'reported_version'
FROM agents;
Expected Outcome
After investigation, we should have:
- Clear understanding of which fields are used and why
- Single source of truth for agent version (ideally one database column)
- Fixed update mechanism that persists version on every check-in
- Correct server config pointing to actual latest version
- Optional: Server awareness of agent versions outside its scope
Success Criteria:
- Agent v0.1.8 checks in → database immediately shows 0.1.8
- UI displays 0.1.8 correctly
- Server logs "Agent fedora version 0.1.8 is up to date"
- System works for future version bumps (0.1.9, 0.2.0, etc.)
Commands to Start Investigation
# Check database schema
docker exec redflag-postgres psql -U aggregator -d aggregator -c "\d agents"
# Check current version data
docker exec redflag-postgres psql -U aggregator -d aggregator -c "SELECT hostname, agent_version, current_version, metadata FROM agents WHERE hostname = 'fedora';"
# Check server logs for version processing
grep -E "Received metrics.*Version|UpdateAgentVersion" /tmp/redflag-server.log | tail -20
# Trace UI component rendering version
# (Will need to search codebase)
Notes
- Server is running and receiving check-ins every ~5 minutes
- Agent v0.1.8 is installed at `/usr/local/bin/redflag-agent`
- Built binary is at `/home/memory/Desktop/Projects/RedFlag/aggregator-agent/aggregator-agent`
- Database migration for retry tracking (009) was already applied
- Auto-refresh issues were FIXED (staleTime conflict resolved)
- Retry tracking features were IMPLEMENTED (works on backend, frontend needs testing)