FUCKED.md - Agent Install/Registration Flow Analysis
Status: Complete breakdown of agent registration and version tracking bugs as of 2025-11-13
The Complete Failure Chain
Issue 1: Version Tracking Not Updated During Token Renewal (Server-Side)
Root Cause: The MachineBindingMiddleware checks agent version before token renewal can update it.
File: aggregator-server/internal/api/handlers/agents.go:167
Flow:
```
POST /api/v1/agents/:id/commands
    ↓
[MachineBindingMiddleware]  ← Checks version here (line ~75)
    - Loads agent from DB
    - Sees current_version = "0.1.17"
    - REJECTS with 426  ← Request never reaches handler
    ↓
[AgentHandler.GetCommands]  ← Would update version here (line 239)
    - But never gets called
```
Fix Attempted:
- Added `AgentVersion` field to `TokenRenewalRequest`
- Modified `RenewToken` to call `UpdateAgentVersion()`
- Problem: token renewal happens AFTER the 426 rejection
Why It Failed: The agent gets a 426 → it must fetch commands to report its new version → it can't fetch commands because of the 426 → deadlock.
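The attempted fix is sound in isolation. The problem is that it never gets a chance to run. Below is a minimal Go sketch of the renewal side. It assumes the JSON tags `agent_id`, `refresh_token`, and `agent_version`, and uses a hypothetical `versionStore` interface as a stand-in for the real DB call. Only `TokenRenewalRequest`, `RenewToken`, and `UpdateAgentVersion` are names taken from this analysis. Token validation and issuance are left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// TokenRenewalRequest mirrors the attempted fix: the renewal payload carries the
// version of the binary making the request. JSON tags are assumptions.
type TokenRenewalRequest struct {
	AgentID      string `json:"agent_id"`
	RefreshToken string `json:"refresh_token"`
	AgentVersion string `json:"agent_version"`
}

// versionStore is a hypothetical stand-in for the DB layer.
type versionStore interface {
	UpdateAgentVersion(agentID, version string) error
}

// versionStoreFunc adapts a plain function to versionStore (handy for tests).
type versionStoreFunc func(agentID, version string) error

func (f versionStoreFunc) UpdateAgentVersion(agentID, version string) error {
	return f(agentID, version)
}

// renewToken decodes a renewal body and records the reported version before a
// new access token would be issued (issuance is omitted in this sketch).
func renewToken(body []byte, store versionStore) (TokenRenewalRequest, error) {
	var req TokenRenewalRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	if req.AgentID == "" || req.RefreshToken == "" {
		return req, errors.New("agent_id and refresh_token are required")
	}
	// Older agents omit agent_version; leave current_version untouched for them.
	if v := strings.TrimSpace(req.AgentVersion); v != "" {
		if err := store.UpdateAgentVersion(req.AgentID, v); err != nil {
			return req, err
		}
	}
	return req, nil
}
```

This code only runs once the agent decides to renew. Per Issue 5, the agent only renews after a 401, and the earlier 426 prevents it from ever seeing one.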
Issue 2: Install Script Does NOT Actually Register Agents
Root Cause: The install script creates a blank config template instead of calling the registration API.
Files Affected:
- aggregator-server/internal/api/handlers/downloads.go:343
- aggregator-server/internal/services/install_templates.go (likely)
Current Broken Flow:
```go
// In downloads.go line 343
configTemplate := map[string]interface{}{
	"agent_id":           "00000000-0000-0000-0000-000000000000", // PLACEHOLDER!
	"token":              "",                                     // EMPTY
	"refresh_token":      "",                                     // EMPTY
	"registration_token": "",                                     // EMPTY
}
```
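A host stuck in this state can be identified from its config.json alone. Here is a minimal sketch. It assumes only the four keys shown above; any other keys are ignored by `json.Unmarshal`. `isUnregisteredTemplate` is a hypothetical helper, not existing agent code.

```go
package main

import "encoding/json"

// nilAgentID is the placeholder UUID the installer currently writes.
const nilAgentID = "00000000-0000-0000-0000-000000000000"

// agentConfig holds only the credential keys from the template above; the real
// config.json likely has more keys, which this sketch ignores.
type agentConfig struct {
	AgentID           string `json:"agent_id"`
	Token             string `json:"token"`
	RefreshToken      string `json:"refresh_token"`
	RegistrationToken string `json:"registration_token"`
}

// isUnregisteredTemplate reports whether config.json still holds placeholder
// credentials, i.e. registration never happened on this host.
func isUnregisteredTemplate(raw []byte) (bool, error) {
	var cfg agentConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	noID := cfg.AgentID == "" || cfg.AgentID == nilAgentID
	noCreds := cfg.Token == "" && cfg.RefreshToken == ""
	return noID || noCreds, nil
}
```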
What Should Happen (✓ = the current installer does this, ✗ = missing or wrong):
1. `curl installer | bash -s -- <registration_token>`
2. Download agent binary ✓
3. Call POST /api/v1/agents/register with the token ✗ MISSING
4. Get credentials back ✗ MISSING
5. Write them to config.json ✗ Writes the template instead
6. Start service ✓
7. Actual result: the service fails with "Agent not registered" ✗
The install script generator, `generateInstallScript`:
- Receives registration token as parameter
- Never uses it to call registration API
- Generates config with empty placeholders
- Agent starts, finds no credentials, exits
Historical Context: This was probably written when agents could self-register on first start. When registration tokens were added, the installer was never updated to actually perform the registration.
Issue 3: Middleware Version Check Happens Too Early
Root Cause: Version check in middleware prevents handler from updating version.
File: aggregator-server/internal/api/middleware/auth.go (assumed location)
Middleware Chain:
```
GET /api/v1/agents/:id/commands
    ↓
[MachineBindingMiddleware]  ← Version check here (line ~75)
    - agent = GetAgentByMachineID()
    - if version < min → 426
    ↓
[AuthMiddleware]  ← Auth check here
    ↓
[AgentHandler.GetCommands]  ← Would update version here (line 239)
    - UpdateAgentVersion(agentID, metrics.Version)
```
The Paradox:
- Need to reach handler to update version
- Can't reach handler because version is old
- Can't update version because can't reach handler
Fix Required: Version must be updated during token renewal or registration, NOT during check-in.
Issue 4: Agent Version Field Confusion
Database Schema:
```sql
CREATE TABLE agents (
    agent_version   VARCHAR(50), -- Version at registration (static)
    current_version VARCHAR(50), -- Current running version (dynamic)
    ...
);
```
Current Queries:
- `UpdateAgentVersion()` updates `current_version` ✓
- But the middleware might check `agent_version` ✗
- The two fields have overlapping purposes

Evidence:
- Agent registers as 0.1.17 → `agent_version` = 0.1.17
- Agent upgrades to 0.1.23.6 → `current_version` should update to 0.1.23.6
- But if the middleware checks `agent_version`, it sees 0.1.17 → 426 rejection
Check This:
```sql
SELECT agent_version, current_version FROM agents WHERE id = 'agent-id';
-- If agent_version != current_version, a middleware reading agent_version is checking the wrong field
```
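Suppose the fix is to stop reading `agent_version` in the middleware. The check still needs a defined fallback for rows where `current_version` was never written. Below is a minimal sketch of one possible rule, with NULL modeled as a nil pointer. `agentRow` and `versionForCheck` are hypothetical names.

```go
package main

// agentRow models the two version columns; a nil pointer stands in for NULL.
// These are hypothetical Go mappings of agent_version / current_version.
type agentRow struct {
	AgentVersion   *string // set once at registration
	CurrentVersion *string // should track the running binary
}

// versionForCheck returns the version a minimum-version check should use:
// current_version when present, otherwise agent_version (rows where
// current_version was never written). ok is false when both are NULL or empty.
func versionForCheck(a agentRow) (v string, ok bool) {
	if a.CurrentVersion != nil && *a.CurrentVersion != "" {
		return *a.CurrentVersion, true
	}
	if a.AgentVersion != nil && *a.AgentVersion != "" {
		return *a.AgentVersion, true
	}
	return "", false
}
```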
Issue 5: Token Renewal Timing Problem
Expected Flow:
```
Agent check-in (v0.1.23.6 binary)
    ↓
401 Unauthorized (old token)
    ↓
RenewToken(agentID, refreshToken, "0.1.23.6")
    ↓
Server updates DB: current_version = "0.1.23.6"
    ↓
Server returns new access token
    ↓
Agent retries check-in with new token
    ↓
MachineBindingMiddleware sees current_version = "0.1.23.6"
    ↓
Accepts request!
```
Actual Flow:
```
Agent check-in (v0.1.23.6 binary)
    ↓
426 Upgrade Required (before auth!)
    ↓
Agent NEVER reaches the 401 renewal path
    ↓
Deadlock
```
The Order Is Wrong: The middleware checks the version BEFORE checking whether the token has expired. The order should be:
1. Check whether the token is valid (not expired)
2. If it has expired, let renewal run and update the version
3. Then check the version
Git History Investigation Guide
Find Working Version History:
```bash
# Check when download handler last worked
git log -p -- aggregator-server/internal/api/handlers/downloads.go | grep -A20 "registration" | head -50

# Check install template service
git log -p -- aggregator-server/internal/services/install_template_service.go

# Check middleware version check implementation
git log -p -- aggregator-server/internal/api/middleware/auth.go | grep -A10 "version"

# Check when TokenRenewal first added AgentVersion
git log -p -- aggregator-server/internal/models/agent.go | grep -B5 -A5 "AgentVersion"
```
Find Old Working Installer:
```bash
# Look for commits before machine_id was added (pre-0.1.22)
git log --oneline --before="2024-11-01" | head -20

# Checkout old version to see working installer
git checkout v0.1.16
# Study: aggregator-server/internal/api/handlers/downloads.go
git checkout main
```
Key Commits to Investigate:
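```bash
# With several --grep patterns, git log matches commits containing ANY of them;
# add --all-match to require all of them.
git log --grep="install" --grep="template" --oneline
git log --grep="registration" --grep="token" --oneline
git log --grep="machine" --grep="binding" --oneline
git log --grep="version" --grep="current" --oneline
```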
Files Adjacent to downloads.go That Probably Need Checking:
- aggregator-server/internal/services/install_template_service.go
  - Likely contains the actual template generation
  - May have had registration logic removed
- aggregator-server/internal/api/middleware/auth.go
  - Contains MachineBindingMiddleware
  - Version check logic
- aggregator-server/internal/api/handlers/agent_build.go
  - May have old registration endpoint implementations
- aggregator-server/internal/services/config_builder.go
  - May have install-time config generation logic
- aggregator-server/cmd/server/main.go
  - Middleware registration order
Quick Fixes That Might Work:
Fix 1: Make Install Script Actually Register
```go
// In downloads.go generateInstallScript()
// Instead of creating a template with placeholders,
// call the registration API from within the bash script.
// The generated script needs curl and jq on the target host.
script += fmt.Sprintf(`
# Actually register the agent; abort the install if the server rejects the token.
# The -d body is double-quoted so bash expands $(hostname).
REG_RESPONSE=$(curl -sf -X POST %s/api/v1/agents/register \
  -H "Authorization: Bearer %s" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$(hostname)\", ...}") || { echo "Registration failed" >&2; exit 1; }

# Extract credentials
AGENT_ID=$(echo "$REG_RESPONSE" | jq -r '.agent_id')
TOKEN=$(echo "$REG_RESPONSE" | jq -r '.token')

# Write REAL config
cat > /etc/redflag/config.json <<EOF
{
  "agent_id": "$AGENT_ID",
  "token": "$TOKEN",
  ...
}
EOF
`, serverURL, registrationToken)
```
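The script above interpolates `serverURL` and `registrationToken` straight into bash with `%s`. If either value can ever contain a quote, `$`, or whitespace, the generated script breaks or runs something unintended. Below is a defensive helper, assuming the values are otherwise arbitrary strings. `shellQuote` is a hypothetical name.

```go
package main

import "strings"

// shellQuote wraps s in single quotes so bash treats it literally. Any embedded
// single quote is emitted as '\'' (close quote, escaped quote, reopen quote).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}
```

To use it, the template would take a bare `%s` with no surrounding double quotes, and the caller would pass the quoted value. For the header that means quoting the whole header value, e.g. `shellQuote("Authorization: Bearer " + registrationToken)` substituted after `-H `.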
Fix 2: Update Field Middleware Checks
```go
// In middleware/auth.go
// Change from checking agent.AgentVersion to agent.CurrentVersion:
// reject only when the configured minimum is newer than the running version.
if utils.IsNewerVersion(cfg.MinAgentVersion, agent.CurrentVersion) {
	// Reject
}
```
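Fix 2 relies on `utils.IsNewerVersion` comparing versions numerically, and its implementation isn't shown here. A plain string comparison would rank "0.1.9" above "0.1.17", which could reject an up-to-date agent. Below is a minimal sketch of segment-wise comparison, assuming purely numeric dot-separated versions such as 0.1.23.6. `isNewerVersion` is a hypothetical stand-in, not the existing helper.

```go
package main

import (
	"strconv"
	"strings"
)

// isNewerVersion reports whether a is strictly newer than b, comparing
// dot-separated numeric segments, so "0.1.23.6" > "0.1.17" and "0.1.17" > "0.1.9".
// Missing segments count as 0 ("0.1.23" equals "0.1.23.0"). Non-numeric
// segments (e.g. pre-release tags) are not handled and parse as 0.
func isNewerVersion(a, b string) bool {
	as := strings.Split(strings.TrimPrefix(a, "v"), ".")
	bs := strings.Split(strings.TrimPrefix(b, "v"), ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x != y {
			return x > y
		}
	}
	return false
}
```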
Fix 3: Allow Legacy Agents Through
```go
// In middleware/auth.go
if agent.MachineID == nil || *agent.MachineID == "" {
	// Legacy agent - skip version check or log warning
	log.Printf("Legacy agent detected: %s", agent.ID)
	return // Allow through
}
```
What Was Definitely Broken by Recent Changes:
- Scanner timeout configuration API - Made breaking changes to DB schema without migration path
- Token renewal - Added version tracking but middleware checks version BEFORE renewal
- Install script - Never updated to use registration tokens, just writes templates
- Machine binding - Added security feature that breaks legacy agents without migration path
Working Theories:
Theory A: The Installer Never Actually Registered
The install script was copied from a version where agents self-registered on first start. When registration tokens were added, the script wasn't updated to perform registration.
Evidence:
- Script generates config with all placeholders
- No API call to /api/v1/agents/register in the generated script
- Service immediately exits with "not registered"
Test: Check git history of downloads.go around v0.1.15-v0.1.18
Theory B: Middleware Order Changed
Machine binding middleware was added or moved before authentication, causing version check to happen before token renewal can update the version.
Evidence:
- Token renewal works (version gets updated in DB)
- But agent still gets 426 after renewal
- Version check happens before handler updates it
Test: Check git history of middleware registration order in main.go
Theory C: Version Field Confusion
AgentVersion (registration) vs CurrentVersion (runtime) are being used inconsistently.
Evidence:
- `UpdateAgentVersion()` updates `current_version`
- But the middleware might check `agent_version`
- After an upgrade, `agent_version` still shows the old version
Test: Query DB: SELECT agent_version, current_version FROM agents;
Database State to Check:
```sql
-- Check version fields
SELECT id, hostname, agent_version, current_version, machine_id
FROM agents
WHERE id = 'your-agent-id';

-- Should see:
-- agent_version   = "0.1.17"   (set at registration)
-- current_version = "0.1.23.6" (should be updated by token renewal)
-- machine_id      = NULL       (legacy agent)
```
If current_version is NULL or not updated, token renewal isn't working.
If middleware checks agent_version, that's the bug.
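The two rules above can be applied mechanically to a row from that query. Below is a minimal sketch, assuming the installed binary's version is known separately. `versionRow` and `diagnose` are hypothetical, and the wording of the notes is illustrative.

```go
package main

// versionRow holds the columns from the query above; nil means NULL.
type versionRow struct {
	AgentVersion   *string
	CurrentVersion *string
	MachineID      *string
}

// diagnose applies the checks above to one row, given the version of the
// binary actually installed on the host.
func diagnose(r versionRow, installed string) []string {
	var notes []string
	if r.CurrentVersion == nil || *r.CurrentVersion != installed {
		notes = append(notes, "current_version not updated: token renewal is not writing the version")
	}
	if r.AgentVersion != nil && *r.AgentVersion != installed {
		notes = append(notes, "agent_version differs from the installed binary: 426 if the middleware reads this column")
	}
	if r.MachineID == nil || *r.MachineID == "" {
		notes = append(notes, "machine_id is NULL: legacy agent (see Fix 3)")
	}
	return notes
}
```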
Next Steps:
- Verify which field middleware checks - Look at actual middleware code
- Check git history - Find when installer last actually registered agents
- Test token renewal - Add debug logging to confirm it updates DB
- Fix installer - Make it actually call registration API
- Fix middleware - Move the version check after the point where renewal can update the version
Priority: Installer bug is blocking ALL new installs. Version tracking bug blocks upgrades. Both are release-blockers.
This document was created to preserve the diagnostic state after discovering multiple, interconnected bugs in the agent registration and version tracking system.