Files
Redflag/docs/4_LOG/December_2025/2025-11-13_Agent-Install-Registration-Flow-Investigation.md

11 KiB

FUCKED.md - Agent Install/Registration Flow Analysis

Status: Complete breakdown of agent registration and version tracking bugs as of 2025-11-13


The Complete Failure Chain

Issue 1: Version Tracking Not Updated During Token Renewal (Server-Side)

Root Cause: The MachineBindingMiddleware checks agent version before token renewal can update it.

File: aggregator-server/internal/api/handlers/agents.go:167

Flow:

POST /api/v1/agents/:id/commands
  ↓
[MachineBindingMiddleware] ← Checks version here (line ~75)
  - Loads agent from DB
  - Sees current_version = "0.1.17" 
  - REJECTS with 426 ← Request never reaches handler
  ↓
[AgentHandler.GetCommands] ← Would update version here (line 239)
  - But never gets called

Fix Attempted:

  • Added AgentVersion field to TokenRenewalRequest
  • Modified RenewToken to call UpdateAgentVersion()
  • Problem: Token renewal happens AFTER 426 rejection

Why It Failed: The agent gets 426 → Must get commands to send version → Can't get commands because 426 → Deadlock


Issue 2: Install Script Does NOT Actually Register Agents

Root Cause: The install script creates a blank config template instead of calling the registration API.

Files Affected:

  • aggregator-server/internal/api/handlers/downloads.go:343
  • aggregator-server/internal/services/install_templates.go (likely)

Current Broken Flow:

// In downloads.go line 343
configTemplate := map[string]interface{}{
    "agent_id":           "00000000-0000-0000-0000-000000000000",  // PLACEHOLDER!
    "token":              "",                                        // EMPTY
    "refresh_token":      "",                                        // EMPTY  
    "registration_token": "",                                        // EMPTY
}

What Should Happen:

1. curl installer | bash -s -- <registration_token>
2. Download agent binary ✓
3. Call POST /api/v1/agents/register with token ✗ MISSING
4. Get credentials back ✗ MISSING
5. Write to config.json ✗ Writes template instead
6. Start service ✓
7. Service fails: "Agent not registered" ✗

The install script generateInstallScript:

  • Receives registration token as parameter
  • Never uses it to call registration API
  • Generates config with empty placeholders
  • Agent starts, finds no credentials, exits

Historical Context: This was probably written when agents could self-register on first start. When registration tokens were added, the installer was never updated to actually perform the registration.


Issue 3: Middleware Version Check Happens Too Early

Root Cause: Version check in middleware prevents handler from updating version.

File: aggregator-server/internal/api/middleware/auth.go (assumed location)

Middleware Chain:

GET /api/v1/agents/:id/commands
  ↓
[MachineBindingMiddleware] ← Version check here (line ~75)
  - agent = GetAgentByMachineID()
  - if version < min → 426
  ↓
[AuthMiddleware] ← Auth check here
  ↓  
[AgentHandler.GetCommands] ← Would update version here (line 239)
  - UpdateAgentVersion(agentID, metrics.Version)

The Paradox:

  • Need to reach handler to update version
  • Can't reach handler because version is old
  • Can't update version because can't reach handler

Fix Required: Version must be updated during token renewal or registration, NOT during check-in.


Issue 4: Agent Version Field Confusion

Database Schema:

CREATE TABLE agents (
    agent_version VARCHAR(50),    -- Version at registration (static)
    current_version VARCHAR(50),  -- Current running version (dynamic)
    ...
);

Current Queries:

  • UpdateAgentVersion() updates current_version
  • But middleware might check agent_version
  • Fields have overlapping purposes

Evidence:

  • Agent registers as 0.1.17 → agent_version = 0.1.17
  • Agent upgrades to 0.1.23.6 → current_version should update to 0.1.23.6
  • But if middleware checks agent_version, it sees 0.1.17 → 426 rejection

Check This:

SELECT agent_version, current_version FROM agents WHERE id = 'agent-id';
-- If agent_version != current_version, middleware is checking wrong field

Issue 5: Token Renewal Timing Problem

Expected Flow:

Agent check-in (v0.1.23.6 binary)
  ↓
401 Unauthorized (old token)
  ↓
RenewToken(agentID, refreshToken, "0.1.23.6")
  ↓  
Server updates DB: current_version = "0.1.23.6"
  ↓
Server returns new access token
  ↓
Agent retries check-in with new token
  ↓
MachineBindingMiddleware sees current_version = "0.1.23.6"
  ↓
Accepts request!

Actual Flow:

Agent check-in (v0.1.23.6 binary)
  ↓
426 Upgrade Required (before auth!)
  ↓
Agent NEVER reaches 401 renewal path
  ↓
Deadlock

The Order Is Wrong: Middleware checks version BEFORE checking if token is expired. Should be:

  1. Check if token valid (expired?)
  2. If expired, allow renewal to update version
  3. Then check version

Git History Investigation Guide

Find Working Version History:

# Check when download handler last worked
git log -p -- aggregator-server/internal/api/handlers/downloads.go | grep -A20 "registration" | head -50

# Check install template service
git log -p -- aggregator-server/internal/services/install_template_service.go

# Check middleware version check implementation
git log -p -- aggregator-server/internal/api/middleware/auth.go | grep -A10 "version"

# Check when TokenRenewal first added AgentVersion
git log -p -- aggregator-server/internal/models/agent.go | grep -B5 -A5 "AgentVersion"

Find Old Working Installer:

# Look for commits before machine_id was added (pre-0.1.22)
git log --oneline --before="2024-11-01" | head -20

# Checkout old version to see working installer
git checkout v0.1.16
# Study: aggregator-server/internal/api/handlers/downloads.go
git checkout main

Key Commits to Investigate:

  • git log --grep="install" --grep="template" --oneline
  • git log --grep="registration" --grep="token" --oneline
  • git log --grep="machine" --grep="binding" --oneline
  • git log --grep="version" --grep="current" --oneline

Files Adjacent to Downloads.go That Probably Need Checking:

  1. aggregator-server/internal/services/install_template_service.go

    • Likely contains the actual template generation
    • May have had registration logic removed
  2. aggregator-server/internal/api/middleware/auth.go

    • Contains MachineBindingMiddleware
    • Version check logic
  3. aggregator-server/internal/api/handlers/agent_build.go

    • May have old registration endpoint implementations
  4. aggregator-server/internal/services/config_builder.go

    • May have install-time config generation logic
  5. aggregator-server/cmd/server/main.go

    • Middleware registration order

Quick Fixes That Might Work:

Fix 1: Make Install Script Actually Register

// In downloads.go generateInstallScript()
// Instead of creating template with placeholders,
// call registration API from within the bash script

script += fmt.Sprintf(`
# Actually register the agent
REG_RESPONSE=$(curl -s -X POST %s/api/v1/agents/register \
  -H "Authorization: Bearer %s" \
  -d '{"hostname": "$(hostname)", ...}')

# Extract credentials
AGENT_ID=$(echo $REG_RESPONSE | jq -r '.agent_id')
TOKEN=$(echo $REG_RESPONSE | jq -r '.token')

# Write REAL config
cat > /etc/redflag/config.json <<EOF
{
  "agent_id": "$AGENT_ID",
  "token": "$TOKEN",
  ...
}
EOF
`, serverURL, registrationToken)

Fix 2: Update Field Middleware Checks

// In middleware/auth.go
// Change from checking agent.AgentVersion to agent.CurrentVersion
if utils.IsNewerVersion(cfg.MinAgentVersion, agent.CurrentVersion) {
    // Reject
}

Fix 3: Allow Legacy Agents Through

// In middleware/auth.go
if agent.MachineID == nil || *agent.MachineID == "" {
    // Legacy agent - skip version check or log warning
    log.Printf("Legacy agent detected: %s", agent.ID)
    return // Allow through
}

What Was Definitely Broken by Recent Changes:

  1. Scanner timeout configuration API - Made breaking changes to DB schema without migration path
  2. Token renewal - Added version tracking but middleware checks version BEFORE renewal
  3. Install script - Never updated to use registration tokens, just writes templates
  4. Machine binding - Added security feature that breaks legacy agents without migration path

Working Theories:

Theory A: The Installer Never Actually Registered

The install script was copied from a version where agents self-registered on first start. When registration tokens were added, the script wasn't updated to perform registration.

Evidence:

  • Script generates config with all placeholders
  • No API call to /api/v1/agents/register in generated script
  • Service immediately exits with "not registered"

Test: Check git history of downloads.go around v0.1.15-v0.1.18

Theory B: Middleware Order Changed

Machine binding middleware was added or moved before authentication, causing version check to happen before token renewal can update the version.

Evidence:

  • Token renewal works (version gets updated in DB)
  • But agent still gets 426 after renewal
  • Version check happens before handler updates it

Test: Check git history of middleware registration order in main.go

Theory C: Version Field Confusion

AgentVersion (registration) vs CurrentVersion (runtime) are being used inconsistently.

Evidence:

  • UpdateAgentVersion() updates current_version
  • But middleware might check agent_version
  • After upgrade, agent_version still shows old version

Test: Query DB: SELECT agent_version, current_version FROM agents;


Database State to Check:

-- Check version fields
SELECT id, hostname, agent_version, current_version, machine_id 
FROM agents 
WHERE agent_id = 'your-agent-id';

-- Should see:
-- agent_version = "0.1.17" (set at registration)
-- current_version = "0.1.23.6" (should be updated by token renewal)
-- machine_id = NULL (legacy agent)

If current_version is NULL or not updated, token renewal isn't working. If middleware checks agent_version, that's the bug.


Next Steps:

  1. Verify which field middleware checks - Look at actual middleware code
  2. Check git history - Find when installer last actually registered agents
  3. Test token renewal - Add debug logging to confirm it updates DB
  4. Fix installer - Make it actually call registration API
  5. Fix middleware - Move version check to after version update opportunity

Priority: Installer bug is blocking ALL new installs. Version tracking bug blocks upgrades. Both are release-blockers.


This document was created to preserve the diagnostic state after discovering multiple, interconnected bugs in the agent registration and version tracking system.