Files
Redflag/docs/4_LOG/December_2025/2025-11-13_Agent-Install-Registration-Flow-Investigation.md

363 lines
11 KiB
Markdown

# FUCKED.md - Agent Install/Registration Flow Analysis
**Status:** Complete breakdown of agent registration and version tracking bugs as of 2025-11-13
---
## The Complete Failure Chain
### Issue 1: Version Tracking Not Updated During Token Renewal (Server-Side)
**Root Cause:** The `MachineBindingMiddleware` checks agent version **before** token renewal can update it.
**File:** `aggregator-server/internal/api/handlers/agents.go:167`
**Flow:**
```
POST /api/v1/agents/:id/commands
[MachineBindingMiddleware] ← Checks version here (line ~75)
- Loads agent from DB
- Sees current_version = "0.1.17"
- REJECTS with 426 ← Request never reaches handler
[AgentHandler.GetCommands] ← Would update version here (line 239)
- But never gets called
```
**Fix Attempted:**
- Added `AgentVersion` field to `TokenRenewalRequest`
- Modified `RenewToken` to call `UpdateAgentVersion()`
- **Problem:** Token renewal happens AFTER 426 rejection
**Why It Failed:**
The agent gets 426 → Must get commands to send version → Can't get commands because 426 → Deadlock
---
### Issue 2: Install Script Does NOT Actually Register Agents
**Root Cause:** The install script creates a blank config template instead of calling the registration API.
**Files Affected:**
- `aggregator-server/internal/api/handlers/downloads.go:343`
- `aggregator-server/internal/services/install_templates.go` (likely)
**Current Broken Flow:**
```javascript
// In downloads.go line 343
configTemplate := map[string]interface{}{
"agent_id": "00000000-0000-0000-0000-000000000000", // PLACEHOLDER!
"token": "", // EMPTY
"refresh_token": "", // EMPTY
"registration_token": "", // EMPTY
}
```
**What Should Happen:**
```
1. curl installer | bash -s -- <registration_token>
2. Download agent binary ✓
3. Call POST /api/v1/agents/register with token ✗ MISSING
4. Get credentials back ✗ MISSING
5. Write to config.json ✗ Writes template instead
6. Start service ✓
7. Service fails: "Agent not registered" ✗
```
**The install script `generateInstallScript`:**
- Receives registration token as parameter
- **Never uses it to call registration API**
- Generates config with empty placeholders
- Agent starts, finds no credentials, exits
**Historical Context:**
This was probably written when agents could self-register on first start. When registration tokens were added, the installer was never updated to actually perform the registration.
---
### Issue 3: Middleware Version Check Happens Too Early
**Root Cause:** Version check in middleware prevents handler from updating version.
**File:** `aggregator-server/internal/api/middleware/auth.go` (assumed location)
**Middleware Chain:**
```
GET /api/v1/agents/:id/commands
[MachineBindingMiddleware] ← Version check here (line ~75)
- agent = GetAgentByMachineID()
- if version < min → 426
[AuthMiddleware] ← Auth check here
[AgentHandler.GetCommands] ← Would update version here (line 239)
- UpdateAgentVersion(agentID, metrics.Version)
```
**The Paradox:**
- Need to reach handler to update version
- Can't reach handler because version is old
- Can't update version because can't reach handler
**Fix Required:**
Version must be updated during token renewal or registration, NOT during check-in.
---
### Issue 4: Agent Version Field Confusion
**Database Schema:**
```sql
CREATE TABLE agents (
agent_version VARCHAR(50), -- Version at registration (static)
current_version VARCHAR(50), -- Current running version (dynamic)
...
);
```
**Current Queries:**
- `UpdateAgentVersion()` updates `current_version`
- But middleware might check `agent_version`
- Fields have overlapping purposes
**Evidence:**
- Agent registers as 0.1.17 → `agent_version` = 0.1.17
- Agent upgrades to 0.1.23.6 → `current_version` should update to 0.1.23.6
- But if middleware checks `agent_version`, it sees 0.1.17 → 426 rejection
**Check This:**
```sql
SELECT agent_version, current_version FROM agents WHERE id = 'agent-id';
-- If agent_version != current_version, middleware is checking wrong field
```
---
### Issue 5: Token Renewal Timing Problem
**Expected Flow:**
```
Agent check-in (v0.1.23.6 binary)
401 Unauthorized (old token)
RenewToken(agentID, refreshToken, "0.1.23.6")
Server updates DB: current_version = "0.1.23.6"
Server returns new access token
Agent retries check-in with new token
MachineBindingMiddleware sees current_version = "0.1.23.6"
Accepts request!
```
**Actual Flow:**
```
Agent check-in (v0.1.23.6 binary)
426 Upgrade Required (before auth!)
Agent NEVER reaches 401 renewal path
Deadlock
```
**The Order Is Wrong:**
Middleware checks version BEFORE checking if token is expired. Should be:
1. Check if token valid (expired?)
2. If expired, allow renewal to update version
3. Then check version
---
## Git History Investigation Guide
### Find Working Version History:
```bash
# Check when download handler last worked
git log -p -- aggregator-server/internal/api/handlers/downloads.go | grep -A20 "registration" | head -50
# Check install template service
git log -p -- aggregator-server/internal/services/install_template_service.go
# Check middleware version check implementation
git log -p -- aggregator-server/internal/api/middleware/auth.go | grep -A10 "version"
# Check when TokenRenewal first added AgentVersion
git log -p -- aggregator-server/internal/models/agent.go | grep -B5 -A5 "AgentVersion"
```
### Find Old Working Installer:
```bash
# Look for commits before machine_id was added (pre-0.1.22)
git log --oneline --before="2024-11-01" | head -20
# Checkout old version to see working installer
git checkout v0.1.16
# Study: aggregator-server/internal/api/handlers/downloads.go
git checkout main
```
### Key Commits to Investigate:
- `git log --grep="install" --grep="template" --oneline`
- `git log --grep="registration" --grep="token" --oneline`
- `git log --grep="machine" --grep="binding" --oneline`
- `git log --grep="version" --grep="current" --oneline`
---
## Files Adjacent to Downloads.go That Probably Need Checking:
1. `aggregator-server/internal/services/install_template_service.go`
- Likely contains the actual template generation
- May have had registration logic removed
2. `aggregator-server/internal/api/middleware/auth.go`
- Contains MachineBindingMiddleware
- Version check logic
3. `aggregator-server/internal/api/handlers/agent_build.go`
- May have old registration endpoint implementations
4. `aggregator-server/internal/services/config_builder.go`
- May have install-time config generation logic
5. `aggregator-server/cmd/server/main.go`
- Middleware registration order
---
## Quick Fixes That Might Work:
### Fix 1: Make Install Script Actually Register
```go
// In downloads.go generateInstallScript()
// Instead of creating template with placeholders,
// call registration API from within the bash script
script += fmt.Sprintf(`
# Actually register the agent
REG_RESPONSE=$(curl -s -X POST %s/api/v1/agents/register \
-H "Authorization: Bearer %s" \
-d '{"hostname": "$(hostname)", ...}')
# Extract credentials
AGENT_ID=$(echo $REG_RESPONSE | jq -r '.agent_id')
TOKEN=$(echo $REG_RESPONSE | jq -r '.token')
# Write REAL config
cat > /etc/redflag/config.json <<EOF
{
"agent_id": "$AGENT_ID",
"token": "$TOKEN",
...
}
EOF
`, serverURL, registrationToken)
```
### Fix 2: Update Field Middleware Checks
```go
// In middleware/auth.go
// Change from checking agent.AgentVersion to agent.CurrentVersion
if utils.IsNewerVersion(cfg.MinAgentVersion, agent.CurrentVersion) {
// Reject
}
```
### Fix 3: Allow Legacy Agents Through
```go
// In middleware/auth.go
if agent.MachineID == nil || *agent.MachineID == "" {
// Legacy agent - skip version check or log warning
log.Printf("Legacy agent detected: %s", agent.ID)
return // Allow through
}
```
---
## What Was Definitely Broken by Recent Changes:
1. **Scanner timeout configuration API** - Made breaking changes to DB schema without migration path
2. **Token renewal** - Added version tracking but middleware checks version BEFORE renewal
3. **Install script** - Never updated to use registration tokens, just writes templates
4. **Machine binding** - Added security feature that breaks legacy agents without migration path
## Working Theories:
### Theory A: The Installer Never Actually Registered
The install script was copied from a version where agents self-registered on first start. When registration tokens were added, the script wasn't updated to perform registration.
**Evidence:**
- Script generates config with all placeholders
- No API call to `/api/v1/agents/register` in generated script
- Service immediately exits with "not registered"
**Test:** Check git history of `downloads.go` around v0.1.15-v0.1.18
### Theory B: Middleware Order Changed
Machine binding middleware was added or moved before authentication, causing version check to happen before token renewal can update the version.
**Evidence:**
- Token renewal works (version gets updated in DB)
- But agent still gets 426 after renewal
- Version check happens before handler updates it
**Test:** Check git history of middleware registration order in `main.go`
### Theory C: Version Field Confusion
`AgentVersion` (registration) vs `CurrentVersion` (runtime) are being used inconsistently.
**Evidence:**
- `UpdateAgentVersion()` updates `current_version`
- But middleware might check `agent_version`
- After upgrade, `agent_version` still shows old version
**Test:** Query DB: `SELECT agent_version, current_version FROM agents;`
---
## Database State to Check:
```sql
-- Check version fields
SELECT id, hostname, agent_version, current_version, machine_id
FROM agents
WHERE agent_id = 'your-agent-id';
-- Should see:
-- agent_version = "0.1.17" (set at registration)
-- current_version = "0.1.23.6" (should be updated by token renewal)
-- machine_id = NULL (legacy agent)
```
If `current_version` is NULL or not updated, token renewal isn't working.
If middleware checks `agent_version`, that's the bug.
---
## Next Steps:
1. **Verify which field middleware checks** - Look at actual middleware code
2. **Check git history** - Find when installer last actually registered agents
3. **Test token renewal** - Add debug logging to confirm it updates DB
4. **Fix installer** - Make it actually call registration API
5. **Fix middleware** - Move version check to after version update opportunity
**Priority:** Installer bug is blocking ALL new installs. Version tracking bug blocks upgrades. Both are release-blockers.
---
*This document was created to preserve the diagnostic state after discovering multiple, interconnected bugs in the agent registration and version tracking system.*