# FUCKED.md - Agent Install/Registration Flow Analysis

**Status:** Complete breakdown of agent registration and version tracking bugs as of 2025-11-13

---

## The Complete Failure Chain

### Issue 1: Version Tracking Not Updated During Token Renewal (Server-Side)

**Root Cause:** The `MachineBindingMiddleware` checks agent version **before** token renewal can update it.

**File:** `aggregator-server/internal/api/handlers/agents.go:167`

**Flow:**

```
POST /api/v1/agents/:id/commands
  ↓
[MachineBindingMiddleware] ← Checks version here (line ~75)
  - Loads agent from DB
  - Sees current_version = "0.1.17"
  - REJECTS with 426 ← Request never reaches handler
  ↓
[AgentHandler.GetCommands] ← Would update version here (line 239)
  - But never gets called
```

**Fix Attempted:**

- Added `AgentVersion` field to `TokenRenewalRequest`
- Modified `RenewToken` to call `UpdateAgentVersion()`
- **Problem:** Token renewal happens AFTER 426 rejection

**Why It Failed:**

The agent gets 426 → Must get commands to send version → Can't get commands because 426 → Deadlock
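
A minimal reproduction of the deadlock, sketched with the standard `net/http` and `httptest` packages rather than the server's actual router (which this document does not identify). The in-memory map, the `agent` and `version` query parameters, and the exact-match version rule are hypothetical simplifications; only the ordering (check the stored version, then reach the handler that would update it) mirrors the flow above.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// currentVersion is a hypothetical in-memory stand-in for agents.current_version.
var currentVersion = map[string]string{"agent-1": "0.1.17"}

// minAgentVersion is an assumed server setting. An exact match stands in for a
// real version comparison to keep the sketch short.
const minAgentVersion = "0.1.23.6"

// versionCheck mimics MachineBindingMiddleware: it reads the STORED version
// and answers 426 before the wrapped handler can run.
func versionCheck(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if currentVersion[r.URL.Query().Get("agent")] != minAgentVersion {
			w.WriteHeader(http.StatusUpgradeRequired) // 426
			return
		}
		next.ServeHTTP(w, r)
	})
}

// getCommands mimics AgentHandler.GetCommands: it would record the version the
// agent reports, but only runs if versionCheck lets the request through.
func getCommands(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query()
	currentVersion[q.Get("agent")] = q.Get("version")
	w.WriteHeader(http.StatusOK)
}

// checkIn simulates one check-in from an upgraded 0.1.23.6 binary and returns
// the HTTP status plus the version stored afterwards.
func checkIn() (int, string) {
	h := versionCheck(http.HandlerFunc(getCommands))
	req := httptest.NewRequest(http.MethodGet,
		"/api/v1/agents/agent-1/commands?agent=agent-1&version=0.1.23.6", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, currentVersion["agent-1"]
}
```

Calling `checkIn()` any number of times returns 426 and leaves the stored version at `0.1.17`: the handler that would fix the version is never reached.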

---

### Issue 2: Install Script Does NOT Actually Register Agents

**Root Cause:** The install script creates a blank config template instead of calling the registration API.

**Files Affected:**

- `aggregator-server/internal/api/handlers/downloads.go:343`
- `aggregator-server/internal/services/install_templates.go` (likely)

**Current Broken Flow:**

```go
// In downloads.go line 343
configTemplate := map[string]interface{}{
	"agent_id":           "00000000-0000-0000-0000-000000000000", // PLACEHOLDER!
	"token":              "",                                     // EMPTY
	"refresh_token":      "",                                     // EMPTY
	"registration_token": "",                                     // EMPTY
}
```

**What Should Happen** (✓ = the installer does this today, ✗ = missing or broken):

```
1. curl installer | bash -s -- <registration_token>
2. Download agent binary                          ✓
3. Call POST /api/v1/agents/register with token   ✗ MISSING
4. Get credentials back                           ✗ MISSING
5. Write to config.json                           ✗ Writes template instead
6. Start service                                  ✓
7. Service fails: "Agent not registered"          ✗
```
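
Steps 3-5 (register, receive credentials, write `config.json`) can be sketched in Go as follows. This assumes the register endpoint accepts the registration token as a Bearer header (as Fix 1 in the Quick Fixes section does), takes a JSON body with a `hostname` field, and answers with `agent_id`, `token`, and `refresh_token`. Those field names come from the config template, but the request and response shapes are assumptions, and `registerAndWriteConfig` and `demoRegistration` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// Credentials holds the config fields the current template leaves empty.
// JSON names are assumed to match the register endpoint's response.
type Credentials struct {
	AgentID      string `json:"agent_id"`
	Token        string `json:"token"`
	RefreshToken string `json:"refresh_token"`
}

// registerAndWriteConfig POSTs the registration token to the register endpoint,
// decodes the returned credentials, and writes them to configPath.
func registerAndWriteConfig(serverURL, registrationToken, hostname, configPath string) (Credentials, error) {
	var creds Credentials
	body, err := json.Marshal(map[string]string{"hostname": hostname}) // assumed request shape
	if err != nil {
		return creds, err
	}
	req, err := http.NewRequest(http.MethodPost, serverURL+"/api/v1/agents/register", bytes.NewReader(body))
	if err != nil {
		return creds, err
	}
	req.Header.Set("Authorization", "Bearer "+registrationToken)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return creds, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
		return creds, fmt.Errorf("registration failed: %s", resp.Status)
	}
	if err := json.NewDecoder(resp.Body).Decode(&creds); err != nil {
		return creds, err
	}
	if creds.AgentID == "" || creds.Token == "" {
		return creds, fmt.Errorf("registration response missing credentials")
	}

	// Write the real credentials instead of the placeholder template.
	out, err := json.MarshalIndent(creds, "", "  ")
	if err != nil {
		return creds, err
	}
	if err := os.MkdirAll(filepath.Dir(configPath), 0o755); err != nil {
		return creds, err
	}
	return creds, os.WriteFile(configPath, out, 0o600)
}

// demoRegistration runs the flow against a fake register endpoint and a temp
// directory, returning the credentials read back from the written file.
func demoRegistration() (Credentials, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer reg-token" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		json.NewEncoder(w).Encode(Credentials{AgentID: "agent-123", Token: "access", RefreshToken: "refresh"})
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "redflag")
	if err != nil {
		return Credentials{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "config.json")
	if _, err := registerAndWriteConfig(srv.URL, "reg-token", "host-1", path); err != nil {
		return Credentials{}, err
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		return Credentials{}, err
	}
	var onDisk Credentials
	err = json.Unmarshal(raw, &onDisk)
	return onDisk, err
}
```

Rejecting an empty `agent_id` or `token` keeps a failed registration from silently producing another placeholder config.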

**The install script `generateInstallScript`:**

- Receives registration token as parameter
- **Never uses it to call registration API**
- Generates config with empty placeholders
- Agent starts, finds no credentials, exits
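
Because the agent currently just exits when it finds no credentials, a check like the following could let the installer fail immediately with a specific reason instead of starting a service that cannot run. `validateConfig` is a hypothetical helper that assumes the template's JSON field names; it treats the all-zero placeholder UUID or an empty token as "not registered".

```go
package main

import (
	"encoding/json"
	"errors"
)

// placeholderAgentID is the all-zero UUID the current template writes.
const placeholderAgentID = "00000000-0000-0000-0000-000000000000"

// agentConfig holds the credential fields from the install-time template.
type agentConfig struct {
	AgentID           string `json:"agent_id"`
	Token             string `json:"token"`
	RefreshToken      string `json:"refresh_token"`
	RegistrationToken string `json:"registration_token"`
}

// validateConfig returns an error if raw is the unregistered template rather
// than a config holding real credentials.
func validateConfig(raw []byte) error {
	var c agentConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return err
	}
	switch {
	case c.AgentID == "" || c.AgentID == placeholderAgentID:
		return errors.New("config has placeholder agent_id: installer did not register")
	case c.Token == "" || c.RefreshToken == "":
		return errors.New("config has no token/refresh_token: installer did not register")
	}
	return nil
}
```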

**Historical Context:**

This was probably written when agents could self-register on first start. When registration tokens were added, the installer was never updated to actually perform the registration.

---

### Issue 3: Middleware Version Check Happens Too Early

**Root Cause:** Version check in middleware prevents handler from updating version.

**File:** `aggregator-server/internal/api/middleware/auth.go` (assumed location)

**Middleware Chain:**

```
GET /api/v1/agents/:id/commands
  ↓
[MachineBindingMiddleware] ← Version check here (line ~75)
  - agent = GetAgentByMachineID()
  - if version < min → 426
  ↓
[AuthMiddleware] ← Auth check here
  ↓
[AgentHandler.GetCommands] ← Would update version here (line 239)
  - UpdateAgentVersion(agentID, metrics.Version)
```

**The Paradox:**

- Need to reach handler to update version
- Can't reach handler because version is old
- Can't update version because can't reach handler

**Fix Required:**

Version must be updated during token renewal or registration, NOT during check-in.
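
A sketch of the renewal-side fix (`AgentVersion` on `TokenRenewalRequest`, `RenewToken` calling `UpdateAgentVersion()`), again using `net/http` with an in-memory store standing in for the database. The JSON tags, the `/api/v1/agents/renew` path, and the `versionStore` type are hypothetical, and refresh-token validation is omitted. This only breaks the deadlock if the renewal route is not itself behind the version check, which is the ordering problem described in Issue 5.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// TokenRenewalRequest carries the running version alongside the refresh token.
// JSON tags are assumptions.
type TokenRenewalRequest struct {
	AgentID      string `json:"agent_id"`
	RefreshToken string `json:"refresh_token"`
	AgentVersion string `json:"agent_version"`
}

// versionStore is a hypothetical stand-in for the agents.current_version column.
type versionStore struct {
	mu       sync.Mutex
	versions map[string]string
}

// UpdateAgentVersion mirrors the DB helper named in the text.
func (s *versionStore) UpdateAgentVersion(agentID, version string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.versions[agentID] = version
}

func (s *versionStore) CurrentVersion(agentID string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.versions[agentID]
}

// renewToken records the reported version BEFORE issuing a new access token, so
// the next check-in is judged against the running binary's version.
func (s *versionStore) renewToken(w http.ResponseWriter, r *http.Request) {
	var req TokenRenewalRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if req.AgentVersion != "" {
		s.UpdateAgentVersion(req.AgentID, req.AgentVersion)
	}
	json.NewEncoder(w).Encode(map[string]string{"token": "new-access-token"})
}

// renewDemo sends one renewal for an agent stored at 0.1.17 and returns the
// HTTP status and the version stored afterwards.
func renewDemo() (int, string) {
	s := &versionStore{versions: map[string]string{"agent-1": "0.1.17"}}
	body := `{"agent_id":"agent-1","refresh_token":"r","agent_version":"0.1.23.6"}`
	req := httptest.NewRequest(http.MethodPost, "/api/v1/agents/renew", strings.NewReader(body))
	rec := httptest.NewRecorder()
	s.renewToken(rec, req)
	return rec.Code, s.CurrentVersion("agent-1")
}
```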

---

### Issue 4: Agent Version Field Confusion

**Database Schema:**

```sql
CREATE TABLE agents (
    agent_version   VARCHAR(50),  -- Version at registration (static)
    current_version VARCHAR(50),  -- Current running version (dynamic)
    ...
);
```

**Current Queries:**

- `UpdateAgentVersion()` updates `current_version` ✓
- But middleware might check `agent_version` ✗
- Fields have overlapping purposes

**Evidence:**

- Agent registers as 0.1.17 → `agent_version` = 0.1.17
- Agent upgrades to 0.1.23.6 → `current_version` should update to 0.1.23.6
- But if middleware checks `agent_version`, it sees 0.1.17 → 426 rejection

**Check This:**

```sql
SELECT agent_version, current_version FROM agents WHERE id = 'agent-id';
-- If agent_version != current_version, a middleware check against
-- agent_version sees the stale registration-time value
```
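
The field choice and the comparison can both be made explicit. In this sketch (hypothetical `Agent`, `versionForCheck`, and `olderThan`; `CurrentVersion` is assumed nullable), the gate prefers `current_version` and falls back to `agent_version` only when the runtime value is missing. The comparison is numeric per dot-separated component, since plain string comparison would rank `0.1.9` above `0.1.17`; whether `utils.IsNewerVersion` already behaves this way is not confirmed here.

```go
package main

import (
	"strconv"
	"strings"
)

// Agent mirrors the two columns from the schema above. CurrentVersion is a
// pointer because the column may be NULL (an assumption about the model).
type Agent struct {
	AgentVersion   string  // agent_version: set at registration
	CurrentVersion *string // current_version: updated at runtime
}

// versionForCheck picks the field a version gate should use: the runtime
// version when known, otherwise the registration-time version.
func versionForCheck(a Agent) string {
	if a.CurrentVersion != nil && *a.CurrentVersion != "" {
		return *a.CurrentVersion
	}
	return a.AgentVersion
}

// olderThan reports whether v < minVersion, comparing dot-separated numeric
// components; missing components count as 0, non-numeric ones parse as 0.
func olderThan(v, minVersion string) bool {
	a, b := strings.Split(v, "."), strings.Split(minVersion, ".")
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x, _ = strconv.Atoi(a[i])
		}
		if i < len(b) {
			y, _ = strconv.Atoi(b[i])
		}
		if x != y {
			return x < y
		}
	}
	return false
}
```

With the evidence data (`agent_version` = 0.1.17, `current_version` = 0.1.23.6) and an illustrative minimum of 0.1.23, a gate reading `agent_version` rejects while one reading `current_version` accepts.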

---

### Issue 5: Token Renewal Timing Problem

**Expected Flow:**

```
Agent check-in (v0.1.23.6 binary)
  ↓
401 Unauthorized (old token)
  ↓
RenewToken(agentID, refreshToken, "0.1.23.6")
  ↓
Server updates DB: current_version = "0.1.23.6"
  ↓
Server returns new access token
  ↓
Agent retries check-in with new token
  ↓
MachineBindingMiddleware sees current_version = "0.1.23.6"
  ↓
Accepts request!
```

**Actual Flow:**

```
Agent check-in (v0.1.23.6 binary)
  ↓
426 Upgrade Required (before auth!)
  ↓
Agent NEVER reaches 401 renewal path
  ↓
Deadlock
```

**The Order Is Wrong:**

The middleware checks the version BEFORE checking whether the token has expired. The order should be:

1. Check if the token is valid (expired?)
2. If expired, allow renewal to update the version
3. Then check the version
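
The corrected order (authenticate first, let an expired token produce 401 so the agent renews, check the version last) can be sketched with `net/http`. The token map, the raw token in `Authorization` (no `Bearer ` prefix), the exact-match version rule, and the `renew` helper are simplifying assumptions; the point is only that the renewal path sits outside the version gate.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// server holds hypothetical state: stored versions and currently valid tokens.
type server struct {
	currentVersion map[string]string // agent ID -> current_version
	validTokens    map[string]string // access token -> agent ID
}

const minVersion = "0.1.23.6" // exact match stands in for a real comparison

// auth runs FIRST: an unknown or expired token gets 401, which is what sends
// the agent down its renewal path.
func (s *server) auth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, ok := s.validTokens[r.Header.Get("Authorization")]; !ok {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// versionCheck runs SECOND, only for authenticated requests.
func (s *server) versionCheck(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		agentID := s.validTokens[r.Header.Get("Authorization")]
		if s.currentVersion[agentID] != minVersion {
			w.WriteHeader(http.StatusUpgradeRequired)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// renew is NOT wrapped by versionCheck: it records the reported version and
// issues a new token (refresh-token validation omitted).
func (s *server) renew(agentID, version string) string {
	s.currentVersion[agentID] = version
	s.validTokens["new-token"] = agentID
	return "new-token"
}

// upgradeFlow replays the expected flow for a 0.1.23.6 binary and returns the
// status of the first check-in and of the retry after renewal.
func upgradeFlow() (first, retry int) {
	s := &server{
		currentVersion: map[string]string{"agent-1": "0.1.17"},
		validTokens:    map[string]string{},
	}
	commands := s.auth(s.versionCheck(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})))
	call := func(token string) int {
		req := httptest.NewRequest(http.MethodGet, "/api/v1/agents/agent-1/commands", nil)
		req.Header.Set("Authorization", token)
		rec := httptest.NewRecorder()
		commands.ServeHTTP(rec, req)
		return rec.Code
	}
	first = call("expired-token") // 401, not 426
	token := s.renew("agent-1", "0.1.23.6")
	retry = call(token) // stored version was updated during renewal
	return first, retry
}
```

`upgradeFlow()` returns 401 for the first check-in with the expired token (instead of 426) and 200 for the retry after `renew` has recorded `0.1.23.6`.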

---

## Git History Investigation Guide

### Find Working Version History:

```bash
# Check when download handler last worked
git log -p -- aggregator-server/internal/api/handlers/downloads.go | grep -A20 "registration" | head -50

# Check install template service
git log -p -- aggregator-server/internal/services/install_template_service.go

# Check middleware version check implementation
git log -p -- aggregator-server/internal/api/middleware/auth.go | grep -A10 "version"

# Check when TokenRenewal first added AgentVersion
git log -p -- aggregator-server/internal/models/agent.go | grep -B5 -A5 "AgentVersion"
```

### Find Old Working Installer:

```bash
# Look for commits before machine_id was added (pre-0.1.22)
git log --oneline --before="2024-11-01" | head -20

# Checkout old version to see working installer
git checkout v0.1.16
# Study: aggregator-server/internal/api/handlers/downloads.go
git checkout main
```

### Key Commits to Investigate:

- `git log --grep="install" --grep="template" --oneline`
- `git log --grep="registration" --grep="token" --oneline`
- `git log --grep="machine" --grep="binding" --oneline`
- `git log --grep="version" --grep="current" --oneline`

With several `--grep` patterns, `git log` lists commits matching *any* of them; add `--all-match` to require all of them.

---

## Files Adjacent to downloads.go That Probably Need Checking:

1. `aggregator-server/internal/services/install_template_service.go`
   - Likely contains the actual template generation
   - May have had registration logic removed

2. `aggregator-server/internal/api/middleware/auth.go`
   - Contains MachineBindingMiddleware
   - Version check logic

3. `aggregator-server/internal/api/handlers/agent_build.go`
   - May have old registration endpoint implementations

4. `aggregator-server/internal/services/config_builder.go`
   - May have install-time config generation logic

5. `aggregator-server/cmd/server/main.go`
   - Middleware registration order

---

## Quick Fixes That Might Work:

### Fix 1: Make Install Script Actually Register

```go
// In downloads.go generateInstallScript():
// instead of creating a template with placeholders,
// call the registration API from within the bash script.
script += fmt.Sprintf(`
# Actually register the agent.
# Double quotes (not single) so bash expands $(hostname) inside the JSON body.
REG_RESPONSE=$(curl -sf -X POST "%s/api/v1/agents/register" \
  -H "Authorization: Bearer %s" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$(hostname)\", ...}")

# Extract credentials (requires jq on the host)
AGENT_ID=$(echo "$REG_RESPONSE" | jq -r '.agent_id')
TOKEN=$(echo "$REG_RESPONSE" | jq -r '.token')

# Refuse to write a config without real credentials
if [ -z "$AGENT_ID" ] || [ "$AGENT_ID" = "null" ]; then
  echo "Agent registration failed: $REG_RESPONSE" >&2
  exit 1
fi

# Write REAL config
cat > /etc/redflag/config.json <<EOF
{
  "agent_id": "$AGENT_ID",
  "token": "$TOKEN",
  ...
}
EOF
`, serverURL, registrationToken)
```

### Fix 2: Change Which Field the Middleware Checks

```go
// In middleware/auth.go
// Change from checking agent.AgentVersion to agent.CurrentVersion
if utils.IsNewerVersion(cfg.MinAgentVersion, agent.CurrentVersion) {
	// Reject with 426 Upgrade Required
}
```

### Fix 3: Allow Legacy Agents Through

```go
// In middleware/auth.go
if agent.MachineID == nil || *agent.MachineID == "" {
	// Legacy agent - skip version check or log warning
	log.Printf("Legacy agent detected: %s", agent.ID)
	return // Allow through
}
```
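
The bare `return` in Fix 3 only lets the request through if the router continues the middleware chain after a middleware returns without aborting; this document does not name the router, so that behavior is worth confirming. In plain `net/http`, returning without calling the next handler skips it and ends the request with an empty 200, so the bypass has to call it explicitly. A sketch under those assumptions (hypothetical `agentRecord`, `machineBinding`, and `statusFor`; exact-match version rule):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptest"
)

// agentRecord is a hypothetical view of the columns the middleware reads.
type agentRecord struct {
	ID             string
	MachineID      *string // NULL for agents registered before machine binding
	CurrentVersion string
}

// minimumVersion is illustrative; an exact match stands in for a real comparison.
const minimumVersion = "0.1.23.6"

// machineBinding sketches Fix 3 in plain net/http: legacy agents (no machine_id)
// bypass the version gate by calling next explicitly.
func machineBinding(lookup func(*http.Request) agentRecord, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		a := lookup(r)
		if a.MachineID == nil || *a.MachineID == "" {
			log.Printf("Legacy agent detected: %s", a.ID)
			next.ServeHTTP(w, r) // a bare return here would skip the handler
			return
		}
		if a.CurrentVersion != minimumVersion {
			w.WriteHeader(http.StatusUpgradeRequired)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor returns the status a commands request gets for the given agent row.
func statusFor(a agentRecord) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := machineBinding(func(*http.Request) agentRecord { return a }, ok)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/api/v1/agents/x/commands", nil))
	return rec.Code
}
```

`statusFor` returns 200 for a legacy row with no `machine_id` even at 0.1.17, 426 for a bound agent at 0.1.17, and 200 for a bound agent at 0.1.23.6.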

---

## What Was Definitely Broken by Recent Changes:

1. **Scanner timeout configuration API** - Made breaking changes to DB schema without migration path
2. **Token renewal** - Added version tracking but middleware checks version BEFORE renewal
3. **Install script** - Never updated to use registration tokens, just writes templates
4. **Machine binding** - Added security feature that breaks legacy agents without migration path

## Working Theories:

### Theory A: The Installer Never Actually Registered

The install script was copied from a version where agents self-registered on first start. When registration tokens were added, the script wasn't updated to perform registration.

**Evidence:**

- Script generates config with all placeholders
- No API call to `/api/v1/agents/register` in generated script
- Service immediately exits with "not registered"

**Test:** Check git history of `downloads.go` around v0.1.15-v0.1.18

### Theory B: Middleware Order Changed

Machine binding middleware was added or moved before authentication, causing version check to happen before token renewal can update the version.

**Evidence:**

- Token renewal works (version gets updated in DB)
- But agent still gets 426 after renewal
- Version check happens before handler updates it

**Test:** Check git history of middleware registration order in `main.go`

### Theory C: Version Field Confusion

`AgentVersion` (registration) vs `CurrentVersion` (runtime) are being used inconsistently.

**Evidence:**

- `UpdateAgentVersion()` updates `current_version`
- But middleware might check `agent_version`
- After upgrade, `agent_version` still shows old version

**Test:** Query DB: `SELECT agent_version, current_version FROM agents;`

---

## Database State to Check:

```sql
-- Check version fields
SELECT id, hostname, agent_version, current_version, machine_id
FROM agents
WHERE id = 'your-agent-id';

-- Should see:
-- agent_version   = "0.1.17"   (set at registration)
-- current_version = "0.1.23.6" (should be updated by token renewal)
-- machine_id      = NULL       (legacy agent)
```

If `current_version` is NULL or not updated, token renewal isn't working.

If middleware checks `agent_version`, that's the bug.

---

## Next Steps:

1. **Verify which field middleware checks** - Look at actual middleware code
2. **Check git history** - Find when installer last actually registered agents
3. **Test token renewal** - Add debug logging to confirm it updates DB
4. **Fix installer** - Make it actually call registration API
5. **Fix middleware** - Move version check to after version update opportunity

**Priority:** Installer bug is blocking ALL new installs. Version tracking bug blocks upgrades. Both are release-blockers.

---

*This document was created to preserve the diagnostic state after discovering multiple, interconnected bugs in the agent registration and version tracking system.*