- Fix /api/v1/info returning hardcoded v0.1.21 (U-1)
- Fix semver comparison (lexicographic -> octet-based) (U-2)
- Fix bulk upgrade platform hardcoded to linux-amd64 (U-3)
- Fix bulk upgrade missing nonce generation (U-4)
- Add error check for sc stop in Windows restart (U-7)
- Add timeout + size limit to binary download (U-8)
- Fix ExtractConfigVersionFromAgent last-char bug (U-10)

End-to-end upgrade pipeline now fully wired. 170 tests pass (110 server + 60 agent). No regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent Upgrade System Audit
Date: 2026-03-29
Branch: culurien
Status: Audit only, no changes
1. WHAT ALREADY EXISTS
1a. POST /build/upgrade/:agentID Handler
Route: cmd/server/main.go:422
Handler: handlers/build_orchestrator.go:95-191
Status: Partially functional — config generator, not an upgrade orchestrator.
The handler generates a fresh config JSON and returns a download URL for a pre-built binary. It does NOT:
- Verify the agent exists in the DB
- Create any DB record for the upgrade event
- Queue a CommandTypeUpdateAgent command
- Push or deliver anything to the agent
- Implement PreserveExisting (lines 142-146 are a TODO stub)
The response contains manual next_steps instructions telling a human to stop the service, download, and restart.
1b. services/build_orchestrator.go — BuildAndSignAgent
File: services/build_orchestrator.go:32-96
BuildAndSignAgent(version, platform, architecture):
- Locates pre-built binary at {agentDir}/binaries/{platform}/redflag-agent[.exe]
- Signs with Ed25519 via signingService.SignFile()
- Stores in DB via packageQueries.StoreSignedPackage()
- Returns AgentUpdatePackage
Critical disconnect: This service is NOT called by the HTTP upgrade handler. The handler uses AgentBuilder.BuildAgentWithConfig (config-only). BuildAndSignAgent is orphaned from the HTTP flow.
1c. agent_update_packages Table (Migration 016)
File: migrations/016_agent_update_packages.up.sql
| Column | Type | Notes |
|---|---|---|
| id | UUID PK | gen_random_uuid() |
| version | VARCHAR(50) | NOT NULL |
| platform | VARCHAR(50) | e.g. linux-amd64 |
| architecture | VARCHAR(20) | NOT NULL |
| binary_path | VARCHAR(500) | NOT NULL |
| signature | VARCHAR(128) | Ed25519 hex |
| checksum | VARCHAR(64) | SHA-256 |
| file_size | BIGINT | NOT NULL |
| created_at | TIMESTAMP | default now |
| created_by | VARCHAR(100) | default 'system' |
| is_active | BOOLEAN | default true |
Migration 016 also adds to agents table:
- is_updating BOOLEAN DEFAULT false
- updating_to_version VARCHAR(50)
- update_initiated_at TIMESTAMP
1d. NewAgentBuild vs UpgradeAgentBuild
| Aspect | NewAgentBuild | UpgradeAgentBuild |
|---|---|---|
| Registration token | Required | Not needed |
| consumes_seat | true | false |
| Agent ID source | Generated or from request | From URL param |
| PreserveExisting | N/A | TODO stub |
| DB interaction | None | None |
| Command queued | No | No |
Both are config generators that return download URLs. Neither triggers actual delivery.
1e. Agent-Side Upgrade Code
A full self-update pipeline EXISTS in the agent.
Handler: cmd/agent/subsystem_handlers.go:575-762 (handleUpdateAgent)
7-step pipeline:
| Step | Line | What |
|---|---|---|
| 1 | 661 | downloadUpdatePackage() — HTTP GET to temp file |
| 2 | 669 | SHA-256 checksum verification against params["checksum"] |
| 3 | 681 | Ed25519 binary signature verification via cached server public key |
| 4 | 687 | Backup current binary to <binary>.bak |
| 5 | 719 | Atomic install: write .new, chmod, os.Rename |
| 6 | 724 | restartAgentService() — systemctl restart (Linux) or sc stop/start (Windows) |
| 7 | 731 | Watchdog: polls GetAgent() every 15s for 5 min, checks version |
Rollback: Deferred block (lines 700-715) restores from .bak if updateSuccess == false.
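Steps 2 and 3 of the table above can be sketched as a single check. This is a minimal illustration, not the code in handleUpdateAgent: the helper names verifyPackage and selfCheck are hypothetical, and it assumes the signature covers the raw binary bytes and that both checksum and signature arrive hex-encoded.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
)

// verifyPackage (hypothetical helper) checks the SHA-256 checksum first and
// then the Ed25519 signature. Assumption: the signature is over the raw bytes.
func verifyPackage(binary []byte, checksumHex, signatureHex string, pub ed25519.PublicKey) error {
	want, err := hex.DecodeString(checksumHex)
	if err != nil {
		return fmt.Errorf("decode checksum: %w", err)
	}
	got := sha256.Sum256(binary)
	if subtle.ConstantTimeCompare(got[:], want) != 1 {
		return errors.New("checksum mismatch")
	}
	sig, err := hex.DecodeString(signatureHex)
	if err != nil {
		return fmt.Errorf("decode signature: %w", err)
	}
	// ed25519.Verify panics on a wrong-sized key, so guard against it.
	if len(pub) != ed25519.PublicKeySize {
		return errors.New("bad public key length")
	}
	if !ed25519.Verify(pub, binary, sig) {
		return errors.New("signature verification failed")
	}
	return nil
}

// selfCheck signs a sample payload with a throwaway key and returns the
// verification result for the intact payload and for a tampered copy.
func selfCheck() (intactErr, tamperedErr error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return err, err
	}
	binary := []byte("pretend agent binary")
	sum := sha256.Sum256(binary)
	checksum := hex.EncodeToString(sum[:])
	sig := hex.EncodeToString(ed25519.Sign(priv, binary))

	tampered := append([]byte{}, binary...)
	tampered = append(tampered, 'x')
	return verifyPackage(binary, checksum, sig, pub), verifyPackage(tampered, checksum, sig, pub)
}

func main() {
	intact, tampered := selfCheck()
	fmt.Println("intact binary:", intact)
	fmt.Println("tampered binary:", tampered)
}
```

Checking the checksum before the signature keeps the cheaper test first; either failure should leave the current binary untouched.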
1f. Command Type for Self-Upgrade
YES — CommandTypeUpdateAgent = "update_agent" exists.
Defined in models/command.go:103. Dispatched in cmd/agent/main.go:1064:
case "update_agent":
handleUpdateAgent(apiClient, cmd, cfg)
Full command type list:
collect_specs, install_updates, dry_run_update, confirm_dependencies, rollback_update, update_agent, enable_heartbeat, disable_heartbeat, reboot
2. AGENT SELF-REPLACEMENT MECHANISM
2a. Existing Binary Replacement Code — EXISTS
All steps exist in subsystem_handlers.go:
- Download to temp: downloadUpdatePackage() (line 661/774)
- Ed25519 verification: verifyBinarySignature() (line 681)
- Checksum verification: SHA-256 (line 669)
- Atomic replace: write .new + os.Rename (line 878)
- Service restart: restartAgentService() (line 724/888)
2b. Linux Restart — EXISTS
restartAgentService() at line 888:
- Try systemctl restart redflag-agent (line 892)
- Fallback: service redflag-agent restart (line 898)
The agent knows its service name as hardcoded "redflag-agent".
2c. Windows Restart — EXISTS (with gap)
Lines 901-903: sc stop RedFlagAgent then sc start RedFlagAgent as separate commands.
Gap: No error check on sc stop — result is discarded. The running .exe is replaced via os.Rename which works on Windows if the service has stopped.
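One way to close this gap is to stop treating the stop step as fire-and-forget. The sketch below is an assumption about shape, not the existing restartAgentService(): the runner type, execRunner, and fakeRunner are hypothetical names, and the command runner is injected so the sequence can be exercised on any OS.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runner abstracts command execution so the stop/start sequence can be
// tested without a Windows service manager. Hypothetical type.
type runner func(name string, args ...string) error

// execRunner runs the real command and includes its output in any error.
func execRunner(name string, args ...string) error {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s %v: %w (output: %s)", name, args, err, out)
	}
	return nil
}

// restartWindowsService aborts if "sc stop" fails instead of discarding
// the result and proceeding to "sc start".
func restartWindowsService(run runner, service string) error {
	if err := run("sc", "stop", service); err != nil {
		return fmt.Errorf("stop %s: %w", service, err)
	}
	if err := run("sc", "start", service); err != nil {
		return fmt.Errorf("start %s: %w", service, err)
	}
	return nil
}

// fakeRunner records the sc verb of each call and fails on the verb named
// by failOn (empty string means never fail).
func fakeRunner(failOn string, calls *[]string) runner {
	return func(name string, args ...string) error {
		verb := args[0]
		*calls = append(*calls, verb)
		if verb == failOn {
			return errors.New("simulated sc failure")
		}
		return nil
	}
}

func main() {
	var calls []string
	err := restartWindowsService(fakeRunner("stop", &calls), "RedFlagAgent")
	fmt.Println(calls, err)
}
```

In production the call would pass execRunner. A real implementation would probably also need to wait for the service to reach the stopped state and decide how to treat a service that is already stopped; neither is covered here.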
2d. Acknowledgment — EXISTS
acknowledgment.Tracker package is used:
- reportLogWithAck(commandID) called at upgrade start (line 651) and completion (line 751)
- The tracker persists pending acks and retries with IncrementRetry()
3. SERVER-SIDE UPGRADE ORCHESTRATION
3a. Command Types — EXISTS
Full list in models/command.go:97-107. Includes "update_agent".
3b. update_agent Command Params
The agent handler at subsystem_handlers.go:575 expects these params:
- download_url: URL to download the new binary
- checksum: SHA-256 hex string
- signature: Ed25519 hex signature of the binary
- version: Expected version string after upgrade
- nonce: Replay protection nonce (uuid:timestamp format)
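The nonce parameter is described only as "uuid:timestamp", so the following is a sketch under stated assumptions: the timestamp is taken to be Unix seconds, the UUID is a random version 4 value, and the freshness window is chosen by the caller. The function names newNonce, checkNonce, and nonceAgeCheck are hypothetical.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// newNonce builds "<uuid-v4>:<unix-seconds>". The timestamp encoding is an
// assumption; the audit only states the uuid:timestamp shape.
func newNonce(now time.Time) (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // UUID version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	id := fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
	return id + ":" + strconv.FormatInt(now.Unix(), 10), nil
}

// checkNonce rejects malformed nonces and nonces outside the freshness window.
func checkNonce(nonce string, now time.Time, maxAge time.Duration) error {
	i := strings.LastIndex(nonce, ":")
	if i <= 0 {
		return errors.New("malformed nonce")
	}
	ts, err := strconv.ParseInt(nonce[i+1:], 10, 64)
	if err != nil {
		return fmt.Errorf("bad nonce timestamp: %w", err)
	}
	age := now.Sub(time.Unix(ts, 0))
	if age < 0 || age > maxAge {
		return errors.New("nonce outside freshness window")
	}
	return nil
}

// nonceAgeCheck issues a nonce at a fixed instant and validates it
// ageSeconds later against a window of maxAgeSeconds.
func nonceAgeCheck(ageSeconds, maxAgeSeconds int64) error {
	issued := time.Unix(1700000000, 0)
	n, err := newNonce(issued)
	if err != nil {
		return err
	}
	now := issued.Add(time.Duration(ageSeconds) * time.Second)
	return checkNonce(n, now, time.Duration(maxAgeSeconds)*time.Second)
}

func main() {
	n, err := newNonce(time.Now())
	fmt.Println(n, err)
	fmt.Println(checkNonce(n, time.Now(), 5*time.Minute))
}
```

A timestamp check alone only narrows the replay window. Rejecting a repeated UUID inside that window would additionally require the agent to remember recently seen nonces.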
3c. Agent Command Handling — EXISTS
Dispatched in main.go:1064 to handleUpdateAgent(). Full pipeline as described in section 1e.
3d. Agent Version Tracking — EXISTS
- agents table has current_version column
- Agent reports version on every check-in via AgentVersion: version.Version in the heartbeat/check-in payload
- is_updating, updating_to_version, update_initiated_at columns exist for tracking in-progress upgrades
3e. Expected Agent Version — PARTIAL
- config.LatestAgentVersion field exists in Config struct
- version.MinAgentVersion is build-time injected
- BUT: The /api/v1/info endpoint returns hardcoded "v0.1.21" instead of using version.GetCurrentVersions(), so agents and the dashboard cannot reliably detect the current expected version.
- version.ValidateAgentVersion() uses lexicographic string comparison (bug: "0.1.9" > "0.1.22" is true in lex order).
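The lexicographic bug disappears once each dot-separated octet is compared as an integer. The sketch below illustrates that approach; compareVersions and parseOctets are hypothetical names rather than the functions in the version package, and treating a missing trailing octet as 0 is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOctets splits a dotted version such as "0.1.26.0" into integers.
// A leading "v" is tolerated; non-numeric versions such as "dev" are rejected.
func parseOctets(v string) ([]int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	out := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("invalid octet %q in version %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// compareVersions returns -1, 0, or 1 when a is older than, equal to, or
// newer than b. Assumption: a missing trailing octet counts as 0.
func compareVersions(a, b string) (int, error) {
	pa, err := parseOctets(a)
	if err != nil {
		return 0, err
	}
	pb, err := parseOctets(b)
	if err != nil {
		return 0, err
	}
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var x, y int
		if i < len(pa) {
			x = pa[i]
		}
		if i < len(pb) {
			y = pb[i]
		}
		if x < y {
			return -1, nil
		}
		if x > y {
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	// String comparison orders by character, so '9' beats '2'.
	fmt.Println("0.1.9" > "0.1.22")
	c, err := compareVersions("0.1.9", "0.1.22")
	fmt.Println(c, err)
}
```

Returning an error for "dev" forces the caller to decide explicitly how development builds should be treated, instead of letting them sort arbitrarily.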
4. VERSION COMPARISON
4a. Agent Reports Version — YES
Via version.Version (build-time injected, default "dev"). Sent on:
- Registration (line 384/443)
- Token renewal (line 506)
- System info collection (line 373)
4b. Version String Format
Production: 0.1.26.0 (four-octet semver-like). The 4th octet = config version.
Dev: "dev".
4c. Server Expected Version — PARTIAL
config.LatestAgentVersion and version.MinAgentVersion exist but are not reliably surfaced:
- /api/v1/info hardcodes "v0.1.21"
- No endpoint returns latest_agent_version dynamically
4d. /api/v1/info Response — BROKEN
system.go:111-124 — Returns hardcoded JSON:
{
"version": "v0.1.21",
"name": "RedFlag Aggregator",
"features": [...]
}
Does NOT use version.GetCurrentVersions(). Does NOT include latest_agent_version or min_agent_version.
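A response that carries the missing fields could look like the sketch below. It uses plain net/http rather than whatever router the server actually uses, passes versions in through a hypothetical versionInfo struct because the signature of version.GetCurrentVersions() is not shown in this audit, and omits the features field. The version strings in main are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// versionInfo stands in for values that would come from build-time
// injection or config. Hypothetical type.
type versionInfo struct {
	Server      string
	LatestAgent string
	MinAgent    string
}

// infoResponse adds the two agent version fields the audit found missing.
type infoResponse struct {
	Version            string `json:"version"`
	Name               string `json:"name"`
	LatestAgentVersion string `json:"latest_agent_version"`
	MinAgentVersion    string `json:"min_agent_version"`
}

// infoHandler serves the versions it was constructed with instead of a
// string literal baked into the handler body.
func infoHandler(v versionInfo) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(infoResponse{
			Version:            v.Server,
			Name:               "RedFlag Aggregator",
			LatestAgentVersion: v.LatestAgent,
			MinAgentVersion:    v.MinAgent,
		})
	}
}

// fetchInfo calls the handler in-process and decodes what it returned.
func fetchInfo(v versionInfo) (infoResponse, error) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/info", nil)
	infoHandler(v)(rec, req)
	var out infoResponse
	err := json.NewDecoder(rec.Body).Decode(&out)
	return out, err
}

func main() {
	out, err := fetchInfo(versionInfo{Server: "1.2.3", LatestAgent: "1.2.3.0", MinAgent: "1.0.0.0"})
	fmt.Println(out, err)
}
```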
5. ROLLBACK MECHANISM
5a. Rollback — EXISTS
Deferred rollback in subsystem_handlers.go:700-715:
- Before install: backup to <binary>.bak
- On any failure (including watchdog timeout): restoreFromBackup() restores the .bak file
- On success: .bak file is removed
5b. Backup Logic — EXISTS
createBackup() copies current binary to <path>.bak before replacement.
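The backup, atomic install, and deferred rollback described in 5a and 5b fit together roughly as follows. This is a sketch and not the agent's code: installWithRollback, copyFile, and demoInstall are hypothetical names, and the verify callback stands in for the restart plus watchdog steps.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copyFile copies src to dst, creating or truncating dst with the given mode.
func copyFile(src, dst string, mode os.FileMode) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, mode)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// installWithRollback backs up the current binary, stages the new one as
// ".new", renames it into place, then runs verify. Any error after the
// backup exists triggers the deferred restore.
func installWithRollback(binaryPath string, newBinary []byte, verify func() error) (err error) {
	backup := binaryPath + ".bak"
	if err = copyFile(binaryPath, backup, 0o755); err != nil {
		return fmt.Errorf("backup: %w", err)
	}
	defer func() {
		if err == nil {
			os.Remove(backup) // success: the backup is no longer needed
			return
		}
		if rerr := os.Rename(backup, binaryPath); rerr != nil {
			err = fmt.Errorf("%w; rollback also failed: %v", err, rerr)
		}
	}()

	staged := binaryPath + ".new"
	if err = os.WriteFile(staged, newBinary, 0o755); err != nil {
		return err
	}
	if err = os.Rename(staged, binaryPath); err != nil {
		os.Remove(staged)
		return err
	}
	return verify()
}

// demoInstall runs an install in a temp directory and returns the final
// content of the binary together with the install error.
func demoInstall(failVerify bool) (string, error) {
	dir, err := os.MkdirTemp("", "upgrade-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	bin := filepath.Join(dir, "redflag-agent")
	if err := os.WriteFile(bin, []byte("old"), 0o755); err != nil {
		return "", err
	}
	verify := func() error {
		if failVerify {
			return errors.New("simulated watchdog timeout")
		}
		return nil
	}
	installErr := installWithRollback(bin, []byte("new"), verify)
	content, err := os.ReadFile(bin)
	if err != nil {
		return "", err
	}
	return string(content), installErr
}

func main() {
	fmt.Println(demoInstall(false))
	fmt.Println(demoInstall(true))
}
```

The named return value is what lets the deferred function see the final outcome, including a failure reported by verify after the rename has already succeeded.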
5c. Health Check — EXISTS
Watchdog (line 919-940) polls GetAgent() every 15s for 5 min. Success = agent.CurrentVersion == expectedVersion. Failure = timeout → rollback.
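The polling loop can be sketched with a ticker and a context deadline. The names waitForVersion and simulateWatchdog are hypothetical, the lookup function stands in for GetAgent(), and the demo uses millisecond intervals in place of the 15s and 5 min values so that it finishes quickly. The version "0.1.25.0" is an illustrative placeholder.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForVersion polls getVersion on every tick until it reports the
// expected version or ctx expires. Lookup errors count as "not yet".
func waitForVersion(ctx context.Context, interval time.Duration, expected string,
	getVersion func(context.Context) (string, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("watchdog: version %q not confirmed: %w", expected, ctx.Err())
		case <-ticker.C:
			v, err := getVersion(ctx)
			if err == nil && v == expected {
				return nil
			}
		}
	}
}

// simulateWatchdog drives waitForVersion with a fake lookup that starts
// reporting the new version on call number succeedOnCall.
func simulateWatchdog(succeedOnCall, timeoutMs int) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	calls := 0
	lookup := func(context.Context) (string, error) {
		calls++
		if calls >= succeedOnCall {
			return "0.1.26.0", nil
		}
		return "0.1.25.0", nil
	}
	return waitForVersion(ctx, 5*time.Millisecond, "0.1.26.0", lookup)
}

func main() {
	fmt.Println("upgrade confirmed:", simulateWatchdog(3, 500))
	err := simulateWatchdog(1000, 50)
	fmt.Println("timed out:", errors.Is(err, context.DeadlineExceeded))
}
```

A non-nil result from the watchdog is what the deferred rollback keys on: a timeout and an explicit failure are handled the same way.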
6. DASHBOARD UPGRADE UI
6a. Upgrade Button — EXISTS
Multiple entry points in Agents.tsx:
- Version column "Update" badge (line 1281-1294) when agent.update_available === true
- Per-row action button (line 1338-1348)
- Bulk action bar for selected agents (line 1112-1131)
These open AgentUpdatesModal.tsx which:
- Fetches available upgrade packages
- Single agent: generates nonce → calls POST /agents/{id}/update
- Multiple agents: calls POST /agents/bulk-update
6b. Target Version UI — PARTIAL
AgentUpdatesModal.tsx shows a package selection grid with version/platform filters. No global "set target version" control.
6c. Bulk Upgrade — EXISTS (with bugs)
Two bulk paths:
- AgentUpdatesModal bulk path: no nonces generated (security gap)
- BulkAgentUpdate in RelayList.tsx: platform hardcoded to linux-amd64 for all agents (line 91). Mixed-OS fleets get wrong binaries.
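The platform bug lives in the TypeScript frontend, but the underlying idea is language-independent: derive the platform key per agent and group the bulk request by it. The sketch below shows that grouping in Go. The agent struct, platformKey, and groupByPlatform are hypothetical, and it assumes os_type and os_architecture values already match the package naming (for example linux and amd64); real values may need normalizing first.

```go
package main

import (
	"fmt"
	"strings"
)

// agent carries only the fields needed to pick a package. Hypothetical type.
type agent struct {
	ID             string
	OSType         string
	OSArchitecture string
}

// platformKey builds "<os>-<arch>", the shape used by linux-amd64.
// Assumption: the stored values need no mapping beyond lowercasing.
func platformKey(a agent) (string, error) {
	osType := strings.ToLower(strings.TrimSpace(a.OSType))
	arch := strings.ToLower(strings.TrimSpace(a.OSArchitecture))
	if osType == "" || arch == "" {
		return "", fmt.Errorf("agent %s: missing os_type or os_architecture", a.ID)
	}
	return osType + "-" + arch, nil
}

// groupByPlatform returns agent IDs keyed by platform, plus the agents that
// could not be classified and should be skipped rather than guessed.
func groupByPlatform(agents []agent) (map[string][]string, []agent) {
	groups := make(map[string][]string)
	var skipped []agent
	for _, a := range agents {
		key, err := platformKey(a)
		if err != nil {
			skipped = append(skipped, a)
			continue
		}
		groups[key] = append(groups[key], a.ID)
	}
	return groups, skipped
}

func main() {
	groups, skipped := groupByPlatform([]agent{
		{ID: "a1", OSType: "linux", OSArchitecture: "amd64"},
		{ID: "a2", OSType: "Windows", OSArchitecture: "amd64"},
		{ID: "a3"},
	})
	fmt.Println(groups, len(skipped))
}
```

Skipping unclassifiable agents is a deliberate choice here: delivering no binary is safer than delivering one built for the wrong OS.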
7. COMPLETENESS MATRIX
| Component | Status | Notes |
|---|---|---|
| update_agent command type | EXISTS | models/command.go:103 |
| Agent handles upgrade command | EXISTS | subsystem_handlers.go:575-762, full 7-step pipeline |
| Safe binary replacement (Linux) | EXISTS | Atomic rename + systemctl restart |
| Safe binary replacement (Windows) | EXISTS | Atomic rename + sc stop/start (no error check on stop) |
| Ed25519 signature verification | EXISTS | verifyBinarySignature() against cached server key |
| Checksum verification | EXISTS | SHA-256 in agent handler; server serves X-Content-SHA256 header |
| Rollback on failure | EXISTS | Deferred .bak restore on any failure including watchdog timeout |
| Server triggers upgrade command | PARTIAL | POST /agents/{id}/update endpoint exists (called by UI), but the /build/upgrade endpoint is disconnected |
| Server tracks expected version | PARTIAL | DB columns exist; /api/v1/info version is hardcoded to v0.1.21 |
| Dashboard upgrade UI | EXISTS | Single + bulk upgrade via AgentUpdatesModal |
| Bulk upgrade UI | EXISTS (buggy) | Platform hardcoded to linux-amd64; no nonces in modal bulk path |
| Acknowledgment/delivery tracking | EXISTS | acknowledgment.Tracker with retry |
| Version comparison | PARTIAL | Lexicographic comparison is buggy for multi-digit versions |
8. EFFORT ESTIMATE
8a. Exists and Just Needs Wiring
- /api/v1/info version fix: Replace hardcoded "v0.1.21" with version.GetCurrentVersions(). Add latest_agent_version and min_agent_version to the response. (~10 lines)
- BuildAndSignAgent connection: The signing/packaging service exists but isn't called by the upgrade HTTP handler. Wire it to create a signed package when an admin triggers an upgrade. (~20 lines)
- Bulk upgrade platform detection: RelayList.tsx line 91 hardcodes linux-amd64. Fix to use each agent's actual os_type + os_architecture. (~5 lines)
- Bulk nonce generation: AgentUpdatesModal bulk path skips nonces. Align with single-agent path. (~15 lines)
8b. Needs Building from Scratch
- Semver-aware version comparison: Replace lexicographic comparison in version.ValidateAgentVersion() with proper semver parsing. (~30 lines)
- Auto-upgrade trigger: Server-side logic: when agent checks in with version < LatestAgentVersion, automatically queue an update_agent command. Requires policy controls (opt-in/opt-out per agent, maintenance windows). (~100-200 lines)
- Staged rollout: Upgrade N% of agents first, monitor for failures, then proceed. (~200-300 lines)
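For the staged rollout item, one common way to pick "N% of agents" is a stable hash of the agent ID, so that an agent's cohort does not change between check-ins. The sketch below assumes that approach; inRollout and countSelected are hypothetical names, and FNV-1a is used only because it is in the standard library.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inRollout reports whether an agent falls inside the first `percent`
// percent of the fleet. The bucket is derived from the agent ID alone, so
// the answer is stable across restarts and check-ins.
func inRollout(agentID string, percent uint32) bool {
	if percent >= 100 {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(agentID)) // hash.Hash writes never return an error
	return h.Sum32()%100 < percent
}

// countSelected counts how many of n synthetic agent IDs are selected at
// the given percentage.
func countSelected(n int, percent uint32) int {
	selected := 0
	for i := 0; i < n; i++ {
		if inRollout(fmt.Sprintf("agent-%d", i), percent) {
			selected++
		}
	}
	return selected
}

func main() {
	for _, p := range []uint32{0, 10, 50, 100} {
		fmt.Printf("%d%% -> %d of 1000 agents\n", p, countSelected(1000, p))
	}
}
```

Because the bucket is fixed per agent, raising the percentage only ever adds agents to the rollout; nobody selected at 10% drops out at 50%.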
8c. Minimum Viable Upgrade System (already working)
The MVP already works end-to-end:
- Admin clicks "Update" in dashboard → POST /agents/{id}/update
- Server creates update_agent command with download URL, checksum, signature
- Agent polls, receives command, verifies signature+checksum
- Agent downloads new binary, backs up old, atomic replace, restarts
- Watchdog confirms new version running, rollback if not
The critical gap is /api/v1/info returning a stale version. The rest of the single-agent path functions; the bulk paths still carry the bugs noted in 6c.
8d. Full Production Upgrade System Would Add
- Auto-upgrade policy engine (version-based triggers)
- Staged rollout with configurable percentages
- Maintenance window scheduling
- Cross-platform bulk upgrade fix (the linux-amd64 hardcode)
- Upgrade history dashboard (who upgraded when, rollbacks)
- Semver comparison throughout
- Download progress reporting (large binaries on slow links)
FINDINGS TABLE
| ID | Platform | Severity | Finding | Location |
|---|---|---|---|---|
| U-1 | All | HIGH | /api/v1/info returns hardcoded "v0.1.21"; agents/dashboard cannot detect current expected version | system.go:111-124 |
| U-2 | All | HIGH | ValidateAgentVersion uses lexicographic comparison, so "0.1.9" > "0.1.22" incorrectly evaluates to true | version/versions.go:72 |
| U-3 | Windows | MEDIUM | Bulk upgrade platform hardcoded to linux-amd64; Windows agents get wrong binary | RelayList.tsx:91 |
| U-4 | All | MEDIUM | Bulk upgrade in AgentUpdatesModal skips nonce generation, giving weaker replay protection | AgentUpdatesModal.tsx:93-99 |
| U-5 | All | MEDIUM | BuildAndSignAgent service is disconnected from HTTP upgrade handler | build_orchestrator.go |
| U-6 | All | MEDIUM | POST /build/upgrade/:agentID is a config generator, not an upgrade orchestrator | handlers/build_orchestrator.go:95-191 |
| U-7 | Windows | LOW | sc stop result not checked in restartAgentService() | subsystem_handlers.go:901 |
| U-8 | All | LOW | downloadUpdatePackage uses plain http.Get with no timeout and no size limit | subsystem_handlers.go:774 |
| U-9 | All | LOW | PreserveExisting is a TODO stub in upgrade handler | handlers/build_orchestrator.go:142-146 |
| U-10 | All | INFO | ExtractConfigVersionFromAgent is fragile; last-char extraction breaks at version x.y.z10+ | version/versions.go:59-62 |
| U-11 | All | INFO | AgentUpdate.tsx component exists but is not imported by any page | AgentUpdate.tsx |
| U-12 | All | INFO | build_orchestrator.go services layer marked // Deprecated | services/build_orchestrator.go |
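Finding U-8 can be addressed with an http.Client that has a timeout and a reader that enforces a byte cap. The following is a sketch under assumptions: downloadLimited and demoDownload are hypothetical names, and the 5-second timeout and the byte limits are illustrative values, not ones taken from the codebase.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

var errTooLarge = errors.New("download exceeds size limit")

// downloadLimited copies the response body to dst but gives up once more
// than maxBytes have arrived. It reads one byte past the limit so that an
// oversized body is detected instead of being silently truncated.
func downloadLimited(client *http.Client, url string, dst io.Writer, maxBytes int64) (int64, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %s", resp.Status)
	}
	n, err := io.Copy(dst, io.LimitReader(resp.Body, maxBytes+1))
	if err != nil {
		return n, err
	}
	if n > maxBytes {
		return n, errTooLarge
	}
	return n, nil
}

// demoDownload serves bodySize bytes from a local test server and downloads
// them with the given cap.
func demoDownload(bodySize int, maxBytes int64) (int64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, strings.Repeat("x", bodySize))
	}))
	defer srv.Close()
	// Client.Timeout bounds the whole exchange, including reading the body.
	client := &http.Client{Timeout: 5 * time.Second}
	return downloadLimited(client, srv.URL, io.Discard, maxBytes)
}

func main() {
	fmt.Println(demoDownload(100, 1024))
	fmt.Println(demoDownload(2048, 1024))
}
```

When errTooLarge is returned, dst has already received a partial body, so a caller writing to a temp file should delete it before reporting the failure.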
RECOMMENDED BUILD ORDER
- Fix /api/v1/info (U-1): immediate, ~10 lines, unblocks version detection
- Fix bulk platform hardcode (U-3): immediate, ~5 lines, prevents wrong-platform delivery
- Fix semver comparison (U-2): immediate, ~30 lines, prevents version logic bugs
- Fix bulk nonce generation (U-4): quick, ~15 lines, security consistency
- Wire BuildAndSignAgent to upgrade flow (U-5): medium, connects existing code
- Auto-upgrade trigger: larger feature, requires policy design
- Staged rollout: future enhancement