Comprehensive audit of machine ID implementations across the agent codebase. Identified 3 production call sites with 1 critical divergence.

Key findings:
- F-D1-1 HIGH: Registration fallback "unknown-"+hostname is unhashed, mismatches runtime SHA256 hash, causes permanent agent lockout when GetMachineID() transiently fails then recovers
- F-D1-2 MEDIUM: No recovery path from machine ID mismatch
- F-D1-3 LOW: example_integration.go is dead code calling machineid.ID() directly (bypasses canonical hashing)
- F-D1-4 LOW: Windows redundant machineid.ID() retry
- F-D1-5 LOW: client.go uses fmt.Printf for machine ID error

6 findings total. See docs/D1_MachineID_Audit.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
D-1 Machine ID Duplication Audit
Date: 2026-03-29 Branch: culurien Scope: Machine ID implementation consistency across agent codebase
1. ALL MACHINE ID IMPLEMENTATIONS
1a. Canonical: system/machine_id.go
GetMachineID() (line 15):
- machineid.ID(): cross-platform library (Windows: Registry MachineGuid, Linux: /etc/machine-id)
- If success: hashMachineID(id) → SHA256 → 64 hex chars
- If fail: OS-specific fallback
Linux fallback chain (line 43):
- /etc/machine-id → hash
- /var/lib/dbus/machine-id → hash
- /sys/class/dmi/id/product_uuid → hash
- /etc/hostname + "-linux-fallback" → hash
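The Linux chain above can be sketched as follows. This is a minimal illustration, not the code in system/machine_id.go: the names linuxFallbackID and fakeFiles are hypothetical, the whitespace trimming is an assumption, and hashMachineID is re-implemented here as plain hex-encoded SHA256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"strings"
)

// hashMachineID stands in for the canonical helper: SHA256, hex-encoded.
func hashMachineID(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// fakeFiles is a hypothetical in-memory stand-in for the filesystem,
// used only to exercise the chain without touching real files.
type fakeFiles map[string]string

func (f fakeFiles) read(path string) ([]byte, error) {
	if v, ok := f[path]; ok {
		return []byte(v), nil
	}
	return nil, os.ErrNotExist
}

// linuxFallbackID walks the documented sources in order and hashes the
// first non-empty value. Trimming whitespace is an assumption.
func linuxFallbackID(readFile func(string) ([]byte, error)) (string, error) {
	sources := []string{
		"/etc/machine-id",
		"/var/lib/dbus/machine-id",
		"/sys/class/dmi/id/product_uuid",
	}
	for _, path := range sources {
		data, err := readFile(path)
		if err != nil {
			continue // try the next source
		}
		if id := strings.TrimSpace(string(data)); id != "" {
			return hashMachineID(id), nil
		}
	}
	// Last documented step: hostname plus a fixed suffix, still hashed.
	if data, err := readFile("/etc/hostname"); err == nil {
		if host := strings.TrimSpace(string(data)); host != "" {
			return hashMachineID(host + "-linux-fallback"), nil
		}
	}
	return "", errors.New("no machine ID source available")
}

func main() {
	id, err := linuxFallbackID(os.ReadFile)
	fmt.Println(id, err)
}
```

Injecting the reader keeps the order of the chain testable: a fakeFiles value with only /etc/hostname set reaches the last step, and the result is still a 64-character hash.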
Windows fallback (line 80):
- machineid.ID() again (same as primary, so redundant)
- generateGenericMachineID() → hostname-os-arch → hash
Generic fallback (line 102):
hostname-goos-goarch → hashMachineID() → 64 hex chars
Output format: ALWAYS 64 hex characters (SHA256). Every path goes through hashMachineID().
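To make the output format concrete, here is a minimal sketch of the hashing step and the generic fallback. It assumes hashMachineID is a plain hex-encoded SHA256 with no salt, and that the generic input joins hostname, OS, and architecture with dashes (inferred from "hostname-goos-goarch"). The real functions may differ in both respects.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"runtime"
)

// hashMachineID is a sketch of the canonical hashing step (assumed:
// unsalted SHA256, lowercase hex). The result is always 64 characters.
func hashMachineID(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// genericMachineID sketches the generic fallback. The dash separator
// is an assumption, not confirmed by the audited code.
func genericMachineID(hostname, goos, goarch string) string {
	return hashMachineID(hostname + "-" + goos + "-" + goarch)
}

func main() {
	host, err := os.Hostname()
	if err != nil {
		host = "localhost" // placeholder for this sketch only
	}
	id := genericMachineID(host, runtime.GOOS, runtime.GOARCH)
	fmt.Println(len(id), id)
}
```

Whatever the input, the length printed is 64, which is why any value of another length (such as "unknown-my-server") is immediately recognizable as non-canonical.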
1b. Client: client/client.go
Line 36: machineID, err := system.GetMachineID() — calls canonical function. Cached in Client.machineID string field. Sent as X-Machine-ID header on every authenticated request (line 108).
If GetMachineID() fails: machineID = "" (empty string). Server will reject with 403 "missing machine ID header".
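The caching and header behavior described here can be sketched as below. The Client type and newRequest method are simplified stand-ins, not the actual definitions in client/client.go, and the URL is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
)

// Client is a pared-down stand-in for the agent client. Only the cached
// machine ID field is modeled.
type Client struct {
	machineID string
}

// newRequest (hypothetical helper) attaches the cached machine ID to an
// outgoing request, as the audit describes for authenticated calls.
func (c *Client) newRequest(method, url string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Machine-ID", c.machineID)
	return req, nil
}

func main() {
	// An empty machineID is what the client holds after GetMachineID() fails.
	c := &Client{machineID: ""}
	req, err := c.newRequest(http.MethodGet, "http://example.invalid/api")
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	fmt.Printf("X-Machine-ID=%q\n", req.Header.Get("X-Machine-ID"))
}
```

Because the value is read once and cached, a failure at construction time affects every request for the lifetime of that client.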
Consistent with canonical: YES.
1c. Registration: cmd/agent/main.go
Line 425: machineID, err := system.GetMachineID() — calls canonical function.
Line 428: ERROR FALLBACK: machineID = "unknown-" + sysInfo.Hostname
This fallback is NOT HASHED and NOT 64 HEX CHARS. It produces a string like "unknown-my-server" (14-30 chars, alphanumeric with dashes).
Line 443: MachineID: machineID — sent in RegisterRequest.MachineID and stored in agents.machine_id column.
1d. Example: logging/example_integration.go
Line 71: machineid.ID() — calls library DIRECTLY, bypasses GetMachineID() and hashMachineID(). Returns raw, unhashed machine ID.
Not imported anywhere. File is package logging but none of its functions are called from production code. It's dead example code.
2. ALL CALL SITES
| Location | Method | Hashed? | Consistent? |
|---|---|---|---|
| system/machine_id.go:17 | machineid.ID() → hashMachineID() | YES | Canonical |
| system/machine_id.go:82 | machineid.ID() → hashMachineID() | YES | Canonical |
| system/machine_id.go:93 | machineid.ID() → hashMachineID() | YES | Canonical |
| system/machine_id.go:102 | hostname+os+arch → hashMachineID() | YES | Canonical |
| client/client.go:36 | system.GetMachineID() | YES | Consistent |
| cmd/agent/main.go:425 | system.GetMachineID() | YES | Consistent |
| cmd/agent/main.go:428 | "unknown-" + hostname | NO | DIVERGENT |
| logging/example_integration.go:71 | machineid.ID() direct | NO | DEAD CODE |
3. DIVERGENCE ANALYSIS
3a. Can the three production call sites return different values?
Normal case (GetMachineID succeeds): NO. client.go and main.go both call system.GetMachineID(), so every production call site yields the same SHA256-hashed value.
Failure case (main.go line 428):
- Registration (main.go): "unknown-my-server" (14-30 chars, alphanumeric, NOT hashed)
- Runtime (client.go): If GetMachineID() fails at client construction, machineID = "" (empty string) → server rejects with 403
F-D1-1 HIGH: Registration and runtime machine IDs can diverge.
If GetMachineID() fails during registration, the agent registers with "unknown-hostname" (unhashed). On subsequent restarts, if GetMachineID() succeeds, the client sends a SHA256 hash. The server compares: "unknown-hostname" != "a7f3...64hexchars..." → 403 FORBIDDEN. The agent is permanently locked out until re-registered.
3b. Server-Side Validation
machine_binding.go:149: *agent.MachineID != reportedMachineID — simple string equality comparison. No format validation. Would accept BOTH "unknown-hostname" (if that's what was registered) AND "a7f3...64hex...". But they must match exactly.
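The comparison can be sketched next to the kind of format check the server does not currently perform. Both functions are illustrative: machineIDMatches mirrors the strict equality described above (the nil handling is an assumption), and isCanonicalMachineID is a hypothetical helper that assumes the canonical hash is lowercase hex.

```go
package main

import "fmt"

// machineIDMatches mirrors the strict string equality described for
// machine_binding.go. Treating a nil stored ID as a mismatch is an
// assumption of this sketch.
func machineIDMatches(stored *string, reported string) bool {
	return stored != nil && *stored == reported
}

// isCanonicalMachineID (hypothetical) reports whether id looks like a
// hex-encoded SHA256 digest: exactly 64 characters from 0-9 and a-f.
func isCanonicalMachineID(id string) bool {
	if len(id) != 64 {
		return false
	}
	for _, r := range id {
		isDigit := r >= '0' && r <= '9'
		isLowerHex := r >= 'a' && r <= 'f'
		if !isDigit && !isLowerHex {
			return false
		}
	}
	return true
}

func main() {
	stored := "unknown-my-server"
	fmt.Println(machineIDMatches(&stored, "unknown-my-server")) // equality alone accepts it
	fmt.Println(isCanonicalMachineID(stored))                   // a format check would flag it
}
```

With equality alone, the fallback string is a valid identity for as long as the agent keeps sending it. A format check at registration time would surface the non-canonical value before it is stored.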
3c. Registration vs Runtime Mismatch
If the agent registered with "unknown-hostname" (fallback) but restarts and GetMachineID() now succeeds (transient error resolved), the client sends a SHA256 hash that doesn't match the stored "unknown-hostname" → permanent lockout.
F-D1-2 MEDIUM: No recovery path from machine ID mismatch. The only fix is manual: delete the agent from the DB and re-register. There's no "update machine ID" API.
4. EXAMPLE_INTEGRATION.GO
- Imported: NO (zero results from grep)
- Package: logging. It IS part of the package and its functions are exported
- Called: NO. None of its functions are called anywhere
- Risk: ExampleMachineIDMonitoring() calls machineid.ID() directly (unhashed)
- Candidate for deletion: YES. Dead example code that bypasses the canonical path
F-D1-3 LOW: example_integration.go is dead code. Its ExampleMachineIDMonitoring() function calls machineid.ID() directly, bypassing the canonical GetMachineID(). If anyone copies this example, they'll get unhashed machine IDs.
5. WINDOWS-SPECIFIC PATH
5a. Fallback Chains
Linux: machineid.ID() → /etc/machine-id → /var/lib/dbus/machine-id → /sys/class/dmi/id/product_uuid → hostname-linux-fallback → hostname-os-arch
Windows: machineid.ID() → machineid.ID() (redundant retry) → hostname-os-arch
F-D1-4 LOW: Windows getWindowsMachineID() calls machineid.ID() again after the primary already tried it and failed. This is a no-op retry — the same function will fail again.
5b. Dual-Boot Collision
Windows MachineGuid (registry) and Linux /etc/machine-id are independent. A dual-boot system produces different machine IDs for each OS. No collision risk.
5c. Windows Reinstall
Windows MachineGuid changes on reinstall. This is NOT documented in the codebase. After reinstalling Windows, the agent will produce a different machine ID → 403 from MachineBindingMiddleware → must re-register.
6. REGISTRATION vs RUNTIME CONSISTENCY
6a. Registration Path (main.go:425-443)
system.GetMachineID() → if error: "unknown-" + hostname (UNHASHED) → stored in DB
6b. Runtime Path (client/client.go:36-41)
system.GetMachineID() → if error: "" (empty) → sent as X-Machine-ID header → server rejects with 403
6c. Are They Guaranteed Identical?
NO. Three scenarios cause divergence:
- Registration succeeds, runtime fails: Registration stores SHA256 hash. Runtime sends empty string → 403.
- Registration fails, runtime succeeds: Registration stores "unknown-hostname". Runtime sends SHA256 hash → 403.
- Both fail but produce different fallbacks: Registration uses "unknown-hostname" (unhashed, from main.go:428). Runtime uses empty string (from client.go:40). These don't match → 403.
F-D1-1 is the root cause. The main.go fallback at line 428 produces a fundamentally different format than the canonical function.
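The three scenarios can be reproduced with a small simulation. Everything here is a stand-in built from the behavior described in this audit: okID and failID model GetMachineID() succeeding or failing, registrationID and runtimeID model the two fallbacks, and serverAccepts models the exact-match check plus rejection of a missing header.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// idSource models system.GetMachineID(): a hashed ID or an error.
type idSource func() (string, error)

// okID simulates a successful call (hex-encoded SHA256 of a sample value).
func okID() (string, error) {
	sum := sha256.Sum256([]byte("raw-machine-id"))
	return hex.EncodeToString(sum[:]), nil
}

// failID simulates a transient failure.
func failID() (string, error) {
	return "", errors.New("machine ID unavailable")
}

// registrationID models the main.go fallback: "unknown-" + hostname.
func registrationID(get idSource, hostname string) string {
	id, err := get()
	if err != nil {
		return "unknown-" + hostname
	}
	return id
}

// runtimeID models the client.go fallback: an empty string.
func runtimeID(get idSource) string {
	id, err := get()
	if err != nil {
		return ""
	}
	return id
}

// serverAccepts models the server: reject an empty header, otherwise
// require exact equality with the stored value.
func serverAccepts(stored, reported string) bool {
	return reported != "" && stored == reported
}

func main() {
	cases := []struct {
		name     string
		reg, run idSource
	}{
		{"both succeed", okID, okID},
		{"registration succeeds, runtime fails", okID, failID},
		{"registration fails, runtime succeeds", failID, okID},
		{"both fail", failID, failID},
	}
	for _, c := range cases {
		stored := registrationID(c.reg, "my-server")
		reported := runtimeID(c.run)
		fmt.Printf("%-38s accepted=%v\n", c.name, serverAccepts(stored, reported))
	}
}
```

Only the first case is accepted. The other three correspond to the divergence scenarios listed above, and each ends in a rejection.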
7. ETHOS CROSS-CHECK
| Principle | Status | Finding |
|---|---|---|
| ETHOS #1 | PARTIAL | GetMachineID failure logged in main.go (line 427) and client.go (line 39). But client.go uses fmt.Printf instead of log.Printf. |
| ETHOS #4 | VIOLATION | Machine ID is NOT idempotent when the error fallback activates. Registration path and runtime path produce different values for the same failure condition. |
FINDINGS SUMMARY
| ID | Severity | Finding | Location |
|---|---|---|---|
| F-D1-1 | HIGH | Registration fallback "unknown-"+hostname is unhashed and mismatches runtime path, causing permanent agent lockout on recovery | cmd/agent/main.go:428 |
| F-D1-2 | MEDIUM | No recovery path from machine ID mismatch; must delete and re-register agent | machine_binding.go:149 |
| F-D1-3 | LOW | example_integration.go is dead code that calls machineid.ID() directly (unhashed), bypassing canonical path | logging/example_integration.go:71 |
| F-D1-4 | LOW | Windows getWindowsMachineID() redundantly retries machineid.ID() after primary already failed | system/machine_id.go:82 |
| F-D1-5 | LOW | client.go:39 uses fmt.Printf instead of log.Printf for machine ID error (ETHOS #1) | client/client.go:39 |
| F-D1-6 | INFO | Windows reinstall changes MachineGuid, causing agent lockout — not documented | system/machine_id.go |