docs: D-1 machine ID duplication audit

Comprehensive audit of machine ID implementations across the
agent codebase. Identified 3 production call sites with 1 critical
divergence.

Key findings:
- F-D1-1 HIGH: Registration fallback "unknown-"+hostname is unhashed,
  mismatches runtime SHA256 hash, causes permanent agent lockout
  when GetMachineID() transiently fails then recovers
- F-D1-2 MEDIUM: No recovery path from machine ID mismatch
- F-D1-3 LOW: example_integration.go is dead code calling
  machineid.ID() directly (bypasses canonical hashing)
- F-D1-4 LOW: Windows redundant machineid.ID() retry
- F-D1-5 LOW: client.go uses fmt.Printf for machine ID error

6 findings total. See docs/D1_MachineID_Audit.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-03-29 09:34:04 -04:00
parent a1df7d7b05
commit 8530e6c6fc

170
docs/D1_MachineID_Audit.md Normal file
View File

@@ -0,0 +1,170 @@
# D-1 Machine ID Duplication Audit
**Date:** 2026-03-29
**Branch:** culurien
**Scope:** Machine ID implementation consistency across agent codebase
---
## 1. ALL MACHINE ID IMPLEMENTATIONS
### 1a. Canonical: `system/machine_id.go`
**GetMachineID()** (line 15):
1. `machineid.ID()` — cross-platform library (Windows: Registry MachineGuid, Linux: /etc/machine-id)
2. If success: `hashMachineID(id)` → SHA256 → 64 hex chars
3. If fail: OS-specific fallback
**Linux fallback chain** (line 43):
1. `/etc/machine-id` → hash
2. `/var/lib/dbus/machine-id` → hash
3. `/sys/class/dmi/id/product_uuid` → hash
4. `/etc/hostname` + "-linux-fallback" → hash
**Windows fallback** (line 80):
1. `machineid.ID()` again (same as primary — redundant)
2. `generateGenericMachineID()` → hostname-os-arch → hash
**Generic fallback** (line 102):
`hostname-goos-goarch``hashMachineID()` → 64 hex chars
**Output format:** ALWAYS 64 hex characters (SHA256). Every path goes through `hashMachineID()`.
### 1b. Client: `client/client.go`
Line 36: `machineID, err := system.GetMachineID()` — calls canonical function. Cached in `Client.machineID` string field. Sent as `X-Machine-ID` header on every authenticated request (line 108).
If `GetMachineID()` fails: `machineID = ""` (empty string). Server will reject with 403 "missing machine ID header".
**Consistent with canonical:** YES.
### 1c. Registration: `cmd/agent/main.go`
Line 425: `machineID, err := system.GetMachineID()` — calls canonical function.
Line 428: **ERROR FALLBACK: `machineID = "unknown-" + sysInfo.Hostname`**
This fallback is **NOT HASHED** and **NOT 64 HEX CHARS**. It produces a string like `"unknown-my-server"` (14-30 chars, alphanumeric with dashes).
Line 443: `MachineID: machineID` — sent in `RegisterRequest.MachineID` and stored in `agents.machine_id` column.
### 1d. Example: `logging/example_integration.go`
Line 71: `machineid.ID()` — calls library DIRECTLY, bypasses `GetMachineID()` and `hashMachineID()`. Returns raw, unhashed machine ID.
**Not imported anywhere.** File is `package logging` but none of its functions are called from production code. It's dead example code.
---
## 2. ALL CALL SITES
| Location | Method | Hashed? | Consistent? |
|----------|--------|---------|-------------|
| `system/machine_id.go:17` | `machineid.ID()``hashMachineID()` | YES | Canonical |
| `system/machine_id.go:82` | `machineid.ID()``hashMachineID()` | YES | Canonical |
| `system/machine_id.go:93` | `machineid.ID()``hashMachineID()` | YES | Canonical |
| `system/machine_id.go:102` | hostname+os+arch → `hashMachineID()` | YES | Canonical |
| `client/client.go:36` | `system.GetMachineID()` | YES | Consistent |
| `cmd/agent/main.go:425` | `system.GetMachineID()` | YES | Consistent |
| `cmd/agent/main.go:428` | `"unknown-" + hostname` | **NO** | **DIVERGENT** |
| `logging/example_integration.go:71` | `machineid.ID()` direct | **NO** | **DEAD CODE** |
---
## 3. DIVERGENCE ANALYSIS
### 3a. Can the three production call sites return different values?
**Normal case (GetMachineID succeeds):** YES, all three return the same SHA256-hashed value. `client.go` and `main.go` both call `system.GetMachineID()`.
**Failure case (main.go line 428):**
- **Registration (main.go):** `"unknown-my-server"` — 14-30 chars, alphanumeric, NOT hashed
- **Runtime (client.go):** If `GetMachineID()` fails at client construction, `machineID = ""` — empty string → server rejects with 403
**F-D1-1 HIGH: Registration and runtime machine IDs can diverge.**
If `GetMachineID()` fails during registration, the agent registers with `"unknown-hostname"` (unhashed). On subsequent restarts, if `GetMachineID()` succeeds, the client sends a SHA256 hash. The server compares: `"unknown-hostname" != "a7f3...64hexchars..."`**403 FORBIDDEN**. The agent is permanently locked out until re-registered.
### 3b. Server-Side Validation
`machine_binding.go:149`: `*agent.MachineID != reportedMachineID` — simple string equality comparison. No format validation. Would accept BOTH `"unknown-hostname"` (if that's what was registered) AND `"a7f3...64hex..."`. But they must match exactly.
### 3c. Registration vs Runtime Mismatch
If the agent registered with `"unknown-hostname"` (fallback) but restarts and `GetMachineID()` now succeeds (transient error resolved), the client sends a SHA256 hash that doesn't match the stored `"unknown-hostname"`**permanent lockout**.
**F-D1-2 MEDIUM: No recovery path from machine ID mismatch.** The only fix is manual: delete the agent from the DB and re-register. There's no "update machine ID" API.
---
## 4. EXAMPLE_INTEGRATION.GO
- **Imported:** NO (zero results from grep)
- **Package:** `logging` — it IS part of the package and its functions are exported
- **Called:** NO — none of its functions are called anywhere
- **Risk:** `ExampleMachineIDMonitoring()` calls `machineid.ID()` directly (unhashed)
- **Candidate for deletion:** YES — dead example code that bypasses the canonical path
**F-D1-3 LOW:** `example_integration.go` is dead code. Its `ExampleMachineIDMonitoring()` function calls `machineid.ID()` directly, bypassing the canonical `GetMachineID()`. If anyone copies this example, they'll get unhashed machine IDs.
---
## 5. WINDOWS-SPECIFIC PATH
### 5a. Fallback Chains
**Linux:** machineid.ID() → /etc/machine-id → /var/lib/dbus/machine-id → /sys/class/dmi/id/product_uuid → hostname-linux-fallback → hostname-os-arch
**Windows:** machineid.ID() → machineid.ID() (redundant retry) → hostname-os-arch
**F-D1-4 LOW:** Windows `getWindowsMachineID()` calls `machineid.ID()` again after the primary already tried it and failed. This is a no-op retry — the same function will fail again.
### 5b. Dual-Boot Collision
Windows MachineGuid (registry) and Linux /etc/machine-id are independent. A dual-boot system produces different machine IDs for each OS. No collision risk.
### 5c. Windows Reinstall
Windows MachineGuid changes on reinstall. This is NOT documented in the codebase. After reinstalling Windows, the agent will produce a different machine ID → 403 from MachineBindingMiddleware → must re-register.
---
## 6. REGISTRATION vs RUNTIME CONSISTENCY
### 6a. Registration Path (main.go:425-443)
`system.GetMachineID()` → if error: `"unknown-" + hostname` (UNHASHED) → stored in DB
### 6b. Runtime Path (client/client.go:36-41)
`system.GetMachineID()` → if error: `""` (empty) → sent as X-Machine-ID header → server rejects with 403
### 6c. Are They Guaranteed Identical?
**NO.** Three scenarios cause divergence:
1. **Registration succeeds, runtime fails:** Registration stores SHA256 hash. Runtime sends empty string → 403.
2. **Registration fails, runtime succeeds:** Registration stores `"unknown-hostname"`. Runtime sends SHA256 hash → 403.
3. **Both fail but produce different fallbacks:** Registration uses `"unknown-hostname"` (unhashed, from main.go:428). Runtime uses empty string (from client.go:40). These don't match → 403.
**F-D1-1 is the root cause.** The main.go fallback at line 428 produces a fundamentally different format than the canonical function.
---
## 7. ETHOS CROSS-CHECK
| Principle | Status | Finding |
|-----------|--------|---------|
| ETHOS #1 | PARTIAL | GetMachineID failure logged in main.go (line 427) and client.go (line 39). But client.go uses `fmt.Printf` instead of `log.Printf`. |
| ETHOS #4 | VIOLATION | Machine ID is NOT idempotent when the error fallback activates. Registration path and runtime path produce different values for the same failure condition. |
---
## FINDINGS SUMMARY
| ID | Severity | Finding | Location |
|----|----------|---------|----------|
| F-D1-1 | HIGH | Registration fallback `"unknown-"+hostname` is unhashed and mismatches runtime path, causing permanent agent lockout on recovery | cmd/agent/main.go:428 |
| F-D1-2 | MEDIUM | No recovery path from machine ID mismatch — must delete and re-register agent | machine_binding.go:149 |
| F-D1-3 | LOW | `example_integration.go` is dead code that calls `machineid.ID()` directly (unhashed), bypassing canonical path | logging/example_integration.go:71 |
| F-D1-4 | LOW | Windows `getWindowsMachineID()` redundantly retries `machineid.ID()` after primary already failed | system/machine_id.go:82 |
| F-D1-5 | LOW | `client.go:39` uses `fmt.Printf` instead of `log.Printf` for machine ID error (ETHOS #1) | client/client.go:39 |
| F-D1-6 | INFO | Windows reinstall changes MachineGuid, causing agent lockout — not documented | system/machine_id.go |