Comprehensive audit of machine ID implementations across the agent codebase. Identified 3 production call sites with 1 critical divergence. Key findings: - F-D1-1 HIGH: Registration fallback "unknown-"+hostname is unhashed, mismatches runtime SHA256 hash, causes permanent agent lockout when GetMachineID() transiently fails then recovers - F-D1-2 MEDIUM: No recovery path from machine ID mismatch - F-D1-3 LOW: example_integration.go is dead code calling machineid.ID() directly (bypasses canonical hashing) - F-D1-4 LOW: Windows redundant machineid.ID() retry - F-D1-5 LOW: client.go uses fmt.Printf for machine ID error 6 findings total. See docs/D1_MachineID_Audit.md for details. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
171 lines
8.4 KiB
Markdown
171 lines
8.4 KiB
Markdown
# D-1 Machine ID Duplication Audit
|
|
|
|
**Date:** 2026-03-29
|
|
**Branch:** culurien
|
|
**Scope:** Machine ID implementation consistency across agent codebase
|
|
|
|
---
|
|
|
|
## 1. ALL MACHINE ID IMPLEMENTATIONS
|
|
|
|
### 1a. Canonical: `system/machine_id.go`
|
|
|
|
**GetMachineID()** (line 15):
|
|
1. `machineid.ID()` — cross-platform library (Windows: Registry MachineGuid, Linux: /etc/machine-id)
|
|
2. If success: `hashMachineID(id)` → SHA256 → 64 hex chars
|
|
3. If fail: OS-specific fallback
|
|
|
|
**Linux fallback chain** (line 43):
|
|
1. `/etc/machine-id` → hash
|
|
2. `/var/lib/dbus/machine-id` → hash
|
|
3. `/sys/class/dmi/id/product_uuid` → hash
|
|
4. `/etc/hostname` + "-linux-fallback" → hash
|
|
|
|
**Windows fallback** (line 80):
|
|
1. `machineid.ID()` again (same as primary — redundant)
|
|
2. `generateGenericMachineID()` → hostname-os-arch → hash
|
|
|
|
**Generic fallback** (line 102):
|
|
`hostname-goos-goarch` → `hashMachineID()` → 64 hex chars
|
|
|
|
**Output format:** ALWAYS 64 hex characters (SHA256). Every path goes through `hashMachineID()`.
|
|
|
|
### 1b. Client: `client/client.go`
|
|
|
|
Line 36: `machineID, err := system.GetMachineID()` — calls canonical function. Cached in `Client.machineID` string field. Sent as `X-Machine-ID` header on every authenticated request (line 108).
|
|
|
|
If `GetMachineID()` fails: `machineID = ""` (empty string). Server will reject with 403 "missing machine ID header".
|
|
|
|
**Consistent with canonical:** YES.
|
|
|
|
### 1c. Registration: `cmd/agent/main.go`
|
|
|
|
Line 425: `machineID, err := system.GetMachineID()` — calls canonical function.
|
|
Line 428: **ERROR FALLBACK: `machineID = "unknown-" + sysInfo.Hostname`**
|
|
|
|
This fallback is **NOT HASHED** and **NOT 64 HEX CHARS**. It produces a string like `"unknown-my-server"` (14-30 chars, alphanumeric with dashes).
|
|
|
|
Line 443: `MachineID: machineID` — sent in `RegisterRequest.MachineID` and stored in `agents.machine_id` column.
|
|
|
|
### 1d. Example: `logging/example_integration.go`
|
|
|
|
Line 71: `machineid.ID()` — calls library DIRECTLY, bypasses `GetMachineID()` and `hashMachineID()`. Returns raw, unhashed machine ID.
|
|
|
|
**Not imported anywhere.** File is `package logging` but none of its functions are called from production code. It's dead example code.
|
|
|
|
---
|
|
|
|
## 2. ALL CALL SITES
|
|
|
|
| Location | Method | Hashed? | Consistent? |
|
|
|----------|--------|---------|-------------|
|
|
| `system/machine_id.go:17` | `machineid.ID()` → `hashMachineID()` | YES | Canonical |
|
|
| `system/machine_id.go:82` | `machineid.ID()` → `hashMachineID()` | YES | Canonical |
|
|
| `system/machine_id.go:93` | `machineid.ID()` → `hashMachineID()` | YES | Canonical |
|
|
| `system/machine_id.go:102` | hostname+os+arch → `hashMachineID()` | YES | Canonical |
|
|
| `client/client.go:36` | `system.GetMachineID()` | YES | Consistent |
|
|
| `cmd/agent/main.go:425` | `system.GetMachineID()` | YES | Consistent |
|
|
| `cmd/agent/main.go:428` | `"unknown-" + hostname` | **NO** | **DIVERGENT** |
|
|
| `logging/example_integration.go:71` | `machineid.ID()` direct | **NO** | **DEAD CODE** |
|
|
|
|
---
|
|
|
|
## 3. DIVERGENCE ANALYSIS
|
|
|
|
### 3a. Can the three production call sites return different values?
|
|
|
|
**Normal case (GetMachineID succeeds):** YES, all three return the same SHA256-hashed value. `client.go` and `main.go` both call `system.GetMachineID()`.
|
|
|
|
**Failure case (main.go line 428):**
|
|
- **Registration (main.go):** `"unknown-my-server"` — 14-30 chars, alphanumeric, NOT hashed
|
|
- **Runtime (client.go):** If `GetMachineID()` fails at client construction, `machineID = ""` — empty string → server rejects with 403
|
|
|
|
**F-D1-1 HIGH: Registration and runtime machine IDs can diverge.**
|
|
If `GetMachineID()` fails during registration, the agent registers with `"unknown-hostname"` (unhashed). On subsequent restarts, if `GetMachineID()` succeeds, the client sends a SHA256 hash. The server compares: `"unknown-hostname" != "a7f3...64hexchars..."` → **403 FORBIDDEN**. The agent is permanently locked out until re-registered.
|
|
|
|
### 3b. Server-Side Validation
|
|
|
|
`machine_binding.go:149`: `*agent.MachineID != reportedMachineID` — simple string equality comparison. No format validation. Would accept BOTH `"unknown-hostname"` (if that's what was registered) AND `"a7f3...64hex..."`. But they must match exactly.
|
|
|
|
### 3c. Registration vs Runtime Mismatch
|
|
|
|
If the agent registered with `"unknown-hostname"` (fallback) but restarts and `GetMachineID()` now succeeds (transient error resolved), the client sends a SHA256 hash that doesn't match the stored `"unknown-hostname"` → **permanent lockout**.
|
|
|
|
**F-D1-2 MEDIUM: No recovery path from machine ID mismatch.** The only fix is manual: delete the agent from the DB and re-register. There's no "update machine ID" API.
|
|
|
|
---
|
|
|
|
## 4. EXAMPLE_INTEGRATION.GO
|
|
|
|
- **Imported:** NO (zero results from grep)
|
|
- **Package:** `logging` — it IS part of the package and its functions are exported
|
|
- **Called:** NO — none of its functions are called anywhere
|
|
- **Risk:** `ExampleMachineIDMonitoring()` calls `machineid.ID()` directly (unhashed)
|
|
- **Candidate for deletion:** YES — dead example code that bypasses the canonical path
|
|
|
|
**F-D1-3 LOW:** `example_integration.go` is dead code. Its `ExampleMachineIDMonitoring()` function calls `machineid.ID()` directly, bypassing the canonical `GetMachineID()`. If anyone copies this example, they'll get unhashed machine IDs.
|
|
|
|
---
|
|
|
|
## 5. WINDOWS-SPECIFIC PATH
|
|
|
|
### 5a. Fallback Chains
|
|
|
|
**Linux:** machineid.ID() → /etc/machine-id → /var/lib/dbus/machine-id → /sys/class/dmi/id/product_uuid → hostname-linux-fallback → hostname-os-arch
|
|
|
|
**Windows:** machineid.ID() → machineid.ID() (redundant retry) → hostname-os-arch
|
|
|
|
**F-D1-4 LOW:** Windows `getWindowsMachineID()` calls `machineid.ID()` again after the primary already tried it and failed. This is a no-op retry — the same function will fail again.
|
|
|
|
### 5b. Dual-Boot Collision
|
|
|
|
Windows MachineGuid (registry) and Linux /etc/machine-id are independent. A dual-boot system produces different machine IDs for each OS. No collision risk.
|
|
|
|
### 5c. Windows Reinstall
|
|
|
|
Windows MachineGuid changes on reinstall. This is NOT documented in the codebase. After reinstalling Windows, the agent will produce a different machine ID → 403 from MachineBindingMiddleware → must re-register.
|
|
|
|
---
|
|
|
|
## 6. REGISTRATION vs RUNTIME CONSISTENCY
|
|
|
|
### 6a. Registration Path (main.go:425-443)
|
|
|
|
`system.GetMachineID()` → if error: `"unknown-" + hostname` (UNHASHED) → stored in DB
|
|
|
|
### 6b. Runtime Path (client/client.go:36-41)
|
|
|
|
`system.GetMachineID()` → if error: `""` (empty) → sent as X-Machine-ID header → server rejects with 403
|
|
|
|
### 6c. Are They Guaranteed Identical?
|
|
|
|
**NO.** Three scenarios cause divergence:
|
|
|
|
1. **Registration succeeds, runtime fails:** Registration stores SHA256 hash. Runtime sends empty string → 403.
|
|
2. **Registration fails, runtime succeeds:** Registration stores `"unknown-hostname"`. Runtime sends SHA256 hash → 403.
|
|
3. **Both fail but produce different fallbacks:** Registration uses `"unknown-hostname"` (unhashed, from main.go:428). Runtime uses empty string (from client.go:40). These don't match → 403.
|
|
|
|
**F-D1-1 is the root cause.** The main.go fallback at line 428 produces a fundamentally different format than the canonical function.
|
|
|
|
---
|
|
|
|
## 7. ETHOS CROSS-CHECK
|
|
|
|
| Principle | Status | Finding |
|
|
|-----------|--------|---------|
|
|
| ETHOS #1 | PARTIAL | GetMachineID failure logged in main.go (line 427) and client.go (line 39). But client.go uses `fmt.Printf` instead of `log.Printf`. |
|
|
| ETHOS #4 | VIOLATION | Machine ID is NOT idempotent when the error fallback activates. Registration path and runtime path produce different values for the same failure condition. |
|
|
|
|
---
|
|
|
|
## FINDINGS SUMMARY
|
|
|
|
| ID | Severity | Finding | Location |
|
|
|----|----------|---------|----------|
|
|
| F-D1-1 | HIGH | Registration fallback `"unknown-"+hostname` is unhashed and mismatches runtime path, causing permanent agent lockout on recovery | cmd/agent/main.go:428 |
|
|
| F-D1-2 | MEDIUM | No recovery path from machine ID mismatch — must delete and re-register agent | machine_binding.go:149 |
|
|
| F-D1-3 | LOW | `example_integration.go` is dead code that calls `machineid.ID()` directly (unhashed), bypassing canonical path | logging/example_integration.go:71 |
|
|
| F-D1-4 | LOW | Windows `getWindowsMachineID()` redundantly retries `machineid.ID()` after primary already failed | system/machine_id.go:82 |
|
|
| F-D1-5 | LOW | `client.go:39` uses `fmt.Printf` instead of `log.Printf` for machine ID error (ETHOS #1) | client/client.go:39 |
|
|
| F-D1-6 | INFO | Windows reinstall changes MachineGuid, causing agent lockout — not documented | system/machine_id.go |
|