diff --git a/docs/D1_MachineID_Audit.md b/docs/D1_MachineID_Audit.md new file mode 100644 index 0000000..9d89acb --- /dev/null +++ b/docs/D1_MachineID_Audit.md @@ -0,0 +1,170 @@ +# D-1 Machine ID Duplication Audit + +**Date:** 2026-03-29 +**Branch:** culurien +**Scope:** Machine ID implementation consistency across agent codebase + +--- + +## 1. ALL MACHINE ID IMPLEMENTATIONS + +### 1a. Canonical: `system/machine_id.go` + +**GetMachineID()** (line 15): +1. `machineid.ID()` — cross-platform library (Windows: Registry MachineGuid, Linux: /etc/machine-id) +2. If success: `hashMachineID(id)` → SHA256 → 64 hex chars +3. If fail: OS-specific fallback + +**Linux fallback chain** (line 43): +1. `/etc/machine-id` → hash +2. `/var/lib/dbus/machine-id` → hash +3. `/sys/class/dmi/id/product_uuid` → hash +4. `/etc/hostname` + "-linux-fallback" → hash + +**Windows fallback** (line 80): +1. `machineid.ID()` again (same as primary — redundant) +2. `generateGenericMachineID()` → hostname-os-arch → hash + +**Generic fallback** (line 102): +`hostname-goos-goarch` → `hashMachineID()` → 64 hex chars + +**Output format:** ALWAYS 64 hex characters (SHA256). Every path goes through `hashMachineID()`. + +### 1b. Client: `client/client.go` + +Line 36: `machineID, err := system.GetMachineID()` — calls canonical function. Cached in `Client.machineID` string field. Sent as `X-Machine-ID` header on every authenticated request (line 108). + +If `GetMachineID()` fails: `machineID = ""` (empty string). Server will reject with 403 "missing machine ID header". + +**Consistent with canonical:** YES. + +### 1c. Registration: `cmd/agent/main.go` + +Line 425: `machineID, err := system.GetMachineID()` — calls canonical function. +Line 428: **ERROR FALLBACK: `machineID = "unknown-" + sysInfo.Hostname`** + +This fallback is **NOT HASHED** and **NOT 64 HEX CHARS**. It produces a string like `"unknown-my-server"` (14-30 chars, alphanumeric with dashes). + +Line 443: `MachineID: machineID` — sent in `RegisterRequest.MachineID` and stored in `agents.machine_id` column. + +### 1d. Example: `logging/example_integration.go` + +Line 71: `machineid.ID()` — calls library DIRECTLY, bypasses `GetMachineID()` and `hashMachineID()`. Returns raw, unhashed machine ID. + +**Not imported anywhere.** File is `package logging` but none of its functions are called from production code. It's dead example code. + +--- + +## 2. ALL CALL SITES + +| Location | Method | Hashed? | Consistent? | +|----------|--------|---------|-------------| +| `system/machine_id.go:17` | `machineid.ID()` → `hashMachineID()` | YES | Canonical | +| `system/machine_id.go:82` | `machineid.ID()` → `hashMachineID()` | YES | Canonical | +| `system/machine_id.go:93` | `machineid.ID()` → `hashMachineID()` | YES | Canonical | +| `system/machine_id.go:102` | hostname+os+arch → `hashMachineID()` | YES | Canonical | +| `client/client.go:36` | `system.GetMachineID()` | YES | Consistent | +| `cmd/agent/main.go:425` | `system.GetMachineID()` | YES | Consistent | +| `cmd/agent/main.go:428` | `"unknown-" + hostname` | **NO** | **DIVERGENT** | +| `logging/example_integration.go:71` | `machineid.ID()` direct | **NO** | **DEAD CODE** | + +--- + +## 3. DIVERGENCE ANALYSIS + +### 3a. Can the three production call sites return different values? + +**Normal case (GetMachineID succeeds):** YES, all three return the same SHA256-hashed value. `client.go` and `main.go` both call `system.GetMachineID()`. + +**Failure case (main.go line 428):** +- **Registration (main.go):** `"unknown-my-server"` — 14-30 chars, alphanumeric, NOT hashed +- **Runtime (client.go):** If `GetMachineID()` fails at client construction, `machineID = ""` — empty string → server rejects with 403 + +**F-D1-1 HIGH: Registration and runtime machine IDs can diverge.** +If `GetMachineID()` fails during registration, the agent registers with `"unknown-hostname"` (unhashed). On subsequent restarts, if `GetMachineID()` succeeds, the client sends a SHA256 hash. The server compares: `"unknown-hostname" != "a7f3...64hexchars..."` → **403 FORBIDDEN**. The agent is permanently locked out until re-registered. + +### 3b. Server-Side Validation + +`machine_binding.go:149`: `*agent.MachineID != reportedMachineID` — simple string equality comparison. No format validation. Would accept BOTH `"unknown-hostname"` (if that's what was registered) AND `"a7f3...64hex..."`. But they must match exactly. + +### 3c. Registration vs Runtime Mismatch + +If the agent registered with `"unknown-hostname"` (fallback) but restarts and `GetMachineID()` now succeeds (transient error resolved), the client sends a SHA256 hash that doesn't match the stored `"unknown-hostname"` → **permanent lockout**. + +**F-D1-2 MEDIUM: No recovery path from machine ID mismatch.** The only fix is manual: delete the agent from the DB and re-register. There's no "update machine ID" API. + +--- + +## 4. EXAMPLE_INTEGRATION.GO + +- **Imported:** NO (zero results from grep) +- **Package:** `logging` — it IS part of the package and its functions are exported +- **Called:** NO — none of its functions are called anywhere +- **Risk:** `ExampleMachineIDMonitoring()` calls `machineid.ID()` directly (unhashed) +- **Candidate for deletion:** YES — dead example code that bypasses the canonical path + +**F-D1-3 LOW:** `example_integration.go` is dead code. Its `ExampleMachineIDMonitoring()` function calls `machineid.ID()` directly, bypassing the canonical `GetMachineID()`. If anyone copies this example, they'll get unhashed machine IDs. + +--- + +## 5. WINDOWS-SPECIFIC PATH + +### 5a. Fallback Chains + +**Linux:** machineid.ID() → /etc/machine-id → /var/lib/dbus/machine-id → /sys/class/dmi/id/product_uuid → hostname-linux-fallback → hostname-os-arch + +**Windows:** machineid.ID() → machineid.ID() (redundant retry) → hostname-os-arch + +**F-D1-4 LOW:** Windows `getWindowsMachineID()` calls `machineid.ID()` again after the primary already tried it and failed. This is a no-op retry — the same function will fail again. + +### 5b. Dual-Boot Collision + +Windows MachineGuid (registry) and Linux /etc/machine-id are independent. A dual-boot system produces different machine IDs for each OS. No collision risk. + +### 5c. Windows Reinstall + +Windows MachineGuid changes on reinstall. This is NOT documented in the codebase. After reinstalling Windows, the agent will produce a different machine ID → 403 from MachineBindingMiddleware → must re-register. + +--- + +## 6. REGISTRATION vs RUNTIME CONSISTENCY + +### 6a. Registration Path (main.go:425-443) + +`system.GetMachineID()` → if error: `"unknown-" + hostname` (UNHASHED) → stored in DB + +### 6b. Runtime Path (client/client.go:36-41) + +`system.GetMachineID()` → if error: `""` (empty) → sent as X-Machine-ID header → server rejects with 403 + +### 6c. Are They Guaranteed Identical? + +**NO.** Three scenarios cause divergence: + +1. **Registration succeeds, runtime fails:** Registration stores SHA256 hash. Runtime sends empty string → 403. +2. **Registration fails, runtime succeeds:** Registration stores `"unknown-hostname"`. Runtime sends SHA256 hash → 403. +3. **Both fail but produce different fallbacks:** Registration uses `"unknown-hostname"` (unhashed, from main.go:428). Runtime uses empty string (from client.go:40). These don't match → 403. + +**F-D1-1 is the root cause.** The main.go fallback at line 428 produces a fundamentally different format than the canonical function. + +--- + +## 7. ETHOS CROSS-CHECK + +| Principle | Status | Finding | +|-----------|--------|---------| +| ETHOS #1 | PARTIAL | GetMachineID failure logged in main.go (line 427) and client.go (line 39). But client.go uses `fmt.Printf` instead of `log.Printf`. | +| ETHOS #4 | VIOLATION | Machine ID is NOT idempotent when the error fallback activates. Registration path and runtime path produce different values for the same failure condition. | + +--- + +## FINDINGS SUMMARY + +| ID | Severity | Finding | Location | +|----|----------|---------|----------| +| F-D1-1 | HIGH | Registration fallback `"unknown-"+hostname` is unhashed and mismatches runtime path, causing permanent agent lockout on recovery | cmd/agent/main.go:428 | +| F-D1-2 | MEDIUM | No recovery path from machine ID mismatch — must delete and re-register agent | machine_binding.go:149 | +| F-D1-3 | LOW | `example_integration.go` is dead code that calls `machineid.ID()` directly (unhashed), bypassing canonical path | logging/example_integration.go:71 | +| F-D1-4 | LOW | Windows `getWindowsMachineID()` redundantly retries `machineid.ID()` after primary already failed | system/machine_id.go:82 | +| F-D1-5 | LOW | `client.go:39` uses `fmt.Printf` instead of `log.Printf` for machine ID error (ETHOS #1) | client/client.go:39 | +| F-D1-6 | INFO | Windows reinstall changes MachineGuid, causing agent lockout — not documented | system/machine_id.go |