Files
Redflag/docs/D1_MachineID_Audit.md
jpetree331 8530e6c6fc docs: D-1 machine ID duplication audit
Comprehensive audit of machine ID implementations across the
agent codebase. Identified 3 production call sites with 1 critical
divergence.

Key findings:
- F-D1-1 HIGH: Registration fallback "unknown-"+hostname is unhashed,
  mismatches runtime SHA256 hash, causes permanent agent lockout
  when GetMachineID() transiently fails then recovers
- F-D1-2 MEDIUM: No recovery path from machine ID mismatch
- F-D1-3 LOW: example_integration.go is dead code calling
  machineid.ID() directly (bypasses canonical hashing)
- F-D1-4 LOW: Windows redundant machineid.ID() retry
- F-D1-5 LOW: client.go uses fmt.Printf for machine ID error

6 findings total. See docs/D1_MachineID_Audit.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 09:34:04 -04:00

8.4 KiB

D-1 Machine ID Duplication Audit

Date: 2026-03-29 Branch: culurien Scope: Machine ID implementation consistency across agent codebase


1. ALL MACHINE ID IMPLEMENTATIONS

1a. Canonical: system/machine_id.go

GetMachineID() (line 15):

  1. machineid.ID() — cross-platform library (Windows: Registry MachineGuid, Linux: /etc/machine-id)
  2. If success: hashMachineID(id) → SHA256 → 64 hex chars
  3. If fail: OS-specific fallback

Linux fallback chain (line 43):

  1. /etc/machine-id → hash
  2. /var/lib/dbus/machine-id → hash
  3. /sys/class/dmi/id/product_uuid → hash
  4. /etc/hostname + "-linux-fallback" → hash

Windows fallback (line 80):

  1. machineid.ID() again (same as primary — redundant)
  2. generateGenericMachineID() → hostname-os-arch → hash

Generic fallback (line 102): hostname-goos-goarchhashMachineID() → 64 hex chars

Output format: ALWAYS 64 hex characters (SHA256). Every path goes through hashMachineID().

1b. Client: client/client.go

Line 36: machineID, err := system.GetMachineID() — calls canonical function. Cached in Client.machineID string field. Sent as X-Machine-ID header on every authenticated request (line 108).

If GetMachineID() fails: machineID = "" (empty string). Server will reject with 403 "missing machine ID header".

Consistent with canonical: YES.

1c. Registration: cmd/agent/main.go

Line 425: machineID, err := system.GetMachineID() — calls canonical function. Line 428: ERROR FALLBACK: machineID = "unknown-" + sysInfo.Hostname

This fallback is NOT HASHED and NOT 64 HEX CHARS. It produces a string like "unknown-my-server" (14-30 chars, alphanumeric with dashes).

Line 443: MachineID: machineID — sent in RegisterRequest.MachineID and stored in agents.machine_id column.

1d. Example: logging/example_integration.go

Line 71: machineid.ID() — calls library DIRECTLY, bypasses GetMachineID() and hashMachineID(). Returns raw, unhashed machine ID.

Not imported anywhere. File is package logging but none of its functions are called from production code. It's dead example code.


2. ALL CALL SITES

Location Method Hashed? Consistent?
system/machine_id.go:17 machineid.ID()hashMachineID() YES Canonical
system/machine_id.go:82 machineid.ID()hashMachineID() YES Canonical
system/machine_id.go:93 machineid.ID()hashMachineID() YES Canonical
system/machine_id.go:102 hostname+os+arch → hashMachineID() YES Canonical
client/client.go:36 system.GetMachineID() YES Consistent
cmd/agent/main.go:425 system.GetMachineID() YES Consistent
cmd/agent/main.go:428 "unknown-" + hostname NO DIVERGENT
logging/example_integration.go:71 machineid.ID() direct NO DEAD CODE

3. DIVERGENCE ANALYSIS

3a. Can the three production call sites return different values?

Normal case (GetMachineID succeeds): YES, all three return the same SHA256-hashed value. client.go and main.go both call system.GetMachineID().

Failure case (main.go line 428):

  • Registration (main.go): "unknown-my-server" — 14-30 chars, alphanumeric, NOT hashed
  • Runtime (client.go): If GetMachineID() fails at client construction, machineID = "" — empty string → server rejects with 403

F-D1-1 HIGH: Registration and runtime machine IDs can diverge. If GetMachineID() fails during registration, the agent registers with "unknown-hostname" (unhashed). On subsequent restarts, if GetMachineID() succeeds, the client sends a SHA256 hash. The server compares: "unknown-hostname" != "a7f3...64hexchars..."403 FORBIDDEN. The agent is permanently locked out until re-registered.

3b. Server-Side Validation

machine_binding.go:149: *agent.MachineID != reportedMachineID — simple string equality comparison. No format validation. Would accept BOTH "unknown-hostname" (if that's what was registered) AND "a7f3...64hex...". But they must match exactly.

3c. Registration vs Runtime Mismatch

If the agent registered with "unknown-hostname" (fallback) but restarts and GetMachineID() now succeeds (transient error resolved), the client sends a SHA256 hash that doesn't match the stored "unknown-hostname"permanent lockout.

F-D1-2 MEDIUM: No recovery path from machine ID mismatch. The only fix is manual: delete the agent from the DB and re-register. There's no "update machine ID" API.


4. EXAMPLE_INTEGRATION.GO

  • Imported: NO (zero results from grep)
  • Package: logging — it IS part of the package and its functions are exported
  • Called: NO — none of its functions are called anywhere
  • Risk: ExampleMachineIDMonitoring() calls machineid.ID() directly (unhashed)
  • Candidate for deletion: YES — dead example code that bypasses the canonical path

F-D1-3 LOW: example_integration.go is dead code. Its ExampleMachineIDMonitoring() function calls machineid.ID() directly, bypassing the canonical GetMachineID(). If anyone copies this example, they'll get unhashed machine IDs.


5. WINDOWS-SPECIFIC PATH

5a. Fallback Chains

Linux: machineid.ID() → /etc/machine-id → /var/lib/dbus/machine-id → /sys/class/dmi/id/product_uuid → hostname-linux-fallback → hostname-os-arch

Windows: machineid.ID() → machineid.ID() (redundant retry) → hostname-os-arch

F-D1-4 LOW: Windows getWindowsMachineID() calls machineid.ID() again after the primary already tried it and failed. This is a no-op retry — the same function will fail again.

5b. Dual-Boot Collision

Windows MachineGuid (registry) and Linux /etc/machine-id are independent. A dual-boot system produces different machine IDs for each OS. No collision risk.

5c. Windows Reinstall

Windows MachineGuid changes on reinstall. This is NOT documented in the codebase. After reinstalling Windows, the agent will produce a different machine ID → 403 from MachineBindingMiddleware → must re-register.


6. REGISTRATION vs RUNTIME CONSISTENCY

6a. Registration Path (main.go:425-443)

system.GetMachineID() → if error: "unknown-" + hostname (UNHASHED) → stored in DB

6b. Runtime Path (client/client.go:36-41)

system.GetMachineID() → if error: "" (empty) → sent as X-Machine-ID header → server rejects with 403

6c. Are They Guaranteed Identical?

NO. Three scenarios cause divergence:

  1. Registration succeeds, runtime fails: Registration stores SHA256 hash. Runtime sends empty string → 403.
  2. Registration fails, runtime succeeds: Registration stores "unknown-hostname". Runtime sends SHA256 hash → 403.
  3. Both fail but produce different fallbacks: Registration uses "unknown-hostname" (unhashed, from main.go:428). Runtime uses empty string (from client.go:40). These don't match → 403.

F-D1-1 is the root cause. The main.go fallback at line 428 produces a fundamentally different format than the canonical function.


7. ETHOS CROSS-CHECK

Principle Status Finding
ETHOS #1 PARTIAL GetMachineID failure logged in main.go (line 427) and client.go (line 39). But client.go uses fmt.Printf instead of log.Printf.
ETHOS #4 VIOLATION Machine ID is NOT idempotent when the error fallback activates. Registration path and runtime path produce different values for the same failure condition.

FINDINGS SUMMARY

ID Severity Finding Location
F-D1-1 HIGH Registration fallback "unknown-"+hostname is unhashed and mismatches runtime path, causing permanent agent lockout on recovery cmd/agent/main.go:428
F-D1-2 MEDIUM No recovery path from machine ID mismatch — must delete and re-register agent machine_binding.go:149
F-D1-3 LOW example_integration.go is dead code that calls machineid.ID() directly (unhashed), bypassing canonical path logging/example_integration.go:71
F-D1-4 LOW Windows getWindowsMachineID() redundantly retries machineid.ID() after primary already failed system/machine_id.go:82
F-D1-5 LOW client.go:39 uses fmt.Printf instead of log.Printf for machine ID error (ETHOS #1) client/client.go:39
F-D1-6 INFO Windows reinstall changes MachineGuid, causing agent lockout — not documented system/machine_id.go