fix(identity): D-1 machine ID deduplication fixes
- Remove unhashed 'unknown-' fallback from registration (F-D1-1)
  Registration aborts if GetMachineID() fails (no bad data)
- Add POST /admin/agents/:id/rebind-machine-id endpoint (F-D1-2)
  Admin can update stored machine ID after hardware change
- Delete dead example_integration.go with wrong usage (F-D1-3)
- Remove redundant Windows machineid.ID() retry (F-D1-4)
- Replace fmt.Printf with log.Printf in client.go (F-D1-5)

Operator note: agents registered with 'unknown-' machine IDs must be rebound before upgrading. See D1_Fix_Implementation.md.

All tests pass. No regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/D1_Fix_Implementation.md (new file, 47 lines)

# D-1 Machine ID Fix Implementation

**Date:** 2026-03-29
**Branch:** culurien

---

## Files Changed

| File | Change |
|------|--------|
| `cmd/agent/main.go` | Removed the unhashed "unknown-" fallback; registration aborts if `GetMachineID()` fails (F-D1-1) |
| `internal/client/client.go` | Replaced `fmt.Printf` with `log.Printf` for machine ID errors (F-D1-5) |
| `internal/system/machine_id.go` | Removed the redundant `machineid.ID()` retry in the Windows fallback; added documentation on Windows reinstalls (F-D1-4) |
| `internal/logging/example_integration.go` | Deleted: dead code with incorrect `machineid.ID()` usage (F-D1-3) |
| `server/internal/api/handlers/agents.go` | Added `RebindMachineID` admin endpoint (F-D1-2) |
| `server/internal/database/queries/agents.go` | Added `UpdateMachineID` query function (F-D1-2) |
| `server/cmd/server/main.go` | Registered the `rebind-machine-id` admin route (F-D1-2) |

## Strategy (Task 1): Option C
Used Option C: trust the canonical `system.GetMachineID()` entirely. It can fail only if ALL of its fallbacks fail, including the hostname-os-arch fallback; in that case registration aborts with `log.Fatalf` instead of storing a non-canonical ID. This is the safest option because the fallback chain inside `GetMachineID()` always produces a SHA256 hash, so every stored machine ID has the same format.
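A minimal Go sketch of the Option C pattern, assuming a simplified fallback chain: each source returns a raw identifier, the first usable one is hashed with SHA256 into 64 lowercase hex characters, and registration aborts if every source fails. The names `idSource`, `hashID`, `getMachineID` and `hostnameOSArch`, the source order, and the exact hostname-os-arch string format are illustrative assumptions, not the project's `system.GetMachineID()`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"log"
	"os"
	"runtime"
)

// idSource is a hypothetical raw identifier provider (for example an OS
// machine ID). The real fallback list in system.GetMachineID() may differ.
type idSource func() (string, error)

// hashID turns any raw identifier into a 64-char lowercase hex SHA256 digest,
// so the stored format is the same whichever source produced the value.
func hashID(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// hostnameOSArch is an assumed last-resort source: "<hostname>-<os>-<arch>".
func hostnameOSArch() (string, error) {
	host, err := os.Hostname()
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%s-%s", host, runtime.GOOS, runtime.GOARCH), nil
}

// getMachineID tries each source in order and hashes the first non-empty
// value. It returns an error only when every source fails.
func getMachineID(sources ...idSource) (string, error) {
	for _, src := range sources {
		raw, err := src()
		if err == nil && raw != "" {
			return hashID(raw), nil
		}
	}
	return "", errors.New("all machine ID sources failed")
}

func main() {
	unavailable := func() (string, error) { return "", errors.New("not available") }
	id, err := getMachineID(unavailable, hostnameOSArch)
	if err != nil {
		// Option C: no unhashed fallback, abort registration instead.
		log.Fatalf("registration aborted: %v", err)
	}
	fmt.Println("machine ID:", id)
}
```

Because hashing happens after a source is chosen, the stored format never depends on which fallback succeeded, which is exactly the property the unhashed "unknown-" fallback broke.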
## Operator Migration Guide

Agents registered through the old "unknown-hostname" fallback are locked out after this upgrade: their stored `machine_id` is not a 64-character hex string, but the new runtime client sends a proper SHA256 hash, so the two no longer match. To recover, first find the affected agents:
```sql
-- Legacy IDs: wrong length for a SHA256 hex digest, or the old "unknown-" prefix
SELECT id, hostname, machine_id FROM agents
WHERE LENGTH(machine_id) != 64 OR machine_id LIKE 'unknown-%';
```
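The same filter can be applied in Go, for example in a pre-upgrade check tool. This sketch only mirrors the SQL `WHERE` clause; `needsRebind` and the sample values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// needsRebind mirrors the SQL filter: a stored machine_id needs a rebind if
// it is not 64 characters long or still carries the old "unknown-" prefix.
// len() counts bytes, which equals the character count for ASCII IDs.
func needsRebind(storedID string) bool {
	return len(storedID) != 64 || strings.HasPrefix(storedID, "unknown-")
}

func main() {
	// Hypothetical stored values, for illustration only.
	stored := []string{
		"unknown-web01",
		"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
	}
	for _, id := range stored {
		fmt.Printf("%s needs rebind: %v\n", id, needsRebind(id))
	}
}
```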
For each agent found, call the rebind endpoint, where `{id}` is the agent's `id` from the query above:
```
POST /api/v1/admin/agents/{id}/rebind-machine-id
{"new_machine_id": "<64-char hex string from agent>"}
```
Alternatively, re-register the agent with a new registration token.
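If you script the rebind call instead of sending it by hand, a minimal Go sketch that only builds the request might look like this. The server base URL and agent ID are hypothetical placeholders, and attaching the admin credentials that `WebAuthMiddleware` expects is left out because this document does not describe them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// rebindRequest is the JSON body from the endpoint specification.
type rebindRequest struct {
	NewMachineID string `json:"new_machine_id"`
}

// newRebindRequest builds POST {base}/api/v1/admin/agents/{id}/rebind-machine-id.
// It does not add authentication; the caller must attach admin credentials.
func newRebindRequest(baseURL, agentID, newMachineID string) (*http.Request, error) {
	body, err := json.Marshal(rebindRequest{NewMachineID: newMachineID})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimRight(baseURL, "/") +
		"/api/v1/admin/agents/" + url.PathEscape(agentID) + "/rebind-machine-id"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical server URL and agent ID; the real machine ID comes from the agent.
	req, err := newRebindRequest("https://server.example", "AGENT_ID", strings.Repeat("0", 64))
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// After attaching credentials, send it with http.DefaultClient.Do(req).
}
```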
## Rebind Endpoint Specification
- **Route:** `POST /api/v1/admin/agents/:id/rebind-machine-id`
- **Auth:** `WebAuthMiddleware` + `RequireAdmin` (admin route group)
- **Input:** `{"new_machine_id": "<64-char lowercase hex string>"}`
- **Validation:** exactly 64 characters, lowercase hex only (`[0-9a-f]`)
- **Audit log:** the old and new machine IDs are logged together with the admin's user ID
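The validation and audit rules above can be sketched as plain Go, assuming the route parameter and the admin user ID have already been extracted by the router and auth middleware. This is not the project's `RebindMachineID` handler: `machineIDStore`, `ReplaceMachineID`, `memStore` and `rebind` are hypothetical stand-ins, and the 404/500 status mapping is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
	"net/http"
	"regexp"
)

// Exactly 64 lowercase hex characters; Go's $ matches only at end of text.
var machineIDPattern = regexp.MustCompile(`^[0-9a-f]{64}$`)

func validMachineID(s string) bool { return machineIDPattern.MatchString(s) }

var errAgentNotFound = errors.New("agent not found")

// machineIDStore is a hypothetical stand-in for the real database query.
type machineIDStore interface {
	// ReplaceMachineID stores newID and returns the previous value.
	ReplaceMachineID(agentID, newID string) (oldID string, err error)
}

// memStore is an in-memory store used only for this sketch.
type memStore map[string]string

func (m memStore) ReplaceMachineID(agentID, newID string) (string, error) {
	old, ok := m[agentID]
	if !ok {
		return "", errAgentNotFound
	}
	m[agentID] = newID
	return old, nil
}

// rebind decodes and validates the body, updates the store and writes an
// audit line. It returns the HTTP status code a handler would send.
func rebind(store machineIDStore, agentID, adminUserID string, body []byte) int {
	var in struct {
		NewMachineID string `json:"new_machine_id"`
	}
	if err := json.Unmarshal(body, &in); err != nil || !validMachineID(in.NewMachineID) {
		return http.StatusBadRequest
	}
	oldID, err := store.ReplaceMachineID(agentID, in.NewMachineID)
	if errors.Is(err, errAgentNotFound) {
		return http.StatusNotFound
	}
	if err != nil {
		return http.StatusInternalServerError
	}
	// Audit: old and new machine IDs together with the admin user ID.
	log.Printf("audit: admin=%s agent=%s machine_id %s -> %s", adminUserID, agentID, oldID, in.NewMachineID)
	return http.StatusOK
}

func main() {
	store := memStore{"agent-1": "unknown-web01"}
	newID := "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	status := rebind(store, "agent-1", "admin-7", []byte(`{"new_machine_id":"`+newID+`"}`))
	fmt.Println(status, store["agent-1"] == newID)
}
```

An anchored regular expression keeps the check strict: uppercase hex, 63 or 65 characters, and `unknown-` values are all rejected with 400 before the store is touched.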