# A-2 Command Replay Attack Audit

**Date**: 2026-03-28
**Branch**: unstabledeveloper
**Scope**: Audit-only, no implementation changes

---

## 1. Signed Command Payload Analysis

### What fields are included in the signed message

**New format** (when `cmd.SignedAt != nil`):

```
{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}:{cmd.SignedAt.Unix()}
```

Source: `aggregator-server/internal/services/signing.go:361`, `aggregator-agent/internal/crypto/verification.go:71`

**Old format** (backward compat, when `cmd.SignedAt == nil`):

```
{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}
```

Source: `aggregator-agent/internal/crypto/verification.go:55`

### What is NOT in the signed payload

| Field | In signed payload? | Notes |
|---|---|---|
| `cmd.ID` (UUID) | YES | Unique per-command identifier |
| `cmd.CommandType` | YES | e.g. `install_updates`, `reboot` |
| `sha256(params)` | YES | Hash of full params JSON |
| `signed_at` timestamp | YES (new format only) | Unix seconds |
| `cmd.AgentID` | **NO** | Absent from signature |
| `cmd.Source` | **NO** | Absent from signature |
| `cmd.Status` | **NO** | Absent from signature |
| Nonce | **NO** | Not used in command signing |

**FINDING F-1 (HIGH)**: `agent_id` is not included in the signed payload. A valid signed command is not cryptographically bound to a specific agent. The only uniqueness guarantee is the command UUID. If an attacker could inject a captured command into a different agent's command queue, the signature would verify correctly.

---

## 2. Nonce Mechanism

### What the nonce looks like

The `SigningService` in `aggregator-server/internal/services/signing.go` has two nonce methods:

```go
func (s *SigningService) SignNonce(nonceUUID uuid.UUID, timestamp time.Time) (string, error)
func (s *SigningService) VerifyNonce(nonceUUID uuid.UUID, timestamp time.Time, signatureHex string, maxAge time.Duration) (bool, error)
```

Nonce format: `"{uuid}:{unix_timestamp}"`, signed with Ed25519.
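To make the nonce format concrete, here is a minimal standalone sketch of signing and verifying a `"{uuid}:{unix_timestamp}"` message with Ed25519. It is not the `SigningService` code: the helper names (`nonceMessage`, `signNonce`, `verifyNonce`, `runNonceDemo`), the plain-string UUID, the 5-minute `maxAge`, the rejection of future-dated nonces, and the explicit `now` parameter are all assumptions made for illustration.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"fmt"
	"time"
)

// nonceMessage builds the "{uuid}:{unix_timestamp}" string that gets signed.
func nonceMessage(nonceID string, ts time.Time) string {
	return fmt.Sprintf("%s:%d", nonceID, ts.Unix())
}

// signNonce returns a hex-encoded Ed25519 signature over the nonce message.
func signNonce(priv ed25519.PrivateKey, nonceID string, ts time.Time) string {
	return hex.EncodeToString(ed25519.Sign(priv, []byte(nonceMessage(nonceID, ts))))
}

// verifyNonce rejects nonces older than maxAge (and, as an assumption of this
// sketch, nonces dated in the future), then checks the signature.
// `now` is passed in so the age check can be exercised deterministically.
func verifyNonce(pub ed25519.PublicKey, nonceID string, ts time.Time,
	sigHex string, maxAge time.Duration, now time.Time) bool {
	if age := now.Sub(ts); age < 0 || age > maxAge {
		return false
	}
	sig, err := hex.DecodeString(sigHex)
	if err != nil || len(sig) != ed25519.SignatureSize {
		return false
	}
	return ed25519.Verify(pub, []byte(nonceMessage(nonceID, ts)), sig)
}

// runNonceDemo exercises a fresh, an expired, and a tampered nonce.
func runNonceDemo() (fresh, stale, tampered bool) {
	// Fixed all-zero seed: for demonstration only, never for real keys.
	priv := ed25519.NewKeyFromSeed(make([]byte, ed25519.SeedSize))
	pub := priv.Public().(ed25519.PublicKey)

	id := "123e4567-e89b-12d3-a456-426614174000" // placeholder UUID string
	issued := time.Unix(1700000000, 0)
	sig := signNonce(priv, id, issued)
	maxAge := 5 * time.Minute // illustrative value, not taken from the codebase

	fresh = verifyNonce(pub, id, issued, sig, maxAge, issued.Add(time.Minute))
	stale = verifyNonce(pub, id, issued, sig, maxAge, issued.Add(time.Hour))
	tampered = verifyNonce(pub, "another-id", issued, sig, maxAge, issued.Add(time.Minute))
	return fresh, stale, tampered
}

func main() {
	fresh, stale, tampered := runNonceDemo()
	fmt.Println(fresh, stale, tampered) // prints: true false false
}
```

Note that a signed timestamp alone only bounds the replay window. A single-use guarantee would additionally require the verifier to remember which nonce UUIDs it has already accepted.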
### Where nonces are used

**Nonces are NOT used in command signing or command verification.** The `SignNonce`/`VerifyNonce` methods exist exclusively for the agent update package flow (preventing replay of update download requests). They are completely disconnected from the command replay protection path.

The agent's `ProcessCommand` function (`command_handler.go:101`) calls `VerifyCommandWithTimestamp` or `VerifyCommand`. Neither of these checks any nonce. There is no nonce storage, no nonce tracking map, and no nonce field in `AgentCommand` or `CommandItem`.

**FINDING F-2 (CRITICAL)**: There is no nonce in the command signing path. The original issue comment ("nonce-only replay protection") is inaccurate in the opposite direction: there is no nonce AND no reliable replay protection for commands signed with the old format.

---

## 3. Verification Function Behaviour

### `VerifyCommand` (old format, no timestamp)

Source: `aggregator-agent/internal/crypto/verification.go:25`

Checks:

1. Signature field is non-empty
2. Signature is valid hex, correct length (64 bytes)
3. Ed25519 signature over `{id}:{type}:{sha256(params)}` verifies against public key

Returns: `error` (nil = pass). **No time check. No nonce check.**

**FINDING F-3 (CRITICAL)**: Commands signed with the old format (no `signed_at`) are valid indefinitely. A captured signature can be replayed at any time in the future; there is no expiry mechanism for old-format commands.

### `VerifyCommandWithTimestamp` (new format)

Source: `aggregator-agent/internal/crypto/verification.go:85`

Checks:

1. If `cmd.SignedAt == nil` → falls back to `VerifyCommand()` (see F-3)
2. `age = now.Sub(*cmd.SignedAt)` must satisfy: `age <= 24h` AND `age >= -5min`
3. Signature valid over `{id}:{type}:{sha256(params)}:{unix_timestamp}`

**FINDING F-4 (HIGH)**: 24-hour replay window. A captured signed command remains valid for replay for up to 24 hours from signing time.
This is the default value of `commandMaxAge = 24 * time.Hour` defined in `command_handler.go:21`.

---

## 4. Command Creation Flow

### Full path: Dashboard approves install → command signed → stored

```
POST /updates/:id/install
  → UpdateHandler.InstallUpdate()                  [handlers/updates.go:459]
    → models.AgentCommand{...}                     [no signing yet]
    → h.agentHandler.signAndCreateCommand(cmd)     [agents.go:49]
      → signingService.SignCommand(cmd)            [services/signing.go:345]
        → cmd.SignedAt = &now                      [side-effect]
        → cmd.KeyID = GetCurrentKeyID()            [side-effect]
        → message = "{id}:{type}:{hash}:{ts}"
        → ed25519.Sign(privateKey, message)
        → returns hex signature
      → cmd.Signature = signature
      → commandQueries.CreateCommand(cmd)          [queries/commands.go:22]
        → INSERT INTO agent_commands (... key_id, signed_at ...)
```

The `ConfirmDependencies` and `ReportDependencies` (auto-install) handlers follow identical paths through `signAndCreateCommand`.

### RetryCommand path (DOES NOT RE-SIGN)

```
POST /commands/:id/retry
  → UpdateHandler.RetryCommand()                   [handlers/updates.go:779]
    → commandQueries.RetryCommand(id)              [queries/commands.go:189]
      → newCommand = AgentCommand{                 [copies Params, new UUID]
          Signature: "",                           [EMPTY: not re-signed]
          SignedAt:  nil,                          [nil: no timestamp]
          KeyID:     "",                           [empty: no key reference]
        }
      → q.CreateCommand(newCommand)                [stored unsigned]
```

**FINDING F-5 (CRITICAL)**: `RetryCommand` creates a new command without calling `signAndCreateCommand`. The retried command has `Signature = ""`, `SignedAt = nil`, `KeyID = ""`. In strict enforcement mode, the agent rejects any command with an empty signature. This means **the retry feature is entirely broken when command signing is enabled in strict mode**. The HTTP handler in `updates.go:779` returns 200 OK and the command is stored in the DB, but the agent will reject it every time it polls.

---

## 5. Agent Command Fetch and Execution Flow

### Full path: Agent polls → receives commands → verifies → executes

```
GET /api/v1/agents/{id}/commands
  → AgentHandler.GetCommands()                       [handlers/agents.go:204]
    → commandQueries.GetPendingCommands(agentID)     [status = 'pending' only]
    → commandQueries.GetStuckCommands(agentID, 5m)   [sent > 5 min, not completed]
    → allCommands = pending + stuck
    → for each cmd: MarkCommandSent(cmd.ID)          [transitions pending → sent]
    → returns CommandItem{ID, Type, Params, Signature, KeyID, SignedAt}
```

Agent-side:

```
main.go:875: apiClient.GetCommands(cfg.AgentID, metrics)
main.go:928: for _, cmd := range commands {
main.go:932:     commandHandler.ProcessCommand(cmd, cfg, cfg.AgentID)
main.go:954:     switch cmd.Type { ... execute ... }
```

### What `GetPendingCommands` returns

```sql
SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at ASC
LIMIT 100
```

There is no `WHERE created_at > NOW() - INTERVAL '24 hours'` filter. A command created 30 days ago with status `pending` (e.g., if it was never successfully sent) would be returned. If it has the old-format signature (no `signed_at`), the agent would execute it with no time check.

**FINDING F-6 (HIGH)**: The server-side command queue has no TTL filter. Old pending commands are delivered indefinitely. Combined with old-format signing (F-3), this means commands can persist in the queue and be executed arbitrarily long after creation.

---

## 6. Database Schema: TTL and Command Expiry

### agent_commands table (from migration 001 + amendments)

```sql
CREATE TABLE agent_commands (
    id              UUID PRIMARY KEY,
    agent_id        UUID REFERENCES agents(id),
    command_type    VARCHAR(50),
    params          JSONB,
    status          VARCHAR(20) DEFAULT 'pending',
    created_at      TIMESTAMP DEFAULT NOW(),
    sent_at         TIMESTAMP,
    completed_at    TIMESTAMP,
    result          JSONB,
    signature       VARCHAR(128),        -- migration 020
    key_id          VARCHAR(64),         -- migration 025
    signed_at       TIMESTAMP,           -- migration 025
    idempotency_key VARCHAR(64) UNIQUE   -- migration 023a
);
```

**FINDING F-7 (HIGH)**: No `expires_at` column exists. No TTL constraint exists. No scheduled cleanup job for old pending commands exists in the codebase. The only cleanup mechanisms are:

- Manual `ClearOldFailedCommands(days)`: applies to `failed`/`timed_out` only, not `pending`
- Manual `CancelCommand(id)`: single-command manual cancellation
- The deduplication index from migration 023a prevents duplicate pending commands per `(agent_id, command_type)`, but this only prevents new duplicates; it doesn't expire old ones

---

## 7. Attack Surface Assessment

### Can a captured signed command be replayed indefinitely?

**New format (with `signed_at`)**: Replayable for 24 hours from signing time. After that, `VerifyCommandWithTimestamp` rejects it as too old.

**Old format (no `signed_at`)**: **YES, replayable indefinitely.** `VerifyCommand` has no time check. Any command signed before the A-1 implementation was deployed (before `signed_at` was added) is permanently replayable. The backward-compatibility fallback in `VerifyCommandWithTimestamp` (`if cmd.SignedAt == nil → VerifyCommand`) means new servers talking to old agents, or commands in the DB pre-dating migration 025, all fall into the unlimited-replay category.

### Replay attack scenarios

**Scenario A: Network MITM (24h window)**

An attacker positioned between server and agent captures a valid `install_updates` command with `signed_at` set.
Within 24 hours, they can re-present this command to the agent. If the agent's command handler receives it (via MITM on the polling response), it passes `VerifyCommandWithTimestamp` and is executed, potentially installing the same update a second time or, more dangerously, triggering a `reboot` or `update_agent` command twice.

**Scenario B: Old-format signature captured forever**

Any command signed before `signed_at` support was deployed (old server version or commands created before migration 025 ran) has no timestamp. A captured signature is valid forever. The only defense is that the command UUID must match, but if an attacker can inject a command with a matching UUID into the DB, verification passes.

**Scenario C: Retry creates unsigned commands (strict mode)**

An operator clicks "Retry" on a failed `install_updates` command. The server creates a new unsigned command. In strict mode, the agent rejects it silently (logs the rejection, reports `failed` to the server). The operator may not understand why the retry keeps failing, and may downgrade the enforcement mode to `warning` as a workaround, which is exactly the wrong response.

**Scenario D: `agent_id` not in signature (cross-agent injection)**

If an attacker can write to the `agent_commands` table directly (e.g., via SQL injection elsewhere, or compromised server credentials), they can copy a signed command for agent A into agent B's queue. The Ed25519 signature will verify correctly on agent B because `agent_id` is not in the signed content.

**Scenario E: Stuck command re-execution**

The `GetStuckCommands` query re-delivers commands that are in `sent` status for > 5 minutes. If a command was genuinely stuck (network failure, agent restart), it may be re-executed when the agent comes back online. If the command is `reboot` or `install_updates`, this can cause unintended repeated execution. There is no duplicate-execution guard on the agent side (no "already executed command ID" tracking).

---

## 8. Summary Table

| Finding | Severity | Description |
|---------|----------|-------------|
| **F-1** | HIGH | `agent_id` absent from signed payload; commands not cryptographically bound to a specific agent |
| **F-2** | CRITICAL | No nonce in command signing path; no single-use guarantee for command signatures |
| **F-3** | CRITICAL | Old-format commands (no `signed_at`) have zero time-based replay protection; valid forever |
| **F-4** | HIGH | 24-hour replay window for new-format commands; adequate for most attacks but generous |
| **F-5** | CRITICAL | `RetryCommand` creates unsigned commands; entire retry feature broken in strict enforcement mode |
| **F-6** | HIGH | Server `GetPendingCommands` has no TTL filter; stale pending commands delivered indefinitely |
| **F-7** | HIGH | No `expires_at` column in `agent_commands`; no schema-enforced command TTL |

### Severity definitions used

- **CRITICAL**: Exploitable by an attacker with no special access, or breaks core security feature silently
- **HIGH**: Requires attacker to have partial access (MITM position, DB access) or silently degrades security posture

---

## 9. Out of Scope / Confirmed Clean

- The Ed25519 signing algorithm itself is correctly implemented (A-1 verified).
- The key rotation implementation (A-1) correctly identifies and uses the right public key per command.
- The timestamp arithmetic in `VerifyCommandWithTimestamp` is not inverted (verified in A-1).
- The JWT authentication on `GET /agents/:id/commands` is enforced by middleware, so an unauthenticated attacker cannot directly call the command endpoint to inject commands through the server API.
- The deduplication index (migration 023a) prevents duplicate `pending` commands of the same type per agent.

---

## 10. Recommended Fixes (Prioritised, Not Yet Implemented)

| Priority | Fix | Addresses |
|----------|-----|-----------|
| 1 | Re-sign commands in `RetryCommand`: call `signAndCreateCommand` instead of `commandQueries.CreateCommand` directly | F-5 |
| 2 | Add `agent_id` to the signed message payload | F-1 |
| 3 | Add server-side command TTL: `expires_at` column + filter in `GetPendingCommands` | F-6, F-7 |
| 4 | Add agent-side executed-command deduplication: an in-memory or on-disk set of recently executed command UUIDs | F-2 (partial), F-4 |
| 5 | Remove old-format (no-timestamp) backward compat after a defined migration period; enforce `signed_at` as required | F-3 |
| 6 | Reduce `commandMaxAge` from 24h to a tighter window (4h) once retry infrastructure is fixed | F-4 |
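
Fixes 2 and 4 in the table above could take roughly the following shape. This is a minimal sketch under stated assumptions, not code from the audited repository: `boundMessage`, `executedSet`, `markIfNew`, and `runDedupDemo` are hypothetical names, and placing `agent_id` first in the signed string is only one possible layout. The in-memory set is lost on agent restart, so an on-disk variant would be needed to cover Scenario E across restarts.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

// boundMessage is a HYPOTHETICAL signed-message layout that prepends the
// agent ID to the fields the audit found in the current new format.
func boundMessage(agentID, cmdID, cmdType string, paramsJSON []byte, signedAtUnix int64) string {
	sum := sha256.Sum256(paramsJSON)
	return fmt.Sprintf("%s:%s:%s:%s:%d",
		agentID, cmdID, cmdType, hex.EncodeToString(sum[:]), signedAtUnix)
}

// executedSet remembers recently executed command IDs (hypothetical type).
type executedSet struct {
	mu   sync.Mutex
	ttl  time.Duration
	seen map[string]time.Time
}

func newExecutedSet(ttl time.Duration) *executedSet {
	return &executedSet{ttl: ttl, seen: make(map[string]time.Time)}
}

// markIfNew records id and returns true if it was not seen within ttl.
// It returns false for a repeat, which the caller should refuse to execute.
func (s *executedSet) markIfNew(id string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, t := range s.seen { // drop entries older than ttl
		if now.Sub(t) > s.ttl {
			delete(s.seen, k)
		}
	}
	if _, dup := s.seen[id]; dup {
		return false
	}
	s.seen[id] = now
	return true
}

// runDedupDemo shows a first delivery, a replay, and a delivery after ttl.
func runDedupDemo() (first, replay, afterTTL bool) {
	set := newExecutedSet(24 * time.Hour) // same length as commandMaxAge in the audit
	t0 := time.Unix(1700000000, 0)
	first = set.markIfNew("cmd-1", t0)
	replay = set.markIfNew("cmd-1", t0.Add(time.Hour))
	afterTTL = set.markIfNew("cmd-1", t0.Add(25*time.Hour))
	return first, replay, afterTTL
}

func main() {
	a := boundMessage("agent-a", "cmd-1", "reboot", []byte(`{}`), 1700000000)
	b := boundMessage("agent-b", "cmd-1", "reboot", []byte(`{}`), 1700000000)
	fmt.Println(a != b)          // prints: true (same command, different agents)
	fmt.Println(runDedupDemo()) // prints: true false true
}
```

Keeping the set's TTL at least as long as `commandMaxAge` means an ID is only forgotten once the timestamp check would reject the command anyway, which is why the `afterTTL` case is acceptable for new-format commands. Old-format commands have no timestamp, so this pairing offers no such bound until fix 5 is in place.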