Complete RedFlag codebase with two major security audit implementations.
== A-1: Ed25519 Key Rotation Support ==
Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management
Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing
== A-2: Replay Attack Fixes (F-1 through F-7) ==
F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH - Migration 026: expires_at column with partial index
F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH - Agent-side executedIDs dedup map with cleanup
F-4 HIGH - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt
Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.
All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# A-2 Command Replay Attack Audit

**Date**: 2026-03-28
**Branch**: unstabledeveloper
**Scope**: Audit only (no implementation changes)

---
## 1. Signed Command Payload Analysis

### What fields are included in the signed message

**New format** (when `cmd.SignedAt != nil`):
```
{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}:{cmd.SignedAt.Unix()}
```
Source: `aggregator-server/internal/services/signing.go:361`, `aggregator-agent/internal/crypto/verification.go:71`

**Old format** (backward compat, when `cmd.SignedAt == nil`):
```
{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}
```
Source: `aggregator-agent/internal/crypto/verification.go:55`
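
As a rough illustration of the two formats above, the message string could be assembled as shown here. This is a sketch, not the project's code: `hashParams` and `buildSignedMessage` are hypothetical helpers, and the sketch assumes the params hash is the lowercase hex SHA-256 of the `encoding/json` serialization, which the audit does not spell out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hashParams returns the hex SHA-256 of the JSON-encoded params.
// Assumption: hex encoding; the audit only says sha256(json(cmd.Params)).
func hashParams(params map[string]any) (string, error) {
	raw, err := json.Marshal(params) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// buildSignedMessage is a hypothetical helper. A nil signedAtUnix selects the
// old format; otherwise the Unix timestamp is appended (new format).
func buildSignedMessage(id, cmdType string, params map[string]any, signedAtUnix *int64) (string, error) {
	h, err := hashParams(params)
	if err != nil {
		return "", err
	}
	if signedAtUnix == nil {
		return fmt.Sprintf("%s:%s:%s", id, cmdType, h), nil
	}
	return fmt.Sprintf("%s:%s:%s:%d", id, cmdType, h, *signedAtUnix), nil
}

func main() {
	ts := int64(1774656000) // arbitrary fixed timestamp for the example
	params := map[string]any{"package": "curl"}
	oldMsg, _ := buildSignedMessage("cmd-1", "install_updates", params, nil)
	newMsg, _ := buildSignedMessage("cmd-1", "install_updates", params, &ts)
	fmt.Println(oldMsg)
	fmt.Println(newMsg)
}
```

The only difference between the two strings is the trailing timestamp, which is why the old format gives a verifier nothing to base an age check on.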
### What is NOT in the signed payload

| Field | In signed payload? | Notes |
|---|---|---|
| `cmd.ID` (UUID) | YES | Unique per-command identifier |
| `cmd.CommandType` | YES | e.g. `install_updates`, `reboot` |
| `sha256(params)` | YES | Hash of full params JSON |
| `signed_at` timestamp | YES (new format only) | Unix seconds |
| `cmd.AgentID` | **NO** | Absent from signature |
| `cmd.Source` | **NO** | Absent from signature |
| `cmd.Status` | **NO** | Absent from signature |
| Nonce | **NO** | Not used in command signing |

**FINDING F-1 (HIGH)**: `agent_id` is not included in the signed payload. A valid signed command is not cryptographically bound to a specific agent. The only uniqueness guarantee is the command UUID: if an attacker could inject a captured command into a different agent's command queue, the signature would verify correctly.
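
To make F-1 concrete, the sketch below signs a message that does include the agent ID and then checks it from two agents. It follows the v3 layout listed in the commit summary, `{agent_id}:{cmd_id}:{type}:{hash}:{ts}`. The helper names, IDs, and hash value are hypothetical, and the key is generated on the spot rather than loaded the way the server does it.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// messageV3 builds "{agent_id}:{cmd_id}:{type}:{hash}:{ts}".
// Hypothetical helper; paramsHash is assumed to be precomputed.
func messageV3(agentID, cmdID, cmdType, paramsHash string, ts int64) []byte {
	return []byte(fmt.Sprintf("%s:%s:%s:%s:%d", agentID, cmdID, cmdType, paramsHash, ts))
}

// crossAgentDemo signs a command for agent-A, then verifies it twice:
// once as agent-A and once as agent-B rebuilding the message with its own ID.
func crossAgentDemo() (bool, bool, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return false, false, err
	}
	ts := int64(1774656000) // arbitrary fixed timestamp
	sig := ed25519.Sign(priv, messageV3("agent-A", "cmd-1", "reboot", "abc123", ts))

	intended := ed25519.Verify(pub, messageV3("agent-A", "cmd-1", "reboot", "abc123", ts), sig)
	replayed := ed25519.Verify(pub, messageV3("agent-B", "cmd-1", "reboot", "abc123", ts), sig)
	return intended, replayed, nil
}

func main() {
	intended, replayed, err := crossAgentDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("verifies on agent-A:", intended)
	fmt.Println("verifies on agent-B:", replayed)
}
```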

---

## 2. Nonce Mechanism

### What the nonce looks like

The `SigningService` in `aggregator-server/internal/services/signing.go` has two nonce methods:

```go
func (s *SigningService) SignNonce(nonceUUID uuid.UUID, timestamp time.Time) (string, error)
func (s *SigningService) VerifyNonce(nonceUUID uuid.UUID, timestamp time.Time, signatureHex string, maxAge time.Duration) (bool, error)
```

Nonce format: `"{uuid}:{unix_timestamp}"`, signed with Ed25519.
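
For orientation, a nonce of this shape could be signed and checked roughly as follows. This is a standalone approximation, not the `SigningService` implementation: the UUID is a plain string to avoid a third-party import, the explicit `now` parameter is added for testability, and the way `maxAge` is applied is an assumption.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"fmt"
	"time"
)

// nonceMessage builds "{uuid}:{unix_timestamp}".
func nonceMessage(nonceUUID string, ts time.Time) []byte {
	return []byte(fmt.Sprintf("%s:%d", nonceUUID, ts.Unix()))
}

// signNonce is a stand-in for SignNonce and returns a hex signature.
func signNonce(priv ed25519.PrivateKey, nonceUUID string, ts time.Time) string {
	return hex.EncodeToString(ed25519.Sign(priv, nonceMessage(nonceUUID, ts)))
}

// verifyNonce is a stand-in for VerifyNonce. Assumption: a nonce is rejected
// once now - ts exceeds maxAge, before the signature is even checked.
func verifyNonce(pub ed25519.PublicKey, nonceUUID string, ts time.Time, sigHex string, maxAge time.Duration, now time.Time) (bool, error) {
	if age := now.Sub(ts); age > maxAge {
		return false, fmt.Errorf("nonce expired: age %s exceeds %s", age, maxAge)
	}
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return false, fmt.Errorf("signature is not valid hex: %w", err)
	}
	return ed25519.Verify(pub, nonceMessage(nonceUUID, ts), sig), nil
}

// nonceDemo checks one nonce shortly after issue and again an hour later.
func nonceDemo() (freshOK bool, staleRejected bool) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	id := "123e4567-e89b-12d3-a456-426614174000" // example UUID
	issued := time.Unix(1774656000, 0)
	sig := signNonce(priv, id, issued)

	freshOK, _ = verifyNonce(pub, id, issued, sig, 5*time.Minute, issued.Add(time.Minute))
	ok, err := verifyNonce(pub, id, issued, sig, 5*time.Minute, issued.Add(time.Hour))
	staleRejected = !ok && err != nil
	return
}

func main() {
	fresh, stale := nonceDemo()
	fmt.Println("fresh nonce accepted:", fresh)
	fmt.Println("stale nonce rejected:", stale)
}
```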
### Where nonces are used

**Nonces are NOT used in command signing or command verification.**

The `SignNonce`/`VerifyNonce` methods exist exclusively for the agent update package flow (preventing replay of update download requests). They are completely disconnected from the command replay protection path.

The agent's `ProcessCommand` function (`command_handler.go:101`) calls `VerifyCommandWithTimestamp` or `VerifyCommand`. Neither of these checks any nonce. There is no nonce storage, no nonce tracking map, and no nonce field in `AgentCommand` or `CommandItem`.

**FINDING F-2 (CRITICAL)**: There is no nonce in the command signing path. The original issue comment ("nonce-only replay protection") errs in the opposite direction: there is no nonce at all, and commands signed with the old format have no reliable replay protection.

---

## 3. Verification Function Behaviour

### `VerifyCommand` (old format, no timestamp)

Source: `aggregator-agent/internal/crypto/verification.go:25`

Checks:
1. Signature field is non-empty
2. Signature is valid hex, correct length (64 bytes)
3. Ed25519 signature over `{id}:{type}:{sha256(params)}` verifies against public key

Returns: `error` (nil = pass). **No time check. No nonce check.**

**FINDING F-3 (CRITICAL)**: Commands signed with the old format (no `signed_at`) are valid indefinitely. A captured signature can be replayed at any time in the future; there is no expiry mechanism for old-format commands.
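
The three checks listed for `VerifyCommand` can be sketched as below. `verifyOldFormat` is a hypothetical stand-in that takes a precomputed params hash; the error messages are invented for the example. Note that nothing in it reads the clock, which is the substance of F-3.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"errors"
	"fmt"
)

// verifyOldFormat applies the three old-format checks in order.
func verifyOldFormat(pub ed25519.PublicKey, id, cmdType, paramsHash, sigHex string) error {
	// Check 1: signature field is non-empty.
	if sigHex == "" {
		return errors.New("command has no signature")
	}
	// Check 2: valid hex and the Ed25519 signature length (64 bytes).
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return fmt.Errorf("signature is not valid hex: %w", err)
	}
	if len(sig) != ed25519.SignatureSize {
		return fmt.Errorf("signature is %d bytes, want %d", len(sig), ed25519.SignatureSize)
	}
	// Check 3: signature over "{id}:{type}:{hash}" verifies.
	msg := []byte(fmt.Sprintf("%s:%s:%s", id, cmdType, paramsHash))
	if !ed25519.Verify(pub, msg, sig) {
		return errors.New("signature verification failed")
	}
	return nil // no time or nonce check: the same inputs pass forever
}

// oldFormatDemo verifies a valid command and one whose type was altered.
func oldFormatDemo() (validOK, tamperedRejected bool) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	sig := hex.EncodeToString(ed25519.Sign(priv, []byte("cmd-1:reboot:abc123")))
	validOK = verifyOldFormat(pub, "cmd-1", "reboot", "abc123", sig) == nil
	tamperedRejected = verifyOldFormat(pub, "cmd-1", "install_updates", "abc123", sig) != nil
	return
}

func main() {
	valid, tampered := oldFormatDemo()
	fmt.Println("valid command accepted:", valid)
	fmt.Println("tampered command rejected:", tampered)
}
```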
### `VerifyCommandWithTimestamp` (new format)

Source: `aggregator-agent/internal/crypto/verification.go:85`

Checks:
1. If `cmd.SignedAt == nil` → falls back to `VerifyCommand()` (see F-3)
2. `age = now.Sub(*cmd.SignedAt)` must satisfy: `age <= 24h` AND `age >= -5min`
3. Signature valid over `{id}:{type}:{sha256(params)}:{unix_timestamp}`

**FINDING F-4 (HIGH)**: 24-hour replay window. A captured signed command remains valid for replay for up to 24 hours from signing time. The window is set by `commandMaxAge = 24 * time.Hour`, defined in `command_handler.go:21`.
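
The age check in step 2 amounts to a two-sided window, sketched here in isolation. `checkSignedAt`, `ageAcceptedMinutes`, and the constant name `maxClockSkew` are hypothetical; only `commandMaxAge` and the two bounds come from the audit.

```go
package main

import (
	"fmt"
	"time"
)

const (
	commandMaxAge = 24 * time.Hour  // value reported for command_handler.go:21
	maxClockSkew  = 5 * time.Minute // assumed name for the -5min bound
)

// checkSignedAt accepts a timestamp when -5min <= age <= 24h.
func checkSignedAt(signedAt, now time.Time) error {
	age := now.Sub(signedAt)
	if age > commandMaxAge {
		return fmt.Errorf("command too old: signed %s ago", age)
	}
	if age < -maxClockSkew {
		return fmt.Errorf("command signed %s in the future", -age)
	}
	return nil
}

// ageAcceptedMinutes reports whether a command signed the given number of
// minutes ago (negative means in the future) would pass the window check.
func ageAcceptedMinutes(minutes int) bool {
	now := time.Unix(1774656000, 0) // arbitrary fixed instant
	signedAt := now.Add(-time.Duration(minutes) * time.Minute)
	return checkSignedAt(signedAt, now) == nil
}

func main() {
	fmt.Println("23h old:", ageAcceptedMinutes(23*60))
	fmt.Println("25h old:", ageAcceptedMinutes(25*60))
	fmt.Println("10min in the future:", ageAcceptedMinutes(-10))
}
```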

---

## 4. Command Creation Flow

### Full path: Dashboard approves install → command signed → stored

```
POST /updates/:id/install
  → UpdateHandler.InstallUpdate()                  [handlers/updates.go:459]
    → models.AgentCommand{...}                     [no signing yet]
    → h.agentHandler.signAndCreateCommand(cmd)     [agents.go:49]
      → signingService.SignCommand(cmd)            [services/signing.go:345]
        → cmd.SignedAt = &now                      [side-effect]
        → cmd.KeyID = GetCurrentKeyID()            [side-effect]
        → message = "{id}:{type}:{hash}:{ts}"
        → ed25519.Sign(privateKey, message)
        → returns hex signature
      → cmd.Signature = signature
      → commandQueries.CreateCommand(cmd)          [queries/commands.go:22]
        → INSERT INTO agent_commands (... key_id, signed_at ...)
```

The `ConfirmDependencies` and `ReportDependencies` (auto-install) handlers follow identical paths through `signAndCreateCommand`.
### RetryCommand path (DOES NOT RE-SIGN)

```
POST /commands/:id/retry
  → UpdateHandler.RetryCommand()           [handlers/updates.go:779]
    → commandQueries.RetryCommand(id)      [queries/commands.go:189]
      → newCommand = AgentCommand{         [copies Params, new UUID]
          Signature: "",                   [EMPTY: not re-signed]
          SignedAt:  nil,                  [nil: no timestamp]
          KeyID:     "",                   [empty: no key reference]
        }
      → q.CreateCommand(newCommand)        [stored unsigned]
```

**FINDING F-5 (CRITICAL)**: `RetryCommand` creates a new command without calling `signAndCreateCommand`. The retried command has `Signature = ""`, `SignedAt = nil`, `KeyID = ""`. In strict enforcement mode, the agent rejects any command with an empty signature. This means **the retry feature is entirely broken when command signing is enabled in strict mode**. The HTTP handler in `updates.go:779` returns 200 OK and the command is stored in the DB, but the agent will reject it every time it polls.
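
A retry that goes through the signing step might look like the sketch below. It is a simplified model, not the server code: `Command` keeps only the fields relevant here, the params hash is assumed to be precomputed, and `sign` and `retry` are hypothetical names. Because the command ID is part of the signed message, the retried command needs its own signature; the original one cannot be copied.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"fmt"
	"time"
)

// Command is a trimmed-down stand-in for AgentCommand.
type Command struct {
	ID, Type, ParamsHash, Signature, KeyID string
	SignedAt                               *time.Time
}

// sign sets SignedAt, KeyID and Signature over "{id}:{type}:{hash}:{ts}".
func sign(priv ed25519.PrivateKey, keyID string, cmd *Command, now time.Time) {
	cmd.SignedAt = &now
	cmd.KeyID = keyID
	msg := fmt.Sprintf("%s:%s:%s:%d", cmd.ID, cmd.Type, cmd.ParamsHash, now.Unix())
	cmd.Signature = hex.EncodeToString(ed25519.Sign(priv, []byte(msg)))
}

// retry clones the original under a new ID and signs the clone.
func retry(priv ed25519.PrivateKey, keyID string, orig Command, newID string, now time.Time) Command {
	next := Command{ID: newID, Type: orig.Type, ParamsHash: orig.ParamsHash}
	sign(priv, keyID, &next, now)
	return next
}

// retryDemo reports whether the retried command is signed, and whether its
// signature differs from the original command's signature.
func retryDemo() (signed, fresh bool) {
	_, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	t0 := time.Unix(1774656000, 0) // arbitrary fixed instant
	orig := Command{ID: "cmd-1", Type: "install_updates", ParamsHash: "abc123"}
	sign(priv, "key-1", &orig, t0)

	next := retry(priv, "key-1", orig, "cmd-2", t0.Add(10*time.Minute))
	signed = next.Signature != "" && next.SignedAt != nil && next.KeyID != ""
	fresh = next.Signature != orig.Signature
	return
}

func main() {
	signed, fresh := retryDemo()
	fmt.Println("retried command is signed:", signed)
	fmt.Println("retried command has its own signature:", fresh)
}
```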

---

## 5. Agent Command Fetch and Execution Flow

### Full path: Agent polls → receives commands → verifies → executes

```
GET /api/v1/agents/{id}/commands
  → AgentHandler.GetCommands()                       [handlers/agents.go:204]
    → commandQueries.GetPendingCommands(agentID)     [status = 'pending' only]
    → commandQueries.GetStuckCommands(agentID, 5m)   [sent > 5 min, not completed]
    → allCommands = pending + stuck
    → for each cmd: MarkCommandSent(cmd.ID)          [transitions pending → sent]
    → returns CommandItem{ID, Type, Params, Signature, KeyID, SignedAt}
```

Agent-side:
```
main.go:875: apiClient.GetCommands(cfg.AgentID, metrics)
main.go:928: for _, cmd := range commands {
main.go:932:   commandHandler.ProcessCommand(cmd, cfg, cfg.AgentID)
main.go:954:   switch cmd.Type { ... execute ... }
```

### What `GetPendingCommands` returns

```sql
SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at ASC
LIMIT 100
```

There is no `WHERE created_at > NOW() - INTERVAL '24 hours'` filter. A command created 30 days ago with status `pending` (e.g., if it was never successfully sent) would be returned. If it has the old-format signature (no `signed_at`), the agent would execute it with no time check.

**FINDING F-6 (HIGH)**: The server-side command queue has no TTL filter. Old pending commands are delivered indefinitely. Combined with old-format signing (F-3), this means commands can persist in the queue and be executed arbitrarily long after creation.
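
The effect of a TTL filter can be modelled in memory, without a database. The sketch below assumes every queued command carries an expiry time, corresponding to the `expires_at` column named in the commit summary. `queuedCommand`, `deliverable`, and `ttlDemo` are hypothetical names, and the real change would live in the SQL query rather than in Go code like this.

```go
package main

import (
	"fmt"
	"time"
)

// queuedCommand is a minimal stand-in for a row in agent_commands.
type queuedCommand struct {
	ID        string
	Status    string
	ExpiresAt time.Time
}

// deliverable keeps only commands that are pending and not yet expired.
func deliverable(cmds []queuedCommand, now time.Time) []queuedCommand {
	var out []queuedCommand
	for _, c := range cmds {
		if c.Status != "pending" {
			continue // already sent, completed, or failed
		}
		if !c.ExpiresAt.After(now) {
			continue // expired: ExpiresAt is at or before now
		}
		out = append(out, c)
	}
	return out
}

// ttlDemo runs the filter over three commands and returns the surviving IDs.
func ttlDemo() []string {
	now := time.Unix(1774656000, 0) // arbitrary fixed instant
	queue := []queuedCommand{
		{ID: "fresh", Status: "pending", ExpiresAt: now.Add(time.Hour)},
		{ID: "stale", Status: "pending", ExpiresAt: now.Add(-time.Hour)},
		{ID: "in-flight", Status: "sent", ExpiresAt: now.Add(time.Hour)},
	}
	var ids []string
	for _, c := range deliverable(queue, now) {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	fmt.Println("delivered:", ttlDemo())
}
```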

---

## 6. Database Schema — TTL and Command Expiry

### agent_commands table (from migration 001 + amendments)

```sql
CREATE TABLE agent_commands (
    id              UUID PRIMARY KEY,
    agent_id        UUID REFERENCES agents(id),
    command_type    VARCHAR(50),
    params          JSONB,
    status          VARCHAR(20) DEFAULT 'pending',
    created_at      TIMESTAMP DEFAULT NOW(),
    sent_at         TIMESTAMP,
    completed_at    TIMESTAMP,
    result          JSONB,
    signature       VARCHAR(128),        -- migration 020
    key_id          VARCHAR(64),         -- migration 025
    signed_at       TIMESTAMP,           -- migration 025
    idempotency_key VARCHAR(64) UNIQUE   -- migration 023a
);
```

**FINDING F-7 (HIGH)**: No `expires_at` column exists. No TTL constraint exists. No scheduled cleanup job for old pending commands exists in the codebase. The only cleanup mechanisms are:

- Manual `ClearOldFailedCommands(days)`: applies to `failed`/`timed_out` only, not `pending`
- Manual `CancelCommand(id)`: single-command manual cancellation
- The deduplication index from migration 023a prevents duplicate pending commands per `(agent_id, command_type)`, but this only prevents new duplicates; it does not expire old ones

---
## 7. Attack Surface Assessment

### Can a captured signed command be replayed indefinitely?

**New format (with `signed_at`)**: Replayable for 24 hours from signing time. After that, `VerifyCommandWithTimestamp` rejects it as too old.

**Old format (no `signed_at`)**: **YES, replayable indefinitely.** `VerifyCommand` has no time check. Any command signed before the A-1 implementation was deployed (before `signed_at` was added) is permanently replayable.

The backward-compatibility fallback in `VerifyCommandWithTimestamp` (`if cmd.SignedAt == nil → VerifyCommand`) means that commands from old servers talking to new agents, or commands in the DB pre-dating migration 025, all fall into the unlimited-replay category.

### Replay attack scenarios

**Scenario A: Network MITM (24h window)**
An attacker positioned between server and agent captures a valid `install_updates` command with `signed_at` set. Within 24 hours, they can re-present this command to the agent. If the agent's command handler receives it (via MITM on the polling response), it passes `VerifyCommandWithTimestamp` and is executed, potentially installing the same update a second time or, more dangerously, triggering a `reboot` or `update_agent` command twice.

**Scenario B: Old-format signature captured forever**
Any command signed before `signed_at` support was deployed (old server version or commands created before migration 025 ran) has no timestamp. A captured signature is valid forever. The only defense is that the command UUID must match, but if an attacker can inject a command with a matching UUID into the DB, verification passes.

**Scenario C: Retry creates unsigned commands (strict mode)**
An operator clicks "Retry" on a failed `install_updates` command. The server creates a new unsigned command. In strict mode, the agent rejects it silently (logs the rejection, reports `failed` to the server). The operator may not understand why the retry keeps failing, and may downgrade the enforcement mode to `warning` as a workaround, which is exactly the wrong response.

**Scenario D: `agent_id` not in signature (cross-agent injection)**
If an attacker can write to the `agent_commands` table directly (e.g., via SQL injection elsewhere, or compromised server credentials), they can copy a signed command for agent A into agent B's queue. The Ed25519 signature will verify correctly on agent B because `agent_id` is not in the signed content.

**Scenario E: Stuck command re-execution**
The `GetStuckCommands` query re-delivers commands that have been in `sent` status for more than 5 minutes. If a command was genuinely stuck (network failure, agent restart), it may be re-executed when the agent comes back online. If the command is `reboot` or `install_updates`, this can cause unintended repeated execution. There is no duplicate-execution guard on the agent side (no "already executed command ID" tracking).
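
The missing guard from Scenario E could take the shape of a small set of recently executed command IDs, along the lines of the `executedIDs` dedup map mentioned in the commit summary. The sketch below is an assumption about how such a map might work, not the agent's code. It is in-memory only, so it would not survive an agent restart, and its retention period would need to be at least as long as the signature validity window to cover every replay.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// executedIDs remembers command IDs for ttl after they were first seen.
type executedIDs struct {
	mu   sync.Mutex
	seen map[string]time.Time
	ttl  time.Duration
}

func newExecutedIDs(ttl time.Duration) *executedIDs {
	return &executedIDs{seen: make(map[string]time.Time), ttl: ttl}
}

// markIfNew records id and returns true only if id was not seen within ttl.
func (e *executedIDs) markIfNew(id string, now time.Time) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	// Cleanup: drop entries older than ttl so the map stays bounded.
	for k, at := range e.seen {
		if now.Sub(at) > e.ttl {
			delete(e.seen, k)
		}
	}
	if _, dup := e.seen[id]; dup {
		return false
	}
	e.seen[id] = now
	return true
}

// dedupDemo offers the same command ID three times: at first delivery,
// one minute later, and after the retention period has passed.
func dedupDemo() (first, second, afterTTL bool) {
	ids := newExecutedIDs(24 * time.Hour)
	t0 := time.Unix(1774656000, 0) // arbitrary fixed instant
	first = ids.markIfNew("cmd-1", t0)
	second = ids.markIfNew("cmd-1", t0.Add(time.Minute))
	afterTTL = ids.markIfNew("cmd-1", t0.Add(25*time.Hour))
	return
}

func main() {
	first, second, afterTTL := dedupDemo()
	fmt.Println("first delivery executes:", first)
	fmt.Println("re-delivery executes:", second)
	fmt.Println("delivery after retention executes:", afterTTL)
}
```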

---

## 8. Summary Table

| Finding | Severity | Description |
|---------|----------|-------------|
| **F-1** | HIGH | `agent_id` absent from signed payload; commands not cryptographically bound to a specific agent |
| **F-2** | CRITICAL | No nonce in command signing path; no single-use guarantee for command signatures |
| **F-3** | CRITICAL | Old-format commands (no `signed_at`) have zero time-based replay protection; valid forever |
| **F-4** | HIGH | 24-hour replay window for new-format commands; bounded, but generous to an attacker |
| **F-5** | CRITICAL | `RetryCommand` creates unsigned commands; entire retry feature broken in strict enforcement mode |
| **F-6** | HIGH | Server `GetPendingCommands` has no TTL filter; stale pending commands delivered indefinitely |
| **F-7** | HIGH | No `expires_at` column in `agent_commands`; no schema-enforced command TTL |

### Severity definitions used
- **CRITICAL**: Exploitable by an attacker with no special access, or breaks core security feature silently
- **HIGH**: Requires attacker to have partial access (MITM position, DB access) or silently degrades security posture

---
## 9. Out of Scope / Confirmed Clean

- The Ed25519 signing algorithm itself is correctly implemented (A-1 verified).
- The key rotation implementation (A-1) correctly identifies and uses the right public key per command.
- The timestamp arithmetic in `VerifyCommandWithTimestamp` is not inverted (verified in A-1).
- The JWT authentication on `GET /agents/:id/commands` is enforced by middleware, so an unauthenticated attacker cannot directly call the command endpoint to inject commands through the server API.
- The deduplication index (migration 023a) prevents duplicate `pending` commands of the same type per agent.

---

## 10. Recommended Fixes (Prioritised, Not Yet Implemented)

| Priority | Fix | Addresses |
|----------|-----|-----------|
| 1 | Re-sign commands in `RetryCommand`: call `signAndCreateCommand` instead of `commandQueries.CreateCommand` directly | F-5 |
| 2 | Add `agent_id` to the signed message payload | F-1 |
| 3 | Add server-side command TTL: `expires_at` column + filter in `GetPendingCommands` | F-6, F-7 |
| 4 | Add agent-side executed-command deduplication: an in-memory or on-disk set of recently executed command UUIDs | F-2 (partial), F-4 |
| 5 | Remove old-format (no-timestamp) backward compat after a defined migration period; enforce `signed_at` as required | F-3 |
| 6 | Reduce `commandMaxAge` from 24h to a tighter window (4h) once retry infrastructure is fixed | F-4 |
|