Complete RedFlag codebase with two major security audit implementations.
== A-1: Ed25519 Key Rotation Support ==
Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management
Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing
== A-2: Replay Attack Fixes (F-1 through F-7) ==
F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH - Migration 026: expires_at column with partial index
F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH - Agent-side executedIDs dedup map with cleanup
F-4 HIGH - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt
Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.
All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A-2 Command Replay Attack Audit
Date: 2026-03-28
Branch: unstabledeveloper
Scope: Audit-only, no implementation changes
1. Signed Command Payload Analysis
What fields are included in the signed message
New format (when cmd.SignedAt != nil):
{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}:{cmd.SignedAt.Unix()}
Source: aggregator-server/internal/services/signing.go:361, aggregator-agent/internal/crypto/verification.go:71
Old format (backward compat, when cmd.SignedAt == nil):
{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}
Source: aggregator-agent/internal/crypto/verification.go:55
What is NOT in the signed payload
| Field | In signed payload? | Notes |
|---|---|---|
| cmd.ID (UUID) | YES | Unique per-command identifier |
| cmd.CommandType | YES | e.g. install_updates, reboot |
| sha256(params) | YES | Hash of full params JSON |
| signed_at timestamp | YES (new format only) | Unix seconds |
| cmd.AgentID | NO | Absent from signature |
| cmd.Source | NO | Absent from signature |
| cmd.Status | NO | Absent from signature |
| Nonce | NO | Not used in command signing |
FINDING F-1 (HIGH): agent_id is not included in the signed payload. A valid signed command is not cryptographically bound to a specific agent. The only uniqueness guarantee is the command UUID — if an attacker could inject a captured command into a different agent's command queue, the signature would verify correctly.
2. Nonce Mechanism
What the nonce looks like
The SigningService in aggregator-server/internal/services/signing.go has two nonce methods:
func (s *SigningService) SignNonce(nonceUUID uuid.UUID, timestamp time.Time) (string, error)
func (s *SigningService) VerifyNonce(nonceUUID uuid.UUID, timestamp time.Time, signatureHex string, maxAge time.Duration) (bool, error)
Nonce format: "{uuid}:{unix_timestamp}" — signed with Ed25519.
Where nonces are used
Nonces are NOT used in command signing or command verification.
The SignNonce/VerifyNonce methods exist exclusively for the agent update package flow (preventing replay of update download requests). They are completely disconnected from the command replay protection path.
The agent's ProcessCommand function (command_handler.go:101) calls VerifyCommandWithTimestamp or VerifyCommand. Neither of these checks any nonce. There is no nonce storage, no nonce tracking map, and no nonce field in AgentCommand or CommandItem.
FINDING F-2 (CRITICAL): There is no nonce in the command signing path. The original issue comment ("nonce-only replay protection") is inaccurate in the opposite direction — there is no nonce AND no reliable replay protection for commands signed with the old format.
3. Verification Function Behaviour
VerifyCommand (old format, no timestamp)
Source: aggregator-agent/internal/crypto/verification.go:25
Checks:
- Signature field is non-empty
- Signature is valid hex, correct length (64 bytes)
- Ed25519 signature over {id}:{type}:{sha256(params)} verifies against public key
Returns: error (nil = pass). No time check. No nonce check.
FINDING F-3 (CRITICAL): Commands signed with the old format (no signed_at) are valid indefinitely. A captured signature can be replayed at any time in the future — there is no expiry mechanism for old-format commands.
VerifyCommandWithTimestamp (new format)
Source: aggregator-agent/internal/crypto/verification.go:85
Checks:
- If cmd.SignedAt == nil → falls back to VerifyCommand() (see F-3)
- age = now.Sub(*cmd.SignedAt) must satisfy: age <= 24h AND age >= -5min
- Signature valid over {id}:{type}:{sha256(params)}:{unix_timestamp}
FINDING F-4 (HIGH): 24-hour replay window. A captured signed command remains valid for replay for up to 24 hours from signing time. This is the default value of commandMaxAge = 24 * time.Hour defined in command_handler.go:21.
4. Command Creation Flow
Full path: Dashboard approves install → command signed → stored
POST /updates/:id/install
→ UpdateHandler.InstallUpdate() [handlers/updates.go:459]
→ models.AgentCommand{...} [no signing yet]
→ h.agentHandler.signAndCreateCommand(cmd) [agents.go:49]
→ signingService.SignCommand(cmd) [services/signing.go:345]
→ cmd.SignedAt = &now [side-effect]
→ cmd.KeyID = GetCurrentKeyID() [side-effect]
→ message = "{id}:{type}:{hash}:{ts}"
→ ed25519.Sign(privateKey, message)
→ returns hex signature
→ cmd.Signature = signature
→ commandQueries.CreateCommand(cmd) [queries/commands.go:22]
→ INSERT INTO agent_commands (... key_id, signed_at ...)
The ConfirmDependencies and ReportDependencies (auto-install) handlers follow identical paths through signAndCreateCommand.
RetryCommand path (DOES NOT RE-SIGN)
POST /commands/:id/retry
→ UpdateHandler.RetryCommand() [handlers/updates.go:779]
→ commandQueries.RetryCommand(id) [queries/commands.go:189]
→ newCommand = AgentCommand{ [copies Params, new UUID]
Signature: "", [EMPTY — not re-signed]
SignedAt: nil, [nil — no timestamp]
KeyID: "", [empty — no key reference]
}
→ q.CreateCommand(newCommand) [stored unsigned]
FINDING F-5 (CRITICAL): RetryCommand creates a new command without calling signAndCreateCommand. The retried command has Signature = "", SignedAt = nil, KeyID = "". In strict enforcement mode, the agent rejects any command with an empty signature. This means the retry feature is entirely broken when command signing is enabled in strict mode. The HTTP handler in updates.go:779 returns 200 OK and the command is stored in the DB, but the agent will reject it every time it polls.
5. Agent Command Fetch and Execution Flow
Full path: Agent polls → receives commands → verifies → executes
GET /api/v1/agents/{id}/commands
→ AgentHandler.GetCommands() [handlers/agents.go:204]
→ commandQueries.GetPendingCommands(agentID) [status = 'pending' only]
→ commandQueries.GetStuckCommands(agentID, 5m) [sent > 5 min, not completed]
→ allCommands = pending + stuck
→ for each cmd: MarkCommandSent(cmd.ID) [transitions pending → sent]
→ returns CommandItem{ID, Type, Params, Signature, KeyID, SignedAt}
Agent-side:
main.go:875: apiClient.GetCommands(cfg.AgentID, metrics)
main.go:928: for _, cmd := range commands {
main.go:932: commandHandler.ProcessCommand(cmd, cfg, cfg.AgentID)
main.go:954: switch cmd.Type { ... execute ... }
What GetPendingCommands returns
SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at ASC
LIMIT 100
There is no WHERE created_at > NOW() - INTERVAL '24 hours' filter. A command created 30 days ago with status pending (e.g., if it was never successfully sent) would be returned. If it has the old-format signature (no signed_at), the agent would execute it with no time check.
FINDING F-6 (HIGH): The server-side command queue has no TTL filter. Old pending commands are delivered indefinitely. Combined with old-format signing (F-3), this means commands can persist in the queue and be executed arbitrarily long after creation.
6. Database Schema — TTL and Command Expiry
agent_commands table (from migration 001 + amendments)
CREATE TABLE agent_commands (
id UUID PRIMARY KEY,
agent_id UUID REFERENCES agents(id),
command_type VARCHAR(50),
params JSONB,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW(),
sent_at TIMESTAMP,
completed_at TIMESTAMP,
result JSONB,
signature VARCHAR(128), -- migration 020
key_id VARCHAR(64), -- migration 025
signed_at TIMESTAMP, -- migration 025
idempotency_key VARCHAR(64) UNIQUE -- migration 023a
);
FINDING F-7 (HIGH): No expires_at column exists. No TTL constraint exists. No scheduled cleanup job for old pending commands exists in the codebase. The only cleanup mechanisms are:
- Manual ClearOldFailedCommands(days): applies to failed/timed_out only, not pending
- Manual CancelCommand(id): single-command manual cancellation
- The deduplication index from migration 023a prevents duplicate pending commands per (agent_id, command_type), but this only prevents new duplicates; it doesn't expire old ones
7. Attack Surface Assessment
Can a captured signed command be replayed indefinitely?
New format (with signed_at): Replayable for 24 hours from signing time. After that, VerifyCommandWithTimestamp rejects it as too old.
Old format (no signed_at): YES — replayable indefinitely. VerifyCommand has no time check. Any command signed before the A-1 implementation was deployed (before signed_at was added) is permanently replayable.
The backward-compatibility fallback in VerifyCommandWithTimestamp (if cmd.SignedAt == nil → VerifyCommand) means new servers talking to old agents, or commands in the DB pre-dating migration 025, all fall into the unlimited-replay category.
Replay attack scenarios
Scenario A — Network MITM (24h window)
An attacker positioned between server and agent captures a valid install_updates command with signed_at set. Within 24 hours, they can re-present this command to the agent. If the agent's command handler receives it (via MITM on the polling response), it passes VerifyCommandWithTimestamp and is executed — potentially installing the same update a second time, or more dangerously triggering a reboot or update_agent command twice.
Scenario B — Old-format signature captured forever
Any command signed before signed_at support was deployed (old server version or commands created before migration 025 ran) has no timestamp. A captured signature is valid forever. The only defense is that the command UUID must match, but if an attacker can inject a command with a matching UUID into the DB, verification passes.
Scenario C — Retry creates unsigned commands (strict mode)
An operator clicks "Retry" on a failed install_updates command. The server creates a new unsigned command. In strict mode, the agent rejects it silently (logs the rejection, reports failed to the server). The operator may not understand why the retry keeps failing, and may downgrade the enforcement mode to warning as a workaround — which is exactly the wrong response.
Scenario D — agent_id not in signature (cross-agent injection)
If an attacker can write to the agent_commands table directly (e.g., via SQL injection elsewhere, or compromised server credentials), they can copy a signed command for agent A into agent B's queue. The Ed25519 signature will verify correctly on agent B because agent_id is not in the signed content.
Scenario E — Stuck command re-execution
The GetStuckCommands query re-delivers commands that are in sent status for > 5 minutes. If a command was genuinely stuck (network failure, agent restart), it may be re-executed when the agent comes back online. If the command is reboot or install_updates, this can cause unintended repeated execution. There is no duplicate-execution guard on the agent side (no "already executed command ID" tracking).
8. Summary Table
| Finding | Severity | Description |
|---|---|---|
| F-1 | HIGH | agent_id absent from signed payload — commands not cryptographically bound to a specific agent |
| F-2 | CRITICAL | No nonce in command signing path — no single-use guarantee for command signatures |
| F-3 | CRITICAL | Old-format commands (no signed_at) have zero time-based replay protection — valid forever |
| F-4 | HIGH | 24-hour replay window for new-format commands — adequate for most attacks but generous |
| F-5 | CRITICAL | RetryCommand creates unsigned commands — entire retry feature broken in strict enforcement mode |
| F-6 | HIGH | Server GetPendingCommands has no TTL filter — stale pending commands delivered indefinitely |
| F-7 | HIGH | No expires_at column in agent_commands — no schema-enforced command TTL |
Severity definitions used
- CRITICAL: Exploitable by an attacker with no special access, or breaks core security feature silently
- HIGH: Requires attacker to have partial access (MITM position, DB access) or silently degrades security posture
9. Out of Scope / Confirmed Clean
- The Ed25519 signing algorithm itself is correctly implemented (A-1 verified).
- The key rotation implementation (A-1) correctly identifies and uses the right public key per command.
- The timestamp arithmetic in VerifyCommandWithTimestamp is not inverted (verified in A-1).
- The JWT authentication on GET /agents/:id/commands is enforced by middleware; an unauthenticated attacker cannot directly call the command endpoint to inject commands through the server API.
- The deduplication index (migration 023a) prevents duplicate pending commands of the same type per agent.
10. Recommended Fixes (Prioritised, Not Yet Implemented)
| Priority | Fix | Addresses |
|---|---|---|
| 1 | Re-sign commands in RetryCommand: call signAndCreateCommand instead of commandQueries.CreateCommand directly | F-5 |
| 2 | Add agent_id to the signed message payload | F-1 |
| 3 | Add server-side command TTL: expires_at column + filter in GetPendingCommands | F-6, F-7 |
| 4 | Add agent-side executed-command deduplication: an in-memory or on-disk set of recently executed command UUIDs | F-2 (partial), F-4 |
| 5 | Remove old-format (no-timestamp) backward compat after a defined migration period; enforce signed_at as required | F-3 |
| 6 | Reduce commandMaxAge from 24h to a tighter window (4h) once retry infrastructure is fixed | F-4 |