jpetree331 f97d4845af feat(security): A-1 Ed25519 key rotation + A-2 replay attack fixes

Complete RedFlag codebase with two major security audit implementations.

== A-1: Ed25519 Key Rotation Support ==

Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management

Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing

== A-2: Replay Attack Fixes (F-1 through F-7) ==

F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH     - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH     - Migration 026: expires_at column with partial index
F-6 HIGH     - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH     - Agent-side executedIDs dedup map with cleanup
F-4 HIGH     - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt

Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.

All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-28 21:25:47 -04:00

A-2 Command Replay Attack Audit

Date: 2026-03-28
Branch: unstabledeveloper
Scope: Audit-only; no implementation changes

1. Signed Command Payload Analysis

What fields are included in the signed message

New format (when cmd.SignedAt != nil):

{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}:{cmd.SignedAt.Unix()}

Source: aggregator-server/internal/services/signing.go:361, aggregator-agent/internal/crypto/verification.go:71

Old format (backward compat, when cmd.SignedAt == nil):

{cmd.ID}:{cmd.CommandType}:{sha256(json(cmd.Params))}

Source: aggregator-agent/internal/crypto/verification.go:55
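The two layouts above can be reproduced in a few lines. This is a minimal sketch, not the code in signing.go or verification.go: the `Command` struct, `signedMessage`, and `exampleCommand` are hypothetical names, and it assumes the params hash is the lowercase-hex SHA-256 of the `encoding/json` output.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// Command is a hypothetical stand-in for the AgentCommand fields that feed the signature.
type Command struct {
	ID          string
	CommandType string
	Params      map[string]any
	SignedAt    *time.Time // nil selects the old format
}

// signedMessage builds the colon-delimited string that gets signed.
// Assumption: params are hashed as lowercase-hex SHA-256 of their JSON encoding.
func signedMessage(cmd Command) (string, error) {
	raw, err := json.Marshal(cmd.Params)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	msg := fmt.Sprintf("%s:%s:%s", cmd.ID, cmd.CommandType, hex.EncodeToString(sum[:]))
	if cmd.SignedAt != nil {
		// New format appends the signing time in Unix seconds.
		msg = fmt.Sprintf("%s:%d", msg, cmd.SignedAt.Unix())
	}
	return msg, nil
}

// exampleCommand returns a fixed, made-up command for the demo below.
func exampleCommand(withTimestamp bool) Command {
	cmd := Command{ID: "cmd-1", CommandType: "reboot", Params: map[string]any{"delay": 30}}
	if withTimestamp {
		ts := time.Unix(1700000000, 0)
		cmd.SignedAt = &ts
	}
	return cmd
}

func main() {
	for _, withTS := range []bool{false, true} {
		msg, err := signedMessage(exampleCommand(withTS))
		if err != nil {
			panic(err)
		}
		fmt.Println(msg)
	}
}
```

The only difference between the two messages is the trailing timestamp, which is why a command with a nil `SignedAt` carries nothing a verifier could use to judge its age.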

What is and is NOT in the signed payload

Field	In signed payload?	Notes
`cmd.ID` (UUID)	YES	Unique per-command identifier
`cmd.CommandType`	YES	e.g. `install_updates`, `reboot`
`sha256(params)`	YES	Hash of full params JSON
`signed_at` timestamp	YES (new format only)	Unix seconds
`cmd.AgentID`	NO	Absent from signature
`cmd.Source`	NO	Absent from signature
`cmd.Status`	NO	Absent from signature
Nonce	NO	Not used in command signing

FINDING F-1 (HIGH): agent_id is not included in the signed payload. A valid signed command is not cryptographically bound to a specific agent. The only uniqueness guarantee is the command UUID — if an attacker could inject a captured command into a different agent's command queue, the signature would verify correctly.
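To make F-1 concrete, the sketch below signs one command for agent-A and then checks whether agent-B would accept it, first with the audited layout and then with the agent ID prepended (the layout the commit message above lists as v3). It is an illustration under assumptions: the function names, the agent IDs, and the `deadbeef` placeholder hash are all hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// unboundMessage mirrors the audited layout: nothing identifies the target agent.
func unboundMessage(cmdID, cmdType, paramsHash string, ts int64) []byte {
	return []byte(fmt.Sprintf("%s:%s:%s:%d", cmdID, cmdType, paramsHash, ts))
}

// boundMessage prefixes the agent ID, so a verifier must supply its own ID to rebuild it.
func boundMessage(agentID, cmdID, cmdType, paramsHash string, ts int64) []byte {
	return []byte(fmt.Sprintf("%s:%s:%s:%s:%d", agentID, cmdID, cmdType, paramsHash, ts))
}

// crossAgentReplay signs a command for agent-A and reports whether agent-B
// accepts the signature under each layout.
func crossAgentReplay() (unboundAccepted, boundAccepted bool, err error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return false, false, err
	}
	const (
		cmdID   = "cmd-1"
		cmdType = "reboot"
		hash    = "deadbeef" // placeholder for sha256(params)
		ts      = int64(1700000000)
	)
	sigUnbound := ed25519.Sign(priv, unboundMessage(cmdID, cmdType, hash, ts))
	sigBound := ed25519.Sign(priv, boundMessage("agent-A", cmdID, cmdType, hash, ts))

	// Agent B rebuilds the message from what it received, plus its own ID for the bound layout.
	unboundAccepted = ed25519.Verify(pub, unboundMessage(cmdID, cmdType, hash, ts), sigUnbound)
	boundAccepted = ed25519.Verify(pub, boundMessage("agent-B", cmdID, cmdType, hash, ts), sigBound)
	return unboundAccepted, boundAccepted, nil
}

func main() {
	unbound, bound, err := crossAgentReplay()
	if err != nil {
		panic(err)
	}
	fmt.Println("agent-B accepts unbound signature:", unbound)
	fmt.Println("agent-B accepts bound signature:", bound)
}
```

Binding only helps if each agent rebuilds the message with its own configured ID rather than an ID taken from the command it received.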

2. Nonce Mechanism

What the nonce looks like

The SigningService in aggregator-server/internal/services/signing.go has two nonce methods:

func (s *SigningService) SignNonce(nonceUUID uuid.UUID, timestamp time.Time) (string, error)
func (s *SigningService) VerifyNonce(nonceUUID uuid.UUID, timestamp time.Time, signatureHex string, maxAge time.Duration) (bool, error)

Nonce format: "{uuid}:{unix_timestamp}" — signed with Ed25519.
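For readers unfamiliar with the pattern, here is a rough sketch of signing and checking a nonce in that format. It is not the SigningService code: it uses a plain string instead of `uuid.UUID` to stay within the standard library, takes `now` as an extra parameter so the example is deterministic, and the 5-minute `maxAge` is an illustrative value only.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"fmt"
	"time"
)

// nonceMessage renders "{uuid}:{unix_timestamp}".
func nonceMessage(nonceID string, ts time.Time) []byte {
	return []byte(fmt.Sprintf("%s:%d", nonceID, ts.Unix()))
}

// signNonce returns a hex-encoded Ed25519 signature over the nonce message.
func signNonce(priv ed25519.PrivateKey, nonceID string, ts time.Time) string {
	return hex.EncodeToString(ed25519.Sign(priv, nonceMessage(nonceID, ts)))
}

// verifyNonce rejects nonces older than maxAge, then checks the signature.
func verifyNonce(pub ed25519.PublicKey, nonceID string, ts time.Time, sigHex string,
	maxAge time.Duration, now time.Time) (bool, error) {
	if now.Sub(ts) > maxAge {
		return false, nil // expired
	}
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return false, fmt.Errorf("decode signature: %w", err)
	}
	return ed25519.Verify(pub, nonceMessage(nonceID, ts), sig), nil
}

// nonceDemo checks the same nonce one minute and ten minutes after issue.
func nonceDemo() (fresh, stale bool, err error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return false, false, err
	}
	issued := time.Unix(1700000000, 0)
	sig := signNonce(priv, "nonce-1", issued)
	maxAge := 5 * time.Minute // illustrative value, not taken from the codebase
	fresh, err = verifyNonce(pub, "nonce-1", issued, sig, maxAge, issued.Add(time.Minute))
	if err != nil {
		return false, false, err
	}
	stale, err = verifyNonce(pub, "nonce-1", issued, sig, maxAge, issued.Add(10*time.Minute))
	return fresh, stale, err
}

func main() {
	fresh, stale, err := nonceDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("accepted after 1 minute:", fresh)
	fmt.Println("accepted after 10 minutes:", stale)
}
```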

Where nonces are used

Nonces are NOT used in command signing or command verification.

The SignNonce/VerifyNonce methods exist exclusively for the agent update package flow (preventing replay of update download requests). They are completely disconnected from the command replay protection path.

The agent's ProcessCommand function (command_handler.go:101) calls VerifyCommandWithTimestamp or VerifyCommand. Neither of these checks any nonce. There is no nonce storage, no nonce tracking map, and no nonce field in AgentCommand or CommandItem.

FINDING F-2 (CRITICAL): There is no nonce in the command signing path. The original issue comment ("nonce-only replay protection") overstates the protection: commands carry no nonce at all, and commands signed with the old format have no reliable replay protection of any kind.

3. Verification Function Behaviour

`VerifyCommand` (old format, no timestamp)

Source: aggregator-agent/internal/crypto/verification.go:25

Checks:

Signature field is non-empty
Signature is valid hex, correct length (64 bytes)
Ed25519 signature over {id}:{type}:{sha256(params)} verifies against public key

Returns: error (nil = pass). No time check. No nonce check.
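The three checks can be sketched as follows. This is an approximation of the behaviour described above, not a copy of verification.go: `verifyOldFormat` and `signForDemo` are hypothetical names, and the message is passed in pre-built rather than derived from a command struct.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"errors"
	"fmt"
)

// verifyOldFormat applies the three listed checks to an already-built message.
func verifyOldFormat(pub ed25519.PublicKey, message, sigHex string) error {
	if sigHex == "" {
		return errors.New("command has no signature")
	}
	sig, err := hex.DecodeString(sigHex)
	if err != nil {
		return fmt.Errorf("signature is not valid hex: %w", err)
	}
	if len(sig) != ed25519.SignatureSize { // 64 bytes
		return fmt.Errorf("signature is %d bytes, want %d", len(sig), ed25519.SignatureSize)
	}
	if !ed25519.Verify(pub, []byte(message), sig) {
		return errors.New("signature does not verify")
	}
	return nil // note: nothing here looks at time or a nonce
}

// signForDemo creates a throwaway key pair and returns the public key
// together with a hex signature over message.
func signForDemo(message string) (ed25519.PublicKey, string, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return nil, "", err
	}
	return pub, hex.EncodeToString(ed25519.Sign(priv, []byte(message))), nil
}

func main() {
	const msg = "cmd-1:reboot:deadbeef" // deadbeef is a placeholder hash
	pub, sig, err := signForDemo(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("valid:", verifyOldFormat(pub, msg, sig))
	fmt.Println("empty signature:", verifyOldFormat(pub, msg, ""))
	fmt.Println("different command:", verifyOldFormat(pub, "cmd-2:reboot:deadbeef", sig))
}
```

Because the function takes no clock input, calling it again with the same arguments a year later gives the same answer.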

FINDING F-3 (CRITICAL): Commands signed with the old format (no signed_at) are valid indefinitely. A captured signature can be replayed at any time in the future — there is no expiry mechanism for old-format commands.

`VerifyCommandWithTimestamp` (new format)

Source: aggregator-agent/internal/crypto/verification.go:85

Checks:

If cmd.SignedAt == nil → falls back to VerifyCommand() (see F-3)
age = now.Sub(*cmd.SignedAt) must satisfy: age <= 24h AND age >= -5min
Signature valid over {id}:{type}:{sha256(params)}:{unix_timestamp}

FINDING F-4 (HIGH): 24-hour replay window. A captured signed command remains valid for replay for up to 24 hours from signing time. This is the default value of commandMaxAge = 24 * time.Hour defined in command_handler.go:21.
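The age check amounts to a two-sided comparison. The sketch below assumes the 24h and 5min bounds quoted above; `clockSkew`, `checkAge`, `accepted`, and `exampleAges` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	commandMaxAge = 24 * time.Hour  // default cited in this section
	clockSkew     = 5 * time.Minute // tolerated future drift
)

// checkAge reports whether a signing time falls inside the accepted window.
func checkAge(signedAt, now time.Time) error {
	age := now.Sub(signedAt)
	if age > commandMaxAge {
		return fmt.Errorf("command too old: signed %v ago", age)
	}
	if age < -clockSkew {
		return fmt.Errorf("command signed %v in the future", -age)
	}
	return nil
}

// exampleAges are sample offsets; negative values mean "signed in the future".
var exampleAges = []time.Duration{time.Hour, 25 * time.Hour, -4 * time.Minute, -6 * time.Minute}

// accepted reports whether a command signed `age` before a fixed "now" passes the window.
func accepted(age time.Duration) bool {
	now := time.Unix(1700000000, 0)
	return checkAge(now.Add(-age), now) == nil
}

func main() {
	for _, age := range exampleAges {
		fmt.Printf("age %v accepted: %v\n", age, accepted(age))
	}
}
```

Shrinking `commandMaxAge` narrows the replay window directly, at the cost of rejecting commands that legitimately wait in the queue longer than the new limit.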

4. Command Creation Flow

Full path: Dashboard approves install → command signed → stored

POST /updates/:id/install
  → UpdateHandler.InstallUpdate()                    [handlers/updates.go:459]
      → models.AgentCommand{...}                     [no signing yet]
      → h.agentHandler.signAndCreateCommand(cmd)     [agents.go:49]
          → signingService.SignCommand(cmd)           [services/signing.go:345]
              → cmd.SignedAt = &now                  [side-effect]
              → cmd.KeyID = GetCurrentKeyID()        [side-effect]
              → message = "{id}:{type}:{hash}:{ts}"
              → ed25519.Sign(privateKey, message)
              → returns hex signature
          → cmd.Signature = signature
          → commandQueries.CreateCommand(cmd)        [queries/commands.go:22]
              → INSERT INTO agent_commands (... key_id, signed_at ...)

The ConfirmDependencies and ReportDependencies (auto-install) handlers follow identical paths through signAndCreateCommand.

RetryCommand path (DOES NOT RE-SIGN)

POST /commands/:id/retry
  → UpdateHandler.RetryCommand()                     [handlers/updates.go:779]
      → commandQueries.RetryCommand(id)              [queries/commands.go:189]
          → newCommand = AgentCommand{               [copies Params, new UUID]
                Signature: "",                       [EMPTY — not re-signed]
                SignedAt: nil,                       [nil — no timestamp]
                KeyID: "",                           [empty — no key reference]
            }
          → q.CreateCommand(newCommand)              [stored unsigned]

FINDING F-5 (CRITICAL): RetryCommand creates a new command without calling signAndCreateCommand. The retried command has Signature = "", SignedAt = nil, KeyID = "". In strict enforcement mode, the agent rejects any command with an empty signature. This means the retry feature is entirely broken when command signing is enabled in strict mode. The HTTP handler in updates.go:779 returns 200 OK and the command is stored in the DB, but the agent will reject it every time it polls.
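A possible shape for the fix is to route the copied command through the same signing step as any other command. The sketch below is illustrative only: the trimmed `AgentCommand` struct, `signCommand`, and `retryCommand` are hypothetical stand-ins, the new ID is passed in rather than generated, and nothing is written to a database.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"fmt"
	"time"
)

// AgentCommand is a trimmed, hypothetical version of the real model.
type AgentCommand struct {
	ID, AgentID, CommandType, ParamsHash string
	Signature, KeyID                     string
	SignedAt                             *time.Time
}

// signCommand stands in for SignCommand: it sets SignedAt and KeyID as side
// effects and stores a hex signature over "{id}:{type}:{hash}:{ts}".
func signCommand(priv ed25519.PrivateKey, keyID string, cmd *AgentCommand, now time.Time) {
	cmd.SignedAt = &now
	cmd.KeyID = keyID
	msg := fmt.Sprintf("%s:%s:%s:%d", cmd.ID, cmd.CommandType, cmd.ParamsHash, now.Unix())
	cmd.Signature = hex.EncodeToString(ed25519.Sign(priv, []byte(msg)))
}

// retryCommand copies the original under a new ID and signs the copy before
// it would be stored, instead of storing it with empty signature fields.
func retryCommand(priv ed25519.PrivateKey, keyID string, orig AgentCommand,
	newID string, now time.Time) AgentCommand {
	retried := AgentCommand{
		ID:          newID,
		AgentID:     orig.AgentID,
		CommandType: orig.CommandType,
		ParamsHash:  orig.ParamsHash,
	}
	signCommand(priv, keyID, &retried, now)
	return retried
}

// retryDemo signs an original command and retries it one hour later.
func retryDemo() (orig, retried AgentCommand, err error) {
	_, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return orig, retried, err
	}
	orig = AgentCommand{ID: "cmd-1", AgentID: "agent-A", CommandType: "install_updates", ParamsHash: "deadbeef"}
	signCommand(priv, "key-1", &orig, time.Unix(1700000000, 0))
	retried = retryCommand(priv, "key-1", orig, "cmd-2", time.Unix(1700003600, 0))
	return orig, retried, nil
}

func main() {
	orig, retried, err := retryDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("original signed at:", orig.SignedAt.Unix())
	fmt.Println("retry signed at:   ", retried.SignedAt.Unix(), "key:", retried.KeyID)
}
```

The retried command gets its own signature and timestamp, so its validity window starts at the retry rather than at the original signing time.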

5. Agent Command Fetch and Execution Flow

Full path: Agent polls → receives commands → verifies → executes

GET /api/v1/agents/{id}/commands
  → AgentHandler.GetCommands()                       [handlers/agents.go:204]
      → commandQueries.GetPendingCommands(agentID)   [status = 'pending' only]
      → commandQueries.GetStuckCommands(agentID, 5m) [sent > 5 min, not completed]
      → allCommands = pending + stuck
      → for each cmd: MarkCommandSent(cmd.ID)        [transitions pending → sent]
      → returns CommandItem{ID, Type, Params, Signature, KeyID, SignedAt}

Agent-side:

main.go:875: apiClient.GetCommands(cfg.AgentID, metrics)
main.go:928: for _, cmd := range commands {
main.go:932:     commandHandler.ProcessCommand(cmd, cfg, cfg.AgentID)
main.go:954:     switch cmd.Type { ... execute ... }

What `GetPendingCommands` returns

SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at ASC
LIMIT 100

There is no WHERE created_at > NOW() - INTERVAL '24 hours' filter. A command created 30 days ago with status pending (e.g., if it was never successfully sent) would be returned. If it has the old-format signature (no signed_at), the agent would execute it with no time check.

FINDING F-6 (HIGH): The server-side command queue has no TTL filter. Old pending commands are delivered indefinitely. Combined with old-format signing (F-3), this means commands can persist in the queue and be executed arbitrarily long after creation.
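The missing filter is easy to picture in memory. The sketch below does in Go roughly what an `AND created_at > NOW() - INTERVAL '24 hours'` clause would do in the query; the `pendingCommand` struct, `filterFresh`, and the 24-hour TTL are assumptions for illustration, not the project's chosen design.

```go
package main

import (
	"fmt"
	"time"
)

// pendingCommand is a hypothetical, trimmed row from agent_commands.
type pendingCommand struct {
	ID        string
	Status    string
	CreatedAt time.Time
}

// filterFresh keeps pending commands created less than ttl before now.
func filterFresh(cmds []pendingCommand, ttl time.Duration, now time.Time) []pendingCommand {
	var fresh []pendingCommand
	for _, c := range cmds {
		if c.Status == "pending" && now.Sub(c.CreatedAt) < ttl {
			fresh = append(fresh, c)
		}
	}
	return fresh
}

// freshIDsDemo filters a queue holding a one-hour-old and a 30-day-old command.
func freshIDsDemo() []string {
	now := time.Unix(1700000000, 0)
	queue := []pendingCommand{
		{ID: "recent", Status: "pending", CreatedAt: now.Add(-time.Hour)},
		{ID: "stale", Status: "pending", CreatedAt: now.Add(-30 * 24 * time.Hour)},
	}
	var ids []string
	for _, c := range filterFresh(queue, 24*time.Hour, now) {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	fmt.Println("delivered:", freshIDsDemo())
}
```

Doing the filtering in SQL is preferable in practice because stale rows never leave the database; the Go version only shows the predicate.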

6. Database Schema — TTL and Command Expiry

agent_commands table (from migration 001 + amendments)

CREATE TABLE agent_commands (
    id          UUID PRIMARY KEY,
    agent_id    UUID REFERENCES agents(id),
    command_type VARCHAR(50),
    params      JSONB,
    status      VARCHAR(20) DEFAULT 'pending',
    created_at  TIMESTAMP DEFAULT NOW(),
    sent_at     TIMESTAMP,
    completed_at TIMESTAMP,
    result      JSONB,
    signature   VARCHAR(128),          -- migration 020
    key_id      VARCHAR(64),           -- migration 025
    signed_at   TIMESTAMP,             -- migration 025
    idempotency_key VARCHAR(64) UNIQUE -- migration 023a
);

FINDING F-7 (HIGH): No expires_at column exists. No TTL constraint exists. No scheduled cleanup job for old pending commands exists in the codebase. The only cleanup mechanisms are:

Manual ClearOldFailedCommands(days) — applies to failed/timed_out only, not pending
Manual CancelCommand(id) — single-command manual cancellation
The deduplication index from migration 023a prevents duplicate pending commands per (agent_id, command_type), but this only prevents new duplicates — it doesn't expire old ones

7. Attack Surface Assessment

Can a captured signed command be replayed indefinitely?

New format (with signed_at): Replayable for 24 hours from signing time. After that, VerifyCommandWithTimestamp rejects it as too old.

Old format (no signed_at): YES — replayable indefinitely. VerifyCommand has no time check. Any command signed before the A-1 implementation was deployed (before signed_at was added) is permanently replayable.

The backward-compatibility fallback in VerifyCommandWithTimestamp (if cmd.SignedAt == nil → VerifyCommand) means that any command arriving without signed_at, whether issued by a server version that predates signed_at support or stored in the DB before migration 025 ran, falls into the unlimited-replay category.

Replay attack scenarios

Scenario A: Network MITM (24h window). An attacker positioned between server and agent captures a valid install_updates command with signed_at set. Within 24 hours, they can re-present this command to the agent. If the agent's command handler receives it (via MITM on the polling response), it passes VerifyCommandWithTimestamp and is executed, potentially installing the same update a second time or, more dangerously, triggering a reboot or update_agent command twice.

Scenario B: Old-format signature captured forever. Any command signed before signed_at support was deployed (old server version or commands created before migration 025 ran) has no timestamp. A captured signature is valid forever. The only defense is that the command UUID must match, but if an attacker can inject a command with a matching UUID into the DB, verification passes.

Scenario C: Retry creates unsigned commands (strict mode). An operator clicks "Retry" on a failed install_updates command. The server creates a new unsigned command. In strict mode, the agent rejects it quietly: it logs the rejection and reports failed to the server, with no explanation surfaced to the operator. The operator may not understand why the retry keeps failing, and may downgrade the enforcement mode to warning as a workaround, which is exactly the wrong response.

Scenario D: agent_id not in signature (cross-agent injection). If an attacker can write to the agent_commands table directly (e.g., via SQL injection elsewhere, or compromised server credentials), they can copy a signed command for agent A into agent B's queue. The Ed25519 signature will verify correctly on agent B because agent_id is not in the signed content.

Scenario E: Stuck command re-execution. The GetStuckCommands query re-delivers commands that are in sent status for > 5 minutes. If a command was genuinely stuck (network failure, agent restart), it may be re-executed when the agent comes back online. If the command is reboot or install_updates, this can cause unintended repeated execution. There is no duplicate-execution guard on the agent side (no "already executed command ID" tracking).
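One way to add the missing guard for Scenario E is a small set of recently executed command IDs that the agent consults before running anything. The sketch below is an in-memory illustration under assumptions: `executedSet`, `markIfNew`, and the 24-hour retention are hypothetical, and state is lost on agent restart unless it is also persisted to disk.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// executedSet remembers command IDs that have already run.
type executedSet struct {
	mu   sync.Mutex
	seen map[string]time.Time
	ttl  time.Duration
}

func newExecutedSet(ttl time.Duration) *executedSet {
	return &executedSet{seen: make(map[string]time.Time), ttl: ttl}
}

// markIfNew records id and returns true the first time it is seen;
// it returns false if id was already recorded within ttl.
func (s *executedSet) markIfNew(id string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Drop entries older than ttl so the map does not grow without bound.
	for k, t := range s.seen {
		if now.Sub(t) > s.ttl {
			delete(s.seen, k)
		}
	}
	if _, dup := s.seen[id]; dup {
		return false
	}
	s.seen[id] = now
	return true
}

// dedupDemo replays a typical sequence of deliveries against one set.
func dedupDemo() []bool {
	set := newExecutedSet(24 * time.Hour)
	t0 := time.Unix(1700000000, 0)
	return []bool{
		set.markIfNew("cmd-1", t0),                    // first delivery
		set.markIfNew("cmd-1", t0.Add(6*time.Minute)), // stuck-command re-delivery
		set.markIfNew("cmd-2", t0.Add(7*time.Minute)), // a different command
		set.markIfNew("cmd-1", t0.Add(25*time.Hour)),  // after the entry has expired
	}
}

func main() {
	fmt.Println(dedupDemo())
}
```

The retention period should be at least as long as the signature validity window; otherwise an ID can be forgotten while its signature would still verify, as the last call in the demo shows.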

8. Summary Table

Finding	Severity	Description
F-1	HIGH	`agent_id` absent from signed payload — commands not cryptographically bound to a specific agent
F-2	CRITICAL	No nonce in command signing path — no single-use guarantee for command signatures
F-3	CRITICAL	Old-format commands (no `signed_at`) have zero time-based replay protection — valid forever
F-4	HIGH	24-hour replay window for new-format commands: a bounded window, but a generous one
F-5	CRITICAL	`RetryCommand` creates unsigned commands — entire retry feature broken in strict enforcement mode
F-6	HIGH	Server `GetPendingCommands` has no TTL filter — stale pending commands delivered indefinitely
F-7	HIGH	No `expires_at` column in `agent_commands` — no schema-enforced command TTL

Severity definitions used

CRITICAL: Exploitable by an attacker with no special access, or silently breaks a core security feature
HIGH: Requires attacker to have partial access (MITM position, DB access) or silently degrades security posture

9. Out of Scope / Confirmed Clean

The Ed25519 signing algorithm itself is correctly implemented (A-1 verified).
The key rotation implementation (A-1) correctly identifies and uses the right public key per command.
The timestamp arithmetic in VerifyCommandWithTimestamp is not inverted (verified in A-1).
The JWT authentication on GET /agents/:id/commands is enforced by middleware — an unauthenticated attacker cannot directly call the command endpoint to inject commands through the server API.
The deduplication index (migration 023a) prevents duplicate pending commands of the same type per agent.

10. Recommended Fixes (Prioritised, Not Yet Implemented)

Priority	Fix	Addresses
1	Re-sign commands in `RetryCommand` — call `signAndCreateCommand` instead of `commandQueries.CreateCommand` directly	F-5
2	Add `agent_id` to the signed message payload	F-1
3	Add server-side command TTL: `expires_at` column + filter in `GetPendingCommands`	F-6, F-7
4	Add agent-side executed-command deduplication: an in-memory or on-disk set of recently executed command UUIDs	F-2 (partial), F-4
5	Remove old-format (no-timestamp) backward compat after a defined migration period — enforce `signed_at` as required	F-3
6	Reduce `commandMaxAge` from 24h to a tighter window (4h) once retry infrastructure is fixed	F-4
