Redflag/docs/A2_PreFix_Tests.md
jpetree331 f97d4845af feat(security): A-1 Ed25519 key rotation + A-2 replay attack fixes
Complete RedFlag codebase with two major security audit implementations.

== A-1: Ed25519 Key Rotation Support ==

Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management

Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing

== A-2: Replay Attack Fixes (F-1 through F-7) ==

F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH     - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH     - Migration 026: expires_at column with partial index
F-6 HIGH     - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH     - Agent-side executedIDs dedup map with cleanup
F-4 HIGH     - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt

Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.

All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:25:47 -04:00

A-2 Pre-Fix Test Suite

Date: 2026-03-28
Branch: unstabledeveloper
Purpose: Document replay attack bugs BEFORE fixes are applied.

These tests prove that the bugs exist today and will prove the fixes work when applied. Do NOT modify these tests before the fix is ready — they are the regression baseline.


Test Files Created

| File | Package | Bugs Documented |
| --- | --- | --- |
| aggregator-server/internal/services/signing_replay_test.go | services_test | F-5, F-1, F-3 |
| aggregator-agent/internal/crypto/replay_test.go | crypto | F-3, F-4, F-2, F-1 |
| aggregator-server/internal/database/queries/commands_ttl_test.go | queries_test | F-6, F-7, F-5 |
| aggregator-server/internal/api/handlers/retry_signing_test.go | handlers_test | F-5 |

How to Run

# Run every command from the repository root. Each one runs in a subshell,
# so the working directory is restored afterwards.

# Server-side tests (all pre-fix tests)
(cd aggregator-server && go test ./internal/services/... -v -run "TestRetry|TestSigned|TestOld")
(cd aggregator-server && go test ./internal/database/queries/... -v -run TestGetPending)
(cd aggregator-server && go test ./internal/api/handlers/... -v -run TestRetryCommand)

# Agent-side tests (all pre-fix tests)
(cd aggregator-agent && go test ./internal/crypto/... -v -run "TestOld|TestNew|TestSame|TestCross")

# Run everything with verbose output
(cd aggregator-server && go test ./... -v 2>&1 | grep -E "(PASS|FAIL|BUG|---)")
(cd aggregator-agent && go test ./... -v 2>&1 | grep -E "(PASS|FAIL|BUG|---)")

Test Inventory

Behaviour Categories

PASS-NOW / FAIL-AFTER-FIX — Asserts the CURRENT (buggy) behaviour. The test passes because the bug exists. When the fix is applied, the behaviour changes and this test fails — signalling that the test itself needs to be updated to assert the new correct state.

FAIL-NOW / PASS-AFTER-FIX — Asserts the CORRECT post-fix behaviour. The test fails because the bug exists. When the fix is applied, the assertion becomes true and the test passes — proving the fix works.


File 1: aggregator-server/internal/services/signing_replay_test.go

TestRetryCommandIsUnsigned

  • Bug: F-5 — RetryCommand creates unsigned commands
  • What it asserts: retried.Signature == "", retried.SignedAt == nil, retried.KeyID == ""
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: queries.RetryCommand (commands.go:189) builds a new AgentCommand struct without calling signAndCreateCommand. All three signature fields are zero values.
  • What changes after fix: RetryCommand will call signAndCreateCommand, so retried.Signature will be non-empty — the assertions flip to failures.
  • Operator impact: Until fixed, every "Retry" click in the dashboard creates an unsigned command. In strict enforcement mode the agent rejects it and records only a local log line, "command verification failed: strict enforcement requires signed commands". The server still returns HTTP 200, so the operator sees no error.

TestRetryCommandMustBeSigned

  • Bug: F-5 — RetryCommand creates unsigned commands
  • What it asserts: retried.Signature != "", retried.SignedAt != nil, retried.KeyID != ""
  • Category: FAIL-NOW / PASS-AFTER-FIX
  • Why it currently fails: The retry command is unsigned (bug F-5 exists).
  • What changes after fix: All three fields will be populated; test passes.

TestSignedCommandNotBoundToAgent

  • Bug: F-1 — agent_id absent from signed payload
  • What it asserts: agentA.String() is NOT in the signed message, and ed25519.Verify returns true for the command regardless of which agent receives it.
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: Signed message is {id}:{type}:{sha256(params)}:{ts}. No agent_id component. ed25519.Verify ignores anything outside the signed message.
  • What changes after fix: When agent_id is added to the signed message, the message reconstructed in the test (without agent_id) will not match the signature — ed25519.Verify returns false and the test fails.
  • Attack scenario: An attacker with DB write access can copy a signed command from agent A into agent B's agent_commands queue. The signature passes verification on agent B.

TestOldFormatCommandHasNoExpiry

  • Bug: F-3 — Old-format commands (no signed_at) valid forever
  • What it asserts: ed25519.Verify returns true for an old-format signature (no timestamp in the message) regardless of when verification occurs.
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: ed25519.Verify is a pure cryptographic check — it has no time component. The old format {id}:{type}:{sha256(params)} contains no timestamp, so there is nothing to expire.
  • What changes after fix: Either VerifyCommand is updated to reject old-format commands outright (requiring signed_at), or a created_at check is added — the test would then need to be updated to expect rejection.

File 2: aggregator-agent/internal/crypto/replay_test.go

Uses helpers generateKeyPair, signCommand, signCommandOld from verification_test.go (same package).

TestOldFormatReplayIsUnbounded

  • Bug: F-3 — VerifyCommand has no time check
  • What it asserts: v.VerifyCommand(cmd, pub) returns nil for a command with SignedAt == nil (old format), regardless of age.
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: VerifyCommand (verification.go:25) performs only an Ed25519 signature check. No created_at or SignedAt field is examined.
  • What changes after fix: After adding an expiry check, VerifyCommand will return an error for old-format commands beyond a defined age, and this test will fail.

TestNewFormatCommandCanBeReplayedWithin24Hours

  • Bug: F-4 — 24-hour replay window (large but intentional)
  • What it asserts: VerifyCommandWithTimestamp returns nil for a command signed 23h59m ago (within the 24h commandMaxAge).
  • Category: PASS-NOW / WILL-REMAIN-PASSING until commandMaxAge is reduced
  • Why it currently passes: By design — the 24h window is intentional to accommodate polling intervals and network delays.
  • What changes after fix: If commandMaxAge is reduced (e.g. to 4h per the A-2 audit recommendation), this test will FAIL, because the 23h59m-old command it signs is older than the new limit. Update the time.Duration in the test when commandMaxAge is changed.

TestSameCommandCanBeVerifiedTwice

  • Bug: F-2 — No nonce; same command verifies any number of times
  • What it asserts: VerifyCommandWithTimestamp returns nil on the second and third call with identical inputs.
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: VerifyCommandWithTimestamp is a stateless pure function. No nonce, no executed-command set, no single-use guarantee.
  • What changes after fix: After agent-side deduplication (executed-command ID set) is added, the second call for a previously-seen command UUID will return an error.

TestCrossAgentSignatureVerifies

  • Bug: F-1 — Signed message has no agent binding
  • What it asserts: The signed message components are [cmd_id, cmd_type, sha256(params), timestamp] — no agent_id. VerifyCommandWithTimestamp passes for a copy of the command representing delivery to a different agent.
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: client.Command has no agent_id field, and reconstructMessageWithTimestamp does not include one.
  • What changes after fix: After agent_id is added to the signed message (and correspondingly to client.Command), the reconstructed message in the verifier will include agent_id, and a command signed for agent A will fail verification on agent B.

File 3: aggregator-server/internal/database/queries/commands_ttl_test.go

These tests operate on a copied query string constant. When the fix adds a TTL clause to GetPendingCommands, update getPendingCommandsQuery in this file to match.

TestGetPendingCommandsHasNoTTLFilter

  • Bug: F-6 + F-7 — GetPendingCommands has no TTL filter; no expires_at column
  • What it asserts: The query string does NOT contain "INTERVAL" or "expires_at".
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: The production query (commands.go:52) is:
    SELECT * FROM agent_commands
    WHERE agent_id = $1 AND status = 'pending'
    ORDER BY created_at ASC
    LIMIT 100
    
    Neither INTERVAL nor expires_at appears.
  • What changes after fix: Update getPendingCommandsQuery to the new query containing the TTL clause. The absence-assertions will then fail (indicator found) — update them.

TestGetPendingCommandsMustHaveTTLFilter

  • Bug: F-6 + F-7 — same
  • What it asserts: The query DOES contain a TTL indicator ("INTERVAL" or "expires_at").
  • Category: FAIL-NOW / PASS-AFTER-FIX
  • Why it currently fails: No TTL clause exists in the current query.
  • What changes after fix: Update getPendingCommandsQuery; the indicator will be found and the test passes.
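
The string-level check these two tests rely on can be sketched as follows. The first constant is the pre-fix query quoted above. `hypotheticalFixedQuery` is only one possible shape for a fixed query and its exact clause is an assumption; `hasTTLFilter` is likewise a name invented for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Pre-fix query text as quoted above.
const getPendingCommandsQuery = `
SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at ASC
LIMIT 100`

// One possible shape for a fixed query. The clause is an assumption; only the
// expires_at column name comes from the bug description.
const hypotheticalFixedQuery = `
SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
  AND (expires_at IS NULL OR expires_at > NOW())
ORDER BY created_at ASC
LIMIT 100`

// hasTTLFilter reports whether the query text contains either TTL indicator.
func hasTTLFilter(q string) bool {
	return strings.Contains(q, "INTERVAL") || strings.Contains(q, "expires_at")
}

func main() {
	fmt.Println("pre-fix query has TTL filter:", hasTTLFilter(getPendingCommandsQuery))
	fmt.Println("fixed query has TTL filter:", hasTTLFilter(hypotheticalFixedQuery))
}
```

A check of this kind only inspects the copied string, so it confirms the fix solely when the copy is kept in sync with the production query.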

TestRetryCommandQueryDoesNotCopySignature

  • Bug: F-5 (query-layer confirmation)
  • What it asserts: Documentary — logs that RetryCommand omits signature, key_id, signed_at from the new command struct.
  • Category: Always passes (documentation test). Update the logged field lists when the fix is applied.

File 4: aggregator-server/internal/api/handlers/retry_signing_test.go

TestRetryCommandEndpointProducesUnsignedCommand

  • Bug: F-5 — Handler returns 200 but creates an unsigned command
  • What it asserts: retried.Signature == "", retried.SignedAt == nil, retried.KeyID == "" using simulateRetryCommand which replicates the exact struct construction in queries.RetryCommand.
  • Category: PASS-NOW / FAIL-AFTER-FIX
  • Why it currently passes: simulateRetryCommand exactly mirrors the current production code (commands.go:202) — no signing call.
  • What changes after fix: simulateRetryCommand must be updated to include the signing call, or the test must be rewritten against the fixed implementation.

TestRetryCommandEndpointMustProduceSignedCommand

  • Bug: F-5
  • What it asserts: retried.Signature != "", retried.SignedAt != nil, retried.KeyID != ""
  • Category: FAIL-NOW / PASS-AFTER-FIX
  • Why it currently fails: simulateRetryCommand produces an unsigned command (bug exists).
  • What changes after fix: The production code will produce a signed command; update simulateRetryCommand to call the signing service and the assertions will pass.

TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration

  • Bug: F-5
  • Status: Skipped — requires live DB or interface extraction (see TODO in file).
  • How to enable: Extract CommandQueriesInterface from CommandQueries and update handlers to accept the interface, then replace simulateRetryCommand with a real handler invocation via httptest.

State-Change Summary

| Test | Current State | After A-2 Fix |
| --- | --- | --- |
| TestRetryCommandIsUnsigned | PASS | FAIL (flip expected) |
| **TestRetryCommandMustBeSigned** | FAIL | PASS |
| TestSignedCommandNotBoundToAgent | PASS | FAIL (flip expected) |
| TestOldFormatCommandHasNoExpiry | PASS | FAIL (flip expected) |
| TestOldFormatReplayIsUnbounded | PASS | FAIL (flip expected) |
| TestNewFormatCommandCanBeReplayedWithin24Hours | PASS | PASS (or FAIL if maxAge reduced) |
| TestSameCommandCanBeVerifiedTwice | PASS | FAIL (flip expected) |
| TestCrossAgentSignatureVerifies | PASS | FAIL (flip expected) |
| TestGetPendingCommandsHasNoTTLFilter | PASS | FAIL (flip expected) |
| **TestGetPendingCommandsMustHaveTTLFilter** | FAIL | PASS |
| TestRetryCommandQueryDoesNotCopySignature | PASS | documentary (update manually) |
| TestRetryCommandEndpointProducesUnsignedCommand | PASS | FAIL (flip expected) |
| **TestRetryCommandEndpointMustProduceSignedCommand** | FAIL | PASS |

Tests in bold currently FAIL. These are the "tests written to fail with current code" that satisfy the TDD requirement directly. All other tests currently PASS, documenting the bug-as-behavior, and will flip to FAIL when the fix changes the behavior they assert. The exceptions are the documentary test, which always passes, and TestNewFormatCommandCanBeReplayedWithin24Hours, which flips only if commandMaxAge is reduced.


Maintenance Notes

  1. When applying the fix for F-5: Update simulateRetryCommand in retry_signing_test.go so that it produces a signed command, matching the fixed production code. Update the assertions in TestRetryCommandIsUnsigned and TestRetryCommandEndpointProducesUnsignedCommand to assert the correct post-fix state.

  2. When applying the fix for F-6/F-7: Update getPendingCommandsQuery in commands_ttl_test.go to the new query text. Invert the assertions in TestGetPendingCommandsHasNoTTLFilter to assert presence (not absence) of TTL.

  3. When applying the fix for F-3: Update TestOldFormatCommandHasNoExpiry and TestOldFormatReplayIsUnbounded to assert that old-format commands ARE rejected, or that the backward-compat path has a defined expiry.

  4. When applying the fix for F-1: Update TestSignedCommandNotBoundToAgent and TestCrossAgentSignatureVerifies to pass an agent_id into the signed message and assert that a cross-agent replay fails verification.

  5. When applying the fix for F-2: Update TestSameCommandCanBeVerifiedTwice to assert that the second call returns an error (deduplication firing).


Post-Fix Status (2026-03-28)

All fixes have been applied. Test status:

| Test | Pre-Fix | Post-Fix | Status |
| --- | --- | --- | --- |
| TestRetryCommandIsUnsigned | PASS | UPDATED: now asserts signed | VERIFIED PASSING |
| TestRetryCommandMustBeSigned | FAIL | UPDATED: now passes | VERIFIED PASSING |
| TestSignedCommandNotBoundToAgent | PASS | UPDATED: asserts agent_id binding | VERIFIED PASSING |
| TestOldFormatCommandHasNoExpiry | PASS | UPDATED: documents crypto vs app-layer | VERIFIED PASSING |
| TestOldFormatReplayIsUnbounded | PASS | UPDATED: asserts 48h rejection | VERIFIED PASSING |
| TestOldFormatRecentCommandStillPasses | N/A | NEW: backward compat for recent old-format | VERIFIED PASSING |
| TestNewFormatCommandCanBeReplayedWithin24Hours | PASS | UPDATED: uses 4h window (3h59m) | VERIFIED PASSING |
| TestCommandBeyond4HoursIsRejected | N/A | NEW: asserts 4h rejection | VERIFIED PASSING |
| TestSameCommandCanBeVerifiedTwice | PASS | UPDATED: documents verifier purity, dedup at ProcessCommand | VERIFIED PASSING |
| TestCrossAgentSignatureVerifies | PASS | UPDATED: asserts cross-agent failure | VERIFIED PASSING |
| TestGetPendingCommandsHasNoTTLFilter | PASS | UPDATED: asserts TTL presence | VERIFIED PASSING |
| TestGetPendingCommandsMustHaveTTLFilter | FAIL | UPDATED: now passes | VERIFIED PASSING |
| TestRetryCommandQueryDoesNotCopySignature | PASS | Unchanged (documentary) | VERIFIED PASSING |
| TestRetryCommandEndpointProducesUnsignedCommand | PASS | UPDATED: asserts signed | VERIFIED PASSING |
| TestRetryCommandEndpointMustProduceSignedCommand | FAIL | UPDATED: now passes | VERIFIED PASSING |