Files
Redflag/docs/A2_Fix_Implementation.md
jpetree331 f97d4845af feat(security): A-1 Ed25519 key rotation + A-2 replay attack fixes
Complete RedFlag codebase with two major security audit implementations.

== A-1: Ed25519 Key Rotation Support ==

Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management

Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing

== A-2: Replay Attack Fixes (F-1 through F-7) ==

F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH     - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH     - Migration 026: expires_at column with partial index
F-6 HIGH     - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH     - Agent-side executedIDs dedup map with cleanup
F-4 HIGH     - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt

Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.

All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:25:47 -04:00

7.8 KiB

A2 — Replay Attack Fix Implementation Report

Date: 2026-03-28 Branch: unstabledeveloper Audit Reference: docs/A2_Replay_Attack_Audit.md


Summary

This document covers the implementation of fixes for 7 audit findings (F-1 through F-7) identified in the replay attack surface audit. All fixes maintain backward compatibility with pre-A1 agents and servers.


Files Changed

Server Side

File Change
aggregator-server/internal/services/signing.go v3 signed message format includes agent_id (F-1)
aggregator-server/internal/models/command.go Added ExpiresAt, AgentID, CreatedAt to structs (F-7, F-1, F-3)
aggregator-server/internal/database/queries/commands.go TTL filter in GetPendingCommands/GetStuckCommands, expires_at in CreateCommand (F-6, F-7)
aggregator-server/internal/api/handlers/updates.go RetryCommand refactored to sign via signAndCreateCommand (F-5)
aggregator-server/internal/api/handlers/agents.go GetCommands passes AgentID and CreatedAt to CommandItem (F-1, F-3)
aggregator-server/internal/database/queries/docker.go Fix pre-existing fmt.Sprintf build error (unrelated)
aggregator-server/internal/database/migrations/026_add_expires_at.up.sql New migration: expires_at column + index + backfill (F-7)
aggregator-server/internal/database/migrations/026_add_expires_at.down.sql Rollback migration (F-7)

Agent Side

File Change
aggregator-agent/internal/crypto/verification.go v3 message format, field-count detection, old-format 48h expiry (F-1, F-3)
aggregator-agent/internal/orchestrator/command_handler.go Dedup set, commandMaxAge=4h, CleanupExecutedIDs (F-2, F-4)
aggregator-agent/internal/client/client.go Added AgentID and CreatedAt to Command struct (F-1, F-3)
aggregator-agent/cmd/agent/main.go Wired CleanupExecutedIDs into key refresh cycle (F-2)

Test Files (Updated)

File Tests Updated
aggregator-server/internal/services/signing_replay_test.go TestRetryCommandIsUnsigned, TestRetryCommandMustBeSigned, TestSignedCommandNotBoundToAgent, TestOldFormatCommandHasNoExpiry
aggregator-server/internal/database/queries/commands_ttl_test.go TestGetPendingCommandsHasNoTTLFilter, TestGetPendingCommandsMustHaveTTLFilter
aggregator-server/internal/api/handlers/retry_signing_test.go simulateRetryCommand, TestRetryCommandEndpointProducesUnsignedCommand, TestRetryCommandEndpointMustProduceSignedCommand
aggregator-agent/internal/crypto/replay_test.go TestOldFormatReplayIsUnbounded, TestNewFormatCommandCanBeReplayedWithin24Hours, TestSameCommandCanBeVerifiedTwice, TestCrossAgentSignatureVerifies + new: TestOldFormatRecentCommandStillPasses, TestCommandBeyond4HoursIsRejected
aggregator-agent/internal/crypto/verification_test.go All tests updated for v3 format (AgentID), signCommand helper updated, signCommandV2 added

Signed Message Format (v3)

New Format

"{agent_id}:{cmd_id}:{command_type}:{sha256(params)}:{unix_timestamp}"

5 colon-separated fields.

Previous Formats (backward compat)

  • v2 (4 fields): "{cmd_id}:{command_type}:{sha256(params)}:{unix_timestamp}" — has signed_at, no agent_id
  • v1 (3 fields): "{cmd_id}:{command_type}:{sha256(params)}" — no timestamp, no agent_id

Backward Compatibility Detection

The agent's VerifyCommandWithTimestamp detects the format:

  1. If cmd.AgentID != "" → try v3 first. If v3 fails, fall back to v2 with warning.
  2. If cmd.AgentID == "" and cmd.SignedAt != nil → v2 format with warning.
  3. If cmd.SignedAt == nil → v1 format (oldest) with warning + 48h created_at check.

Warnings are logged at the [crypto] level to alert operators to upgrade.


Deduplication Window

  • Implementation: In-memory executedIDs map[string]time.Time in CommandHandler
  • Window: Entries are kept for commandMaxAge (4 hours)
  • Cleanup: Runs every 6 hours when ShouldRefreshKey() fires
  • Restart Limitation: The map is lost on agent restart. Commands issued within commandMaxAge can be replayed if the agent restarts. A TODO comment documents the future disk persistence path.

Two-Phase Plan for Retiring Old-Format Commands

Phase 1 (Implemented Now)

  • Old-format commands (no signed_at) with created_at > 48h are rejected by VerifyCommand
  • Old-format commands within 48h still pass (backward compat for recent commands)
  • The created_at field is now included in the CommandItem API response

Phase 2 (Future Work — 90 Days After Migration 025 Deployment)

  • Remove the old-format fallback in VerifyCommandWithTimestamp entirely
  • Enforce signed_at as required on all commands
  • Remove VerifyCommand() from the public API
  • This ensures all commands use timestamped, agent-bound signatures

Docker Build + Test Output

Server Build

docker-compose build server
# ... builds successfully
Service server  Built

Server Tests

=== RUN   TestRetryCommandIsUnsigned
--- PASS: TestRetryCommandIsUnsigned (0.00s)
=== RUN   TestRetryCommandMustBeSigned
--- PASS: TestRetryCommandMustBeSigned (0.00s)
=== RUN   TestSignedCommandNotBoundToAgent
--- PASS: TestSignedCommandNotBoundToAgent (0.00s)
=== RUN   TestOldFormatCommandHasNoExpiry
--- PASS: TestOldFormatCommandHasNoExpiry (0.00s)
ok   github.com/Fimeg/RedFlag/aggregator-server/internal/services

=== RUN   TestGetPendingCommandsHasNoTTLFilter
--- PASS: TestGetPendingCommandsHasNoTTLFilter (0.00s)
=== RUN   TestGetPendingCommandsMustHaveTTLFilter
--- PASS: TestGetPendingCommandsMustHaveTTLFilter (0.00s)
=== RUN   TestRetryCommandQueryDoesNotCopySignature
--- PASS: TestRetryCommandQueryDoesNotCopySignature (0.00s)
ok   github.com/Fimeg/RedFlag/aggregator-server/internal/database/queries

=== RUN   TestRetryCommandEndpointProducesUnsignedCommand
--- PASS: TestRetryCommandEndpointProducesUnsignedCommand (0.00s)
=== RUN   TestRetryCommandEndpointMustProduceSignedCommand
--- PASS: TestRetryCommandEndpointMustProduceSignedCommand (0.00s)
=== RUN   TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration
--- SKIP: TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration (0.00s)
ok   github.com/Fimeg/RedFlag/aggregator-server/internal/api/handlers

Agent Tests

=== RUN   TestCacheMetadataIsExpired
--- PASS: TestCacheMetadataIsExpired (0.00s)
=== RUN   TestOldFormatReplayIsUnbounded
--- PASS: TestOldFormatReplayIsUnbounded (0.00s)
=== RUN   TestOldFormatRecentCommandStillPasses
--- PASS: TestOldFormatRecentCommandStillPasses (0.00s)
=== RUN   TestNewFormatCommandCanBeReplayedWithin24Hours
--- PASS: TestNewFormatCommandCanBeReplayedWithin24Hours (0.00s)
=== RUN   TestCommandBeyond4HoursIsRejected
--- PASS: TestCommandBeyond4HoursIsRejected (0.00s)
=== RUN   TestSameCommandCanBeVerifiedTwice
--- PASS: TestSameCommandCanBeVerifiedTwice (0.00s)
=== RUN   TestCrossAgentSignatureVerifies
--- PASS: TestCrossAgentSignatureVerifies (0.00s)
=== RUN   TestVerifyCommandWithTimestamp_ValidRecent
--- PASS: TestVerifyCommandWithTimestamp_ValidRecent (0.00s)
=== RUN   TestVerifyCommandWithTimestamp_TooOld
--- PASS: TestVerifyCommandWithTimestamp_TooOld (0.00s)
=== RUN   TestVerifyCommandWithTimestamp_FutureBeyondSkew
--- PASS: TestVerifyCommandWithTimestamp_FutureBeyondSkew (0.00s)
=== RUN   TestVerifyCommandWithTimestamp_FutureWithinSkew
--- PASS: TestVerifyCommandWithTimestamp_FutureWithinSkew (0.00s)
=== RUN   TestVerifyCommandWithTimestamp_BackwardCompatNoTimestamp
--- PASS: TestVerifyCommandWithTimestamp_BackwardCompatNoTimestamp (0.00s)
=== RUN   TestVerifyCommandWithTimestamp_WrongKey
--- PASS: TestVerifyCommandWithTimestamp_WrongKey (0.00s)
=== RUN   TestVerifyCommand_BackwardCompat
--- PASS: TestVerifyCommand_BackwardCompat (0.00s)
ok   github.com/Fimeg/RedFlag/aggregator-agent/internal/crypto

All tests pass. No regressions detected.