Complete RedFlag codebase with two major security audit implementations.
== A-1: Ed25519 Key Rotation Support ==
Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management
Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing
== A-2: Replay Attack Fixes (F-1 through F-7) ==
F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH - Migration 026: expires_at column with partial index
F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH - Agent-side executedIDs dedup map with cleanup
F-4 HIGH - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt
Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.
All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7.8 KiB
7.8 KiB
A2 — Replay Attack Fix Implementation Report
Date: 2026-03-28 Branch: unstabledeveloper Audit Reference: docs/A2_Replay_Attack_Audit.md
Summary
This document covers the implementation of fixes for 7 audit findings (F-1 through F-7) identified in the replay attack surface audit. All fixes maintain backward compatibility with pre-A1 agents and servers.
Files Changed
Server Side
| File | Change |
|---|---|
aggregator-server/internal/services/signing.go |
v3 signed message format includes agent_id (F-1) |
aggregator-server/internal/models/command.go |
Added ExpiresAt, AgentID, CreatedAt to structs (F-7, F-1, F-3) |
aggregator-server/internal/database/queries/commands.go |
TTL filter in GetPendingCommands/GetStuckCommands, expires_at in CreateCommand (F-6, F-7) |
aggregator-server/internal/api/handlers/updates.go |
RetryCommand refactored to sign via signAndCreateCommand (F-5) |
aggregator-server/internal/api/handlers/agents.go |
GetCommands passes AgentID and CreatedAt to CommandItem (F-1, F-3) |
aggregator-server/internal/database/queries/docker.go |
Fix pre-existing fmt.Sprintf build error (unrelated) |
aggregator-server/internal/database/migrations/026_add_expires_at.up.sql |
New migration: expires_at column + index + backfill (F-7) |
aggregator-server/internal/database/migrations/026_add_expires_at.down.sql |
Rollback migration (F-7) |
Agent Side
| File | Change |
|---|---|
aggregator-agent/internal/crypto/verification.go |
v3 message format, field-count detection, old-format 48h expiry (F-1, F-3) |
aggregator-agent/internal/orchestrator/command_handler.go |
Dedup set, commandMaxAge=4h, CleanupExecutedIDs (F-2, F-4) |
aggregator-agent/internal/client/client.go |
Added AgentID and CreatedAt to Command struct (F-1, F-3) |
aggregator-agent/cmd/agent/main.go |
Wired CleanupExecutedIDs into key refresh cycle (F-2) |
Test Files (Updated)
| File | Tests Updated |
|---|---|
aggregator-server/internal/services/signing_replay_test.go |
TestRetryCommandIsUnsigned, TestRetryCommandMustBeSigned, TestSignedCommandNotBoundToAgent, TestOldFormatCommandHasNoExpiry |
aggregator-server/internal/database/queries/commands_ttl_test.go |
TestGetPendingCommandsHasNoTTLFilter, TestGetPendingCommandsMustHaveTTLFilter |
aggregator-server/internal/api/handlers/retry_signing_test.go |
simulateRetryCommand, TestRetryCommandEndpointProducesUnsignedCommand, TestRetryCommandEndpointMustProduceSignedCommand |
aggregator-agent/internal/crypto/replay_test.go |
TestOldFormatReplayIsUnbounded, TestNewFormatCommandCanBeReplayedWithin24Hours, TestSameCommandCanBeVerifiedTwice, TestCrossAgentSignatureVerifies + new: TestOldFormatRecentCommandStillPasses, TestCommandBeyond4HoursIsRejected |
aggregator-agent/internal/crypto/verification_test.go |
All tests updated for v3 format (AgentID), signCommand helper updated, signCommandV2 added |
Signed Message Format (v3)
New Format
"{agent_id}:{cmd_id}:{command_type}:{sha256(params)}:{unix_timestamp}"
5 colon-separated fields.
Previous Formats (backward compat)
- v2 (4 fields):
"{cmd_id}:{command_type}:{sha256(params)}:{unix_timestamp}"— has signed_at, no agent_id - v1 (3 fields):
"{cmd_id}:{command_type}:{sha256(params)}"— no timestamp, no agent_id
Backward Compatibility Detection
The agent's VerifyCommandWithTimestamp detects the format:
- If
cmd.AgentID != ""→ try v3 first. If v3 fails, fall back to v2 with warning. - If
cmd.AgentID == ""andcmd.SignedAt != nil→ v2 format with warning. - If
cmd.SignedAt == nil→ v1 format (oldest) with warning + 48h created_at check.
Warnings are logged at the [crypto] level to alert operators to upgrade.
Deduplication Window
- Implementation: In-memory
executedIDs map[string]time.TimeinCommandHandler - Window: Entries are kept for
commandMaxAge(4 hours) - Cleanup: Runs every 6 hours when
ShouldRefreshKey()fires - Restart Limitation: The map is lost on agent restart. Commands issued within
commandMaxAgecan be replayed if the agent restarts. A TODO comment documents the future disk persistence path.
Two-Phase Plan for Retiring Old-Format Commands
Phase 1 (Implemented Now)
- Old-format commands (no
signed_at) withcreated_at > 48hare rejected byVerifyCommand - Old-format commands within 48h still pass (backward compat for recent commands)
- The
created_atfield is now included in theCommandItemAPI response
Phase 2 (Future Work — 90 Days After Migration 025 Deployment)
- Remove the old-format fallback in
VerifyCommandWithTimestampentirely - Enforce
signed_atas required on all commands - Remove
VerifyCommand()from the public API - This ensures all commands use timestamped, agent-bound signatures
Docker Build + Test Output
Server Build
docker-compose build server
# ... builds successfully
Service server Built
Server Tests
=== RUN TestRetryCommandIsUnsigned
--- PASS: TestRetryCommandIsUnsigned (0.00s)
=== RUN TestRetryCommandMustBeSigned
--- PASS: TestRetryCommandMustBeSigned (0.00s)
=== RUN TestSignedCommandNotBoundToAgent
--- PASS: TestSignedCommandNotBoundToAgent (0.00s)
=== RUN TestOldFormatCommandHasNoExpiry
--- PASS: TestOldFormatCommandHasNoExpiry (0.00s)
ok github.com/Fimeg/RedFlag/aggregator-server/internal/services
=== RUN TestGetPendingCommandsHasNoTTLFilter
--- PASS: TestGetPendingCommandsHasNoTTLFilter (0.00s)
=== RUN TestGetPendingCommandsMustHaveTTLFilter
--- PASS: TestGetPendingCommandsMustHaveTTLFilter (0.00s)
=== RUN TestRetryCommandQueryDoesNotCopySignature
--- PASS: TestRetryCommandQueryDoesNotCopySignature (0.00s)
ok github.com/Fimeg/RedFlag/aggregator-server/internal/database/queries
=== RUN TestRetryCommandEndpointProducesUnsignedCommand
--- PASS: TestRetryCommandEndpointProducesUnsignedCommand (0.00s)
=== RUN TestRetryCommandEndpointMustProduceSignedCommand
--- PASS: TestRetryCommandEndpointMustProduceSignedCommand (0.00s)
=== RUN TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration
--- SKIP: TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration (0.00s)
ok github.com/Fimeg/RedFlag/aggregator-server/internal/api/handlers
Agent Tests
=== RUN TestCacheMetadataIsExpired
--- PASS: TestCacheMetadataIsExpired (0.00s)
=== RUN TestOldFormatReplayIsUnbounded
--- PASS: TestOldFormatReplayIsUnbounded (0.00s)
=== RUN TestOldFormatRecentCommandStillPasses
--- PASS: TestOldFormatRecentCommandStillPasses (0.00s)
=== RUN TestNewFormatCommandCanBeReplayedWithin24Hours
--- PASS: TestNewFormatCommandCanBeReplayedWithin24Hours (0.00s)
=== RUN TestCommandBeyond4HoursIsRejected
--- PASS: TestCommandBeyond4HoursIsRejected (0.00s)
=== RUN TestSameCommandCanBeVerifiedTwice
--- PASS: TestSameCommandCanBeVerifiedTwice (0.00s)
=== RUN TestCrossAgentSignatureVerifies
--- PASS: TestCrossAgentSignatureVerifies (0.00s)
=== RUN TestVerifyCommandWithTimestamp_ValidRecent
--- PASS: TestVerifyCommandWithTimestamp_ValidRecent (0.00s)
=== RUN TestVerifyCommandWithTimestamp_TooOld
--- PASS: TestVerifyCommandWithTimestamp_TooOld (0.00s)
=== RUN TestVerifyCommandWithTimestamp_FutureBeyondSkew
--- PASS: TestVerifyCommandWithTimestamp_FutureBeyondSkew (0.00s)
=== RUN TestVerifyCommandWithTimestamp_FutureWithinSkew
--- PASS: TestVerifyCommandWithTimestamp_FutureWithinSkew (0.00s)
=== RUN TestVerifyCommandWithTimestamp_BackwardCompatNoTimestamp
--- PASS: TestVerifyCommandWithTimestamp_BackwardCompatNoTimestamp (0.00s)
=== RUN TestVerifyCommandWithTimestamp_WrongKey
--- PASS: TestVerifyCommandWithTimestamp_WrongKey (0.00s)
=== RUN TestVerifyCommand_BackwardCompat
--- PASS: TestVerifyCommand_BackwardCompat (0.00s)
ok github.com/Fimeg/RedFlag/aggregator-agent/internal/crypto
All tests pass. No regressions detected.