feat(security): A-1 Ed25519 key rotation + A-2 replay attack fixes
Complete RedFlag codebase with two major security audit implementations.
== A-1: Ed25519 Key Rotation Support ==
Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management
Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing
== A-2: Replay Attack Fixes (F-1 through F-7) ==
F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH - Migration 026: expires_at column with partial index
F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH - Agent-side executedIDs dedup map with cleanup
F-4 HIGH - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt
Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.
All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
170
docs/A2_Fix_Implementation.md
Normal file
170
docs/A2_Fix_Implementation.md
Normal file
@@ -0,0 +1,170 @@
|
||||
# A2 — Replay Attack Fix Implementation Report
|
||||
|
||||
**Date:** 2026-03-28
|
||||
**Branch:** unstabledeveloper
|
||||
**Audit Reference:** docs/A2_Replay_Attack_Audit.md
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
This document covers the implementation of fixes for 7 audit findings (F-1 through F-7) identified in the replay attack surface audit. All fixes maintain backward compatibility with pre-A1 agents and servers.
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
### Server Side
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `aggregator-server/internal/services/signing.go` | v3 signed message format includes agent_id (F-1) |
|
||||
| `aggregator-server/internal/models/command.go` | Added `ExpiresAt`, `AgentID`, `CreatedAt` to structs (F-7, F-1, F-3) |
|
||||
| `aggregator-server/internal/database/queries/commands.go` | TTL filter in GetPendingCommands/GetStuckCommands, expires_at in CreateCommand (F-6, F-7) |
|
||||
| `aggregator-server/internal/api/handlers/updates.go` | RetryCommand refactored to sign via signAndCreateCommand (F-5) |
|
||||
| `aggregator-server/internal/api/handlers/agents.go` | GetCommands passes AgentID and CreatedAt to CommandItem (F-1, F-3) |
|
||||
| `aggregator-server/internal/database/queries/docker.go` | Fix pre-existing fmt.Sprintf build error (unrelated) |
|
||||
| `aggregator-server/internal/database/migrations/026_add_expires_at.up.sql` | New migration: expires_at column + index + backfill (F-7) |
|
||||
| `aggregator-server/internal/database/migrations/026_add_expires_at.down.sql` | Rollback migration (F-7) |
|
||||
|
||||
### Agent Side
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `aggregator-agent/internal/crypto/verification.go` | v3 message format, field-count detection, old-format 48h expiry (F-1, F-3) |
|
||||
| `aggregator-agent/internal/orchestrator/command_handler.go` | Dedup set, commandMaxAge=4h, CleanupExecutedIDs (F-2, F-4) |
|
||||
| `aggregator-agent/internal/client/client.go` | Added AgentID and CreatedAt to Command struct (F-1, F-3) |
|
||||
| `aggregator-agent/cmd/agent/main.go` | Wired CleanupExecutedIDs into key refresh cycle (F-2) |
|
||||
|
||||
### Test Files (Updated)
|
||||
|
||||
| File | Tests Updated |
|
||||
|------|---------------|
|
||||
| `aggregator-server/internal/services/signing_replay_test.go` | TestRetryCommandIsUnsigned, TestRetryCommandMustBeSigned, TestSignedCommandNotBoundToAgent, TestOldFormatCommandHasNoExpiry |
|
||||
| `aggregator-server/internal/database/queries/commands_ttl_test.go` | TestGetPendingCommandsHasNoTTLFilter, TestGetPendingCommandsMustHaveTTLFilter |
|
||||
| `aggregator-server/internal/api/handlers/retry_signing_test.go` | simulateRetryCommand, TestRetryCommandEndpointProducesUnsignedCommand, TestRetryCommandEndpointMustProduceSignedCommand |
|
||||
| `aggregator-agent/internal/crypto/replay_test.go` | TestOldFormatReplayIsUnbounded, TestNewFormatCommandCanBeReplayedWithin24Hours, TestSameCommandCanBeVerifiedTwice, TestCrossAgentSignatureVerifies + new: TestOldFormatRecentCommandStillPasses, TestCommandBeyond4HoursIsRejected |
|
||||
| `aggregator-agent/internal/crypto/verification_test.go` | All tests updated for v3 format (AgentID), signCommand helper updated, signCommandV2 added |
|
||||
|
||||
---
|
||||
|
||||
## Signed Message Format (v3)
|
||||
|
||||
### New Format
|
||||
```
|
||||
"{agent_id}:{cmd_id}:{command_type}:{sha256(params)}:{unix_timestamp}"
|
||||
```
|
||||
5 colon-separated fields.
|
||||
|
||||
### Previous Formats (backward compat)
|
||||
- **v2 (4 fields):** `"{cmd_id}:{command_type}:{sha256(params)}:{unix_timestamp}"` — has signed_at, no agent_id
|
||||
- **v1 (3 fields):** `"{cmd_id}:{command_type}:{sha256(params)}"` — no timestamp, no agent_id
|
||||
|
||||
### Backward Compatibility Detection
|
||||
|
||||
The agent's `VerifyCommandWithTimestamp` detects the format:
|
||||
|
||||
1. If `cmd.AgentID != ""` → try v3 first. If v3 fails, fall back to v2 with warning.
|
||||
2. If `cmd.AgentID == ""` and `cmd.SignedAt != nil` → v2 format with warning.
|
||||
3. If `cmd.SignedAt == nil` → v1 format (oldest) with warning + 48h created_at check.
|
||||
|
||||
Warnings are logged at the `[crypto]` level to alert operators to upgrade.
|
||||
|
||||
---
|
||||
|
||||
## Deduplication Window
|
||||
|
||||
- **Implementation:** In-memory `executedIDs map[string]time.Time` in `CommandHandler`
|
||||
- **Window:** Entries are kept for `commandMaxAge` (4 hours)
|
||||
- **Cleanup:** Runs every 6 hours when `ShouldRefreshKey()` fires
|
||||
- **Restart Limitation:** The map is lost on agent restart. Commands issued within `commandMaxAge` can be replayed if the agent restarts. A TODO comment documents the future disk persistence path.
|
||||
|
||||
---
|
||||
|
||||
## Two-Phase Plan for Retiring Old-Format Commands
|
||||
|
||||
### Phase 1 (Implemented Now)
|
||||
- Old-format commands (no `signed_at`) with `created_at > 48h` are rejected by `VerifyCommand`
|
||||
- Old-format commands within 48h still pass (backward compat for recent commands)
|
||||
- The `created_at` field is now included in the `CommandItem` API response
|
||||
|
||||
### Phase 2 (Future Work — 90 Days After Migration 025 Deployment)
|
||||
- Remove the old-format fallback in `VerifyCommandWithTimestamp` entirely
|
||||
- Enforce `signed_at` as required on all commands
|
||||
- Remove `VerifyCommand()` from the public API
|
||||
- This ensures all commands use timestamped, agent-bound signatures
|
||||
|
||||
---
|
||||
|
||||
## Docker Build + Test Output
|
||||
|
||||
### Server Build
|
||||
```
|
||||
docker-compose build server
|
||||
# ... builds successfully
|
||||
Service server Built
|
||||
```
|
||||
|
||||
### Server Tests
|
||||
```
|
||||
=== RUN TestRetryCommandIsUnsigned
|
||||
--- PASS: TestRetryCommandIsUnsigned (0.00s)
|
||||
=== RUN TestRetryCommandMustBeSigned
|
||||
--- PASS: TestRetryCommandMustBeSigned (0.00s)
|
||||
=== RUN TestSignedCommandNotBoundToAgent
|
||||
--- PASS: TestSignedCommandNotBoundToAgent (0.00s)
|
||||
=== RUN TestOldFormatCommandHasNoExpiry
|
||||
--- PASS: TestOldFormatCommandHasNoExpiry (0.00s)
|
||||
ok github.com/Fimeg/RedFlag/aggregator-server/internal/services
|
||||
|
||||
=== RUN TestGetPendingCommandsHasNoTTLFilter
|
||||
--- PASS: TestGetPendingCommandsHasNoTTLFilter (0.00s)
|
||||
=== RUN TestGetPendingCommandsMustHaveTTLFilter
|
||||
--- PASS: TestGetPendingCommandsMustHaveTTLFilter (0.00s)
|
||||
=== RUN TestRetryCommandQueryDoesNotCopySignature
|
||||
--- PASS: TestRetryCommandQueryDoesNotCopySignature (0.00s)
|
||||
ok github.com/Fimeg/RedFlag/aggregator-server/internal/database/queries
|
||||
|
||||
=== RUN TestRetryCommandEndpointProducesUnsignedCommand
|
||||
--- PASS: TestRetryCommandEndpointProducesUnsignedCommand (0.00s)
|
||||
=== RUN TestRetryCommandEndpointMustProduceSignedCommand
|
||||
--- PASS: TestRetryCommandEndpointMustProduceSignedCommand (0.00s)
|
||||
=== RUN TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration
|
||||
--- SKIP: TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration (0.00s)
|
||||
ok github.com/Fimeg/RedFlag/aggregator-server/internal/api/handlers
|
||||
```
|
||||
|
||||
### Agent Tests
|
||||
```
|
||||
=== RUN TestCacheMetadataIsExpired
|
||||
--- PASS: TestCacheMetadataIsExpired (0.00s)
|
||||
=== RUN TestOldFormatReplayIsUnbounded
|
||||
--- PASS: TestOldFormatReplayIsUnbounded (0.00s)
|
||||
=== RUN TestOldFormatRecentCommandStillPasses
|
||||
--- PASS: TestOldFormatRecentCommandStillPasses (0.00s)
|
||||
=== RUN TestNewFormatCommandCanBeReplayedWithin24Hours
|
||||
--- PASS: TestNewFormatCommandCanBeReplayedWithin24Hours (0.00s)
|
||||
=== RUN TestCommandBeyond4HoursIsRejected
|
||||
--- PASS: TestCommandBeyond4HoursIsRejected (0.00s)
|
||||
=== RUN TestSameCommandCanBeVerifiedTwice
|
||||
--- PASS: TestSameCommandCanBeVerifiedTwice (0.00s)
|
||||
=== RUN TestCrossAgentSignatureVerifies
|
||||
--- PASS: TestCrossAgentSignatureVerifies (0.00s)
|
||||
=== RUN TestVerifyCommandWithTimestamp_ValidRecent
|
||||
--- PASS: TestVerifyCommandWithTimestamp_ValidRecent (0.00s)
|
||||
=== RUN TestVerifyCommandWithTimestamp_TooOld
|
||||
--- PASS: TestVerifyCommandWithTimestamp_TooOld (0.00s)
|
||||
=== RUN TestVerifyCommandWithTimestamp_FutureBeyondSkew
|
||||
--- PASS: TestVerifyCommandWithTimestamp_FutureBeyondSkew (0.00s)
|
||||
=== RUN TestVerifyCommandWithTimestamp_FutureWithinSkew
|
||||
--- PASS: TestVerifyCommandWithTimestamp_FutureWithinSkew (0.00s)
|
||||
=== RUN TestVerifyCommandWithTimestamp_BackwardCompatNoTimestamp
|
||||
--- PASS: TestVerifyCommandWithTimestamp_BackwardCompatNoTimestamp (0.00s)
|
||||
=== RUN TestVerifyCommandWithTimestamp_WrongKey
|
||||
--- PASS: TestVerifyCommandWithTimestamp_WrongKey (0.00s)
|
||||
=== RUN TestVerifyCommand_BackwardCompat
|
||||
--- PASS: TestVerifyCommand_BackwardCompat (0.00s)
|
||||
ok github.com/Fimeg/RedFlag/aggregator-agent/internal/crypto
|
||||
```
|
||||
|
||||
All tests pass. No regressions detected.
|
||||
Reference in New Issue
Block a user