feat(security): A-1 Ed25519 key rotation + A-2 replay attack fixes
Complete RedFlag codebase with two major security audit implementations.
== A-1: Ed25519 Key Rotation Support ==
Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management
Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing
== A-2: Replay Attack Fixes (F-1 through F-7) ==
F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH - Migration 026: expires_at column with partial index
F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH - Agent-side executedIDs dedup map with cleanup
F-4 HIGH - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt
Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.
All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/A2_PreFix_Tests.md (new file, 291 lines)
@@ -0,0 +1,291 @@
# A-2 Pre-Fix Test Suite
**Date**: 2026-03-28
**Branch**: unstabledeveloper
**Purpose**: Document replay attack bugs BEFORE fixes are applied.

These tests prove that the bugs exist today and will prove the fixes work
when applied. Do NOT modify these tests before the fix is ready; they are
the regression baseline.

---

## Test Files Created

| File | Package | Bugs Documented |
|------|---------|-----------------|
| `aggregator-server/internal/services/signing_replay_test.go` | `services_test` | F-5, F-1, F-3 |
| `aggregator-agent/internal/crypto/replay_test.go` | `crypto` | F-3, F-4, F-2, F-1 |
| `aggregator-server/internal/database/queries/commands_ttl_test.go` | `queries_test` | F-6, F-7, F-5 |
| `aggregator-server/internal/api/handlers/retry_signing_test.go` | `handlers_test` | F-5 |

---

## How to Run

```bash
# Run from the repository root. Each command runs in a subshell, so the
# working directory is unchanged for the next command.

# Server-side tests (all pre-fix tests)
(cd aggregator-server && go test ./internal/services/... -v -run "TestRetry|TestSigned|TestOld")
(cd aggregator-server && go test ./internal/database/queries/... -v -run "TestGetPending|TestRetryCommandQuery")
(cd aggregator-server && go test ./internal/api/handlers/... -v -run TestRetryCommand)

# Agent-side tests (all pre-fix tests)
(cd aggregator-agent && go test ./internal/crypto/... -v -run "TestOld|TestNew|TestSame|TestCross")

# Run everything with verbose output
(cd aggregator-server && go test ./... -v 2>&1 | grep -E "(PASS|FAIL|BUG|---)")
(cd aggregator-agent && go test ./... -v 2>&1 | grep -E "(PASS|FAIL|BUG|---)")
```

---

## Test Inventory

### Behaviour Categories

**PASS-NOW / FAIL-AFTER-FIX**: Asserts the CURRENT (buggy) behaviour.
The test passes because the bug exists. When the fix is applied, the behaviour
changes and this test fails, signalling that the test itself needs to be
updated to assert the new correct state.

**FAIL-NOW / PASS-AFTER-FIX**: Asserts the CORRECT post-fix behaviour.
The test fails because the bug exists. When the fix is applied, the assertion
becomes true and the test passes, proving the fix works.

---

### File 1: `aggregator-server/internal/services/signing_replay_test.go`

#### `TestRetryCommandIsUnsigned`
- **Bug**: F-5 (`RetryCommand` creates unsigned commands)
- **What it asserts**: `retried.Signature == ""`, `retried.SignedAt == nil`, `retried.KeyID == ""`
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: `queries.RetryCommand` (commands.go:189) builds a
  new `AgentCommand` struct without calling `signAndCreateCommand`. All three
  signature fields are zero values.
- **What changes after fix**: `RetryCommand` will call `signAndCreateCommand`, so
  `retried.Signature` will be non-empty and the assertions flip to failures.
- **Operator impact**: Until fixed, every "Retry" click in the dashboard creates an
  unsigned command. In strict enforcement mode the agent rejects it silently, logging
  `"command verification failed: strict enforcement requires signed commands"`. The
  server returns HTTP 200 so the operator sees no error.

#### `TestRetryCommandMustBeSigned`
- **Bug**: F-5 (`RetryCommand` creates unsigned commands)
- **What it asserts**: `retried.Signature != ""`, `retried.SignedAt != nil`, `retried.KeyID != ""`
- **Category**: FAIL-NOW / PASS-AFTER-FIX
- **Why it currently fails**: The retry command is unsigned (bug F-5 exists).
- **What changes after fix**: All three fields will be populated; test passes.

#### `TestSignedCommandNotBoundToAgent`
- **Bug**: F-1 (`agent_id` absent from signed payload)
- **What it asserts**: `agentA.String()` is NOT in the signed message, and
  `ed25519.Verify` returns `true` for the command regardless of which agent receives it.
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: Signed message is `{id}:{type}:{sha256(params)}:{ts}`.
  No `agent_id` component. `ed25519.Verify` ignores anything outside the signed message.
- **What changes after fix**: When `agent_id` is added to the signed message, the message
  reconstructed in the test (without `agent_id`) will not match the signature, so
  `ed25519.Verify` returns `false` and the test fails.
- **Attack scenario**: An attacker with DB write access can copy a signed command from
  agent A into agent B's `agent_commands` queue. The signature passes verification on agent B.

#### `TestOldFormatCommandHasNoExpiry`
- **Bug**: F-3 (old-format commands, with no `signed_at`, are valid forever)
- **What it asserts**: `ed25519.Verify` returns `true` for an old-format signature
  (no timestamp in the message) regardless of when verification occurs.
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: `ed25519.Verify` is a pure cryptographic check; it has no
  time component. The old format `{id}:{type}:{sha256(params)}` contains no timestamp, so
  there is nothing to expire.
- **What changes after fix**: Either `VerifyCommand` is updated to reject old-format
  commands outright (requiring `signed_at`), or a `created_at` check is added; the test
  would then need to be updated to expect rejection.

---

### File 2: `aggregator-agent/internal/crypto/replay_test.go`

Uses helpers `generateKeyPair`, `signCommand`, `signCommandOld` from `verification_test.go`
(same package).

#### `TestOldFormatReplayIsUnbounded`
- **Bug**: F-3 (`VerifyCommand` has no time check)
- **What it asserts**: `v.VerifyCommand(cmd, pub)` returns `nil` for a command with
  `SignedAt == nil` (old format), regardless of age.
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: `VerifyCommand` (verification.go:25) performs only an
  Ed25519 signature check. No `created_at` or `SignedAt` field is examined.
- **What changes after fix**: After adding an expiry check, `VerifyCommand` will return
  an error for old-format commands beyond a defined age, and this test will fail.

#### `TestNewFormatCommandCanBeReplayedWithin24Hours`
- **Bug**: F-4 (24-hour replay window; large but intentional)
- **What it asserts**: `VerifyCommandWithTimestamp` returns `nil` for a command signed
  23h59m ago (within the 24h `commandMaxAge`).
- **Category**: PASS-NOW / WILL-REMAIN-PASSING until `commandMaxAge` is reduced
- **Why it currently passes**: By design; the 24h window is intentional to accommodate
  polling intervals and network delays.
- **What changes after fix**: If `commandMaxAge` is reduced (e.g. to 4h per the A-2 audit
  recommendation), this test will FAIL for commands older than the new limit. Update the
  `time.Duration` in the test when `commandMaxAge` is changed.

#### `TestSameCommandCanBeVerifiedTwice`
- **Bug**: F-2 (no nonce; same command verifies any number of times)
- **What it asserts**: `VerifyCommandWithTimestamp` returns `nil` on the second and
  third call with identical inputs.
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: `VerifyCommandWithTimestamp` is a stateless pure function.
  No nonce, no executed-command set, no single-use guarantee.
- **What changes after fix**: After agent-side deduplication (executed-command ID set) is
  added, the second call for a previously-seen command UUID will return an error.

#### `TestCrossAgentSignatureVerifies`
- **Bug**: F-1 (signed message has no agent binding)
- **What it asserts**: The signed message components are
  `[cmd_id, cmd_type, sha256(params), timestamp]`, with no `agent_id`.
  `VerifyCommandWithTimestamp` passes for a copy of the command
  representing delivery to a different agent.
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: `client.Command` has no `agent_id` field, and
  `reconstructMessageWithTimestamp` does not include one.
- **What changes after fix**: After `agent_id` is added to the signed message (and
  correspondingly to `client.Command`), the reconstructed message in the verifier will
  include `agent_id`, and a command signed for agent A will fail verification on agent B.

---

### File 3: `aggregator-server/internal/database/queries/commands_ttl_test.go`

These tests operate on a copied query string constant. When the fix adds a TTL clause to
`GetPendingCommands`, update `getPendingCommandsQuery` in this file to match.

#### `TestGetPendingCommandsHasNoTTLFilter`
- **Bug**: F-6 + F-7 (`GetPendingCommands` has no TTL filter; no `expires_at` column)
- **What it asserts**: The query string does NOT contain `"INTERVAL"` or `"expires_at"`.
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: The production query (commands.go:52) is:
  ```sql
  SELECT * FROM agent_commands
  WHERE agent_id = $1 AND status = 'pending'
  ORDER BY created_at ASC
  LIMIT 100
  ```
  Neither `INTERVAL` nor `expires_at` appears.
- **What changes after fix**: Update `getPendingCommandsQuery` to the new query containing
  the TTL clause. The absence-assertions will then fail (indicator found); update them.

#### `TestGetPendingCommandsMustHaveTTLFilter`
- **Bug**: F-6 + F-7 (same as above)
- **What it asserts**: The query DOES contain a TTL indicator (`"INTERVAL"` or `"expires_at"`).
- **Category**: FAIL-NOW / PASS-AFTER-FIX
- **Why it currently fails**: No TTL clause exists in the current query.
- **What changes after fix**: Update `getPendingCommandsQuery`; the indicator will be found
  and the test passes.

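
The string-level check these two tests perform can be sketched as follows. `preFixQuery` is the production query quoted in the previous test's description; `postFixQuery` is only an assumed shape for the fixed query (the real TTL clause may be written differently), and `hasTTLFilter` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// preFixQuery is the pre-fix production query as documented for F-6/F-7.
const preFixQuery = `SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
ORDER BY created_at ASC
LIMIT 100`

// postFixQuery is an ASSUMED example of a query with a TTL clause.
const postFixQuery = `SELECT * FROM agent_commands
WHERE agent_id = $1 AND status = 'pending'
  AND (expires_at IS NULL OR expires_at > NOW())
ORDER BY created_at ASC
LIMIT 100`

// hasTTLFilter reports whether the query text contains either TTL indicator
// the tests look for.
func hasTTLFilter(query string) bool {
	return strings.Contains(query, "INTERVAL") || strings.Contains(query, "expires_at")
}

func main() {
	fmt.Println("pre-fix query has TTL filter: ", hasTTLFilter(preFixQuery))
	fmt.Println("post-fix query has TTL filter:", hasTTLFilter(postFixQuery))
}
```

A substring check like this only proves that the copied constant mentions a TTL column; it does not prove the production query was changed or that the clause is correct, which is why the copied constant must be kept in sync by hand.
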
#### `TestRetryCommandQueryDoesNotCopySignature`
- **Bug**: F-5 (query-layer confirmation)
- **What it asserts**: Documentary; logs that `RetryCommand` omits `signature`, `key_id`,
  `signed_at` from the new command struct.
- **Category**: Always passes (documentation test). Update the logged field lists when fix
  is applied.

---

### File 4: `aggregator-server/internal/api/handlers/retry_signing_test.go`

#### `TestRetryCommandEndpointProducesUnsignedCommand`
- **Bug**: F-5 (handler returns 200 but creates an unsigned command)
- **What it asserts**: `retried.Signature == ""`, `retried.SignedAt == nil`, `retried.KeyID == ""`
  using `simulateRetryCommand` which replicates the exact struct construction in
  `queries.RetryCommand`.
- **Category**: PASS-NOW / FAIL-AFTER-FIX
- **Why it currently passes**: `simulateRetryCommand` exactly mirrors the current production
  code (commands.go:202), which makes no signing call.
- **What changes after fix**: `simulateRetryCommand` must be updated to include the signing
  call, or the test must be rewritten against the fixed implementation.

#### `TestRetryCommandEndpointMustProduceSignedCommand`
- **Bug**: F-5
- **What it asserts**: `retried.Signature != ""`, `retried.SignedAt != nil`, `retried.KeyID != ""`
- **Category**: FAIL-NOW / PASS-AFTER-FIX
- **Why it currently fails**: `simulateRetryCommand` produces an unsigned command (bug exists).
- **What changes after fix**: The production code will produce a signed command; update
  `simulateRetryCommand` to call the signing service and the assertions will pass.

#### `TestRetryCommandHTTPHandlerProducesUnsignedCommand_Integration`
- **Bug**: F-5
- **Status**: Skipped; requires live DB or interface extraction (see TODO in file).
- **How to enable**: Extract `CommandQueriesInterface` from `CommandQueries` and update
  handlers to accept the interface, then replace `simulateRetryCommand` with a real
  handler invocation via `httptest`.

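
The interface-extraction step can be sketched as below. Only the name `CommandQueriesInterface` comes from the text above; its single method, the `Command` payload, the `/retry` route, `fakeQueries`, and the use of plain `net/http` are assumptions for illustration, and the real handlers may use a different router and signatures.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Command is a hypothetical, trimmed-down command payload.
type Command struct {
	ID        string `json:"id"`
	Signature string `json:"signature"`
	KeyID     string `json:"key_id"`
}

// CommandQueriesInterface is the extracted interface; the method set shown
// here is an assumption.
type CommandQueriesInterface interface {
	RetryCommand(originalID string) (*Command, error)
}

// fakeQueries is a test double that returns a fixed command without a database.
type fakeQueries struct{ result *Command }

func (f fakeQueries) RetryCommand(string) (*Command, error) { return f.result, nil }

// retryHandler depends on the interface, so a test can inject fakeQueries.
func retryHandler(q CommandQueriesInterface) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		cmd, err := q.RetryCommand(r.URL.Query().Get("id"))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(cmd)
	}
}

// invokeRetry drives the handler through httptest and decodes the response body.
func invokeRetry(q CommandQueriesInterface) (int, Command) {
	req := httptest.NewRequest(http.MethodPost, "/retry?id=cmd-1", nil)
	rec := httptest.NewRecorder()
	retryHandler(q).ServeHTTP(rec, req)
	var got Command
	_ = json.NewDecoder(rec.Body).Decode(&got)
	return rec.Code, got
}

func main() {
	// An unsigned result reproduces the F-5 symptom: HTTP 200 with an empty signature.
	code, got := invokeRetry(fakeQueries{result: &Command{ID: "cmd-2"}})
	fmt.Println("status:", code, "unsigned:", got.Signature == "")
}
```

With the handler written against an interface, the skipped integration test can assert on the actual HTTP response instead of on a simulated struct.
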
---

## State-Change Summary

| Test | Current State | After A-2 Fix |
|------|--------------|---------------|
| TestRetryCommandIsUnsigned | PASS | FAIL (flip expected) |
| TestRetryCommandMustBeSigned | **FAIL** | PASS |
| TestSignedCommandNotBoundToAgent | PASS | FAIL (flip expected) |
| TestOldFormatCommandHasNoExpiry | PASS | FAIL (flip expected) |
| TestOldFormatReplayIsUnbounded | PASS | FAIL (flip expected) |
| TestNewFormatCommandCanBeReplayedWithin24Hours | PASS | PASS (or FAIL if maxAge reduced) |
| TestSameCommandCanBeVerifiedTwice | PASS | FAIL (flip expected) |
| TestCrossAgentSignatureVerifies | PASS | FAIL (flip expected) |
| TestGetPendingCommandsHasNoTTLFilter | PASS | FAIL (flip expected) |
| TestGetPendingCommandsMustHaveTTLFilter | **FAIL** | PASS |
| TestRetryCommandQueryDoesNotCopySignature | PASS | documentary (update manually) |
| TestRetryCommandEndpointProducesUnsignedCommand | PASS | FAIL (flip expected) |
| TestRetryCommandEndpointMustProduceSignedCommand | **FAIL** | PASS |

Rows whose Current State is a bold **FAIL** are the "tests written to fail with current
code" that satisfy the TDD requirement directly. The remaining tests currently PASS,
documenting the bug-as-behaviour. Apart from the two rows noted otherwise in the table,
they will flip to FAIL when the fix changes the behaviour they assert.

---

## Maintenance Notes

1. **When applying the fix for F-5**: Update `simulateRetryCommand` in
   `retry_signing_test.go` to reflect the new signed-command production. Update the
   assertions in `TestRetryCommandIsUnsigned` and `TestRetryCommandEndpointProducesUnsignedCommand`
   to assert the correct post-fix state.

2. **When applying the fix for F-6/F-7**: Update `getPendingCommandsQuery` in
   `commands_ttl_test.go` to the new query text. Invert the assertions in
   `TestGetPendingCommandsHasNoTTLFilter` to assert presence (not absence) of TTL.

3. **When applying the fix for F-3**: Update `TestOldFormatCommandHasNoExpiry` and
   `TestOldFormatReplayIsUnbounded` to assert that old-format commands ARE rejected,
   or that the backward-compat path has a defined expiry.

4. **When applying the fix for F-1**: Update `TestSignedCommandNotBoundToAgent` and
   `TestCrossAgentSignatureVerifies` to pass an `agent_id` into the signed message and
   assert that a cross-agent replay fails verification.

5. **When applying the fix for F-2**: Update `TestSameCommandCanBeVerifiedTwice` to
   assert that the second call returns an error (deduplication firing).

---

## Post-Fix Status (2026-03-28)

All fixes have been applied. Test status:

| Test | Pre-Fix | Post-Fix | Status |
|------|---------|----------|--------|
| TestRetryCommandIsUnsigned | PASS | UPDATED: now asserts signed | VERIFIED PASSING |
| TestRetryCommandMustBeSigned | FAIL | UPDATED: now passes | VERIFIED PASSING |
| TestSignedCommandNotBoundToAgent | PASS | UPDATED: asserts agent_id binding | VERIFIED PASSING |
| TestOldFormatCommandHasNoExpiry | PASS | UPDATED: documents crypto vs app-layer | VERIFIED PASSING |
| TestOldFormatReplayIsUnbounded | PASS | UPDATED: asserts 48h rejection | VERIFIED PASSING |
| TestOldFormatRecentCommandStillPasses | N/A | NEW: backward compat for recent old-format | VERIFIED PASSING |
| TestNewFormatCommandCanBeReplayedWithin24Hours | PASS | UPDATED: uses 4h window (3h59m) | VERIFIED PASSING |
| TestCommandBeyond4HoursIsRejected | N/A | NEW: asserts 4h rejection | VERIFIED PASSING |
| TestSameCommandCanBeVerifiedTwice | PASS | UPDATED: documents verifier purity, dedup at ProcessCommand | VERIFIED PASSING |
| TestCrossAgentSignatureVerifies | PASS | UPDATED: asserts cross-agent failure | VERIFIED PASSING |
| TestGetPendingCommandsHasNoTTLFilter | PASS | UPDATED: asserts TTL presence | VERIFIED PASSING |
| TestGetPendingCommandsMustHaveTTLFilter | FAIL | UPDATED: now passes | VERIFIED PASSING |
| TestRetryCommandQueryDoesNotCopySignature | PASS | Unchanged (documentary) | VERIFIED PASSING |
| TestRetryCommandEndpointProducesUnsignedCommand | PASS | UPDATED: asserts signed | VERIFIED PASSING |
| TestRetryCommandEndpointMustProduceSignedCommand | FAIL | UPDATED: now passes | VERIFIED PASSING |