RedFlag Security Architecture Audit
Date: 2025-01-07 Version: 0.1.23 Status: 🔴 Security Claims Not Fully Implemented
Executive Summary
RedFlag claims to implement a comprehensive security architecture including:
- Ed25519 digital signatures for agent updates
- Nonce-based replay protection
- Machine ID binding (anti-impersonation)
- Trust-On-First-Use (TOFU) public key distribution
- Command acknowledgment system
Finding: The security infrastructure code exists and is well-designed, but the update signing workflow is not operational. Zero signed update packages exist in the database, meaning agent updates cannot currently be verified.
Security Components - Detailed Analysis
1. Ed25519 Digital Signatures
✅ What's Implemented (Code Level)
Server Side:
- aggregator-server/internal/services/signing.go:45-66 - SignFile() function
  - Reads binary file
  - Computes SHA-256 checksum
  - Signs with Ed25519 private key
  - Returns signature + checksum
- aggregator-server/internal/api/handlers/agent_updates.go:320-363 - SignUpdatePackage() endpoint
  - Receives: {version, platform, architecture, binary_path}
  - Calls SignFile()
  - Stores in agent_update_packages table
Agent Side:
- aggregator-agent/cmd/agent/subsystem_handlers.go:782-813 - verifyBinarySignature() function
  - Loads cached server public key
  - Reads binary file
  - Verifies Ed25519 signature
  - Returns error if invalid
Update Handler:
- aggregator-agent/cmd/agent/subsystem_handlers.go:346-495 - handleUpdateAgent()
  - Validates nonce (line 397)
  - Downloads binary (line 436)
  - Verifies checksum (line 449)
  - Verifies Ed25519 signature (line 456)
  - Installs with atomic backup/rollback
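The sign-then-verify path listed above can be sketched in a few lines of Go. This is a minimal illustration, not the RedFlag implementation: signedPackage, signBinary, verifyBinary, and newDemoKeys are hypothetical names, and the sketch assumes the signature covers the raw binary bytes (the real SignFile() may sign the checksum instead).

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// signedPackage is a hypothetical record of what the audit says SignFile() returns.
type signedPackage struct {
	Checksum  string // hex-encoded SHA-256 of the binary
	Signature string // hex-encoded Ed25519 signature
}

// signBinary hashes the binary and signs its raw bytes.
// Assumption: the signature covers the file contents, not the checksum.
func signBinary(priv ed25519.PrivateKey, binary []byte) signedPackage {
	sum := sha256.Sum256(binary)
	return signedPackage{
		Checksum:  hex.EncodeToString(sum[:]),
		Signature: hex.EncodeToString(ed25519.Sign(priv, binary)),
	}
}

// verifyBinary follows the agent-side order from the audit: checksum first, then signature.
func verifyBinary(pub ed25519.PublicKey, binary []byte, pkg signedPackage) error {
	sum := sha256.Sum256(binary)
	if hex.EncodeToString(sum[:]) != pkg.Checksum {
		return errors.New("checksum mismatch")
	}
	sig, err := hex.DecodeString(pkg.Signature)
	if err != nil {
		return fmt.Errorf("decode signature: %w", err)
	}
	if !ed25519.Verify(pub, binary, sig) {
		return errors.New("invalid Ed25519 signature")
	}
	return nil
}

// newDemoKeys generates a throwaway keypair for the example.
func newDemoKeys() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func main() {
	pub, priv := newDemoKeys()
	binary := []byte("pretend agent binary")
	pkg := signBinary(priv, binary)

	fmt.Println("untampered:", verifyBinary(pub, binary, pkg))
	fmt.Println("tampered:  ", verifyBinary(pub, []byte("pretend agent binarx"), pkg))
}
```

Checking the checksum before the signature gives a cheap early failure for corrupted downloads; only the signature check protects against a malicious server-side swap.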
❌ What's Missing (Workflow Level)
- No Signed Packages in Database:
      SELECT COUNT(*) FROM agent_update_packages; -- Result: 0
- No Signing Automation:
  - Agent binaries are built during docker compose build (Dockerfile:19-28)
  - Binaries exist at /app/binaries/{platform}/redflag-agent (10.8 MB each)
  - But they are never signed and inserted into the database
- No UI for Signing:
  - Setup wizard generates Ed25519 keypair ✅
  - No interface to sign binaries ❌
  - No interface to view signed packages ❌
  - No interface to manage package versions ❌
- Update Flow Fails:
      Admin clicks "Update Agent"
      → POST /agents/:id/update
      → GetUpdatePackageByVersion(version, platform, arch)
      → Returns 404: "update package not found"
      → Update never starts
🔍 Manual Verification
To verify signing works, an admin would need to:
# 1. Get auth token
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"admin","password":"<password>"}' | jq -r .token)
# 2. Sign the binary
curl -X POST http://localhost:8080/api/v1/updates/packages/sign \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"version": "0.1.23",
"platform": "linux",
"architecture": "amd64",
"binary_path": "/app/binaries/linux-amd64/redflag-agent"
}'
# 3. Verify in database
docker exec redflag-postgres psql -U redflag -d redflag \
-c "SELECT version, platform, left(signature, 16) FROM agent_update_packages;"
Current Status: No documentation exists for this workflow.
2. Nonce-Based Replay Protection
✅ What's Implemented
Server Side:
- aggregator-server/internal/api/handlers/agent_updates.go:86-99

      nonceUUID := uuid.New()
      nonceTimestamp := time.Now()
      nonceSignature, err = h.signingService.SignNonce(nonceUUID, nonceTimestamp)

  - Generates UUID + timestamp
  - Signs with Ed25519 private key
  - Includes in command parameters
Agent Side:
- aggregator-agent/cmd/agent/subsystem_handlers.go:848-893 - validateNonce()
  - Parses timestamp (line 851)
  - Checks age < 5 minutes (lines 857-860)
  - Verifies Ed25519 signature against cached public key (line 887)
  - Rejects expired or invalid nonces
Configuration:
- Configurable via REDFLAG_NONCE_MAX_AGE_MINUTES (default: 5 minutes)
✅ Status: FULLY OPERATIONAL
- Nonces are generated for every update command
- Validation happens before download starts
- Prevents replay attacks
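A rough sketch of the nonce check described in this section follows. The "id:timestamp" message layout, the function names, and the strict rejection of future timestamps are assumptions made for illustration; the audit does not state the exact byte format that SignNonce() signs.

```go
package main

import (
	"crypto/ed25519"
	"errors"
	"fmt"
	"time"
)

// maxNonceAge mirrors the 5-minute default of REDFLAG_NONCE_MAX_AGE_MINUTES.
const maxNonceAge = 5 * time.Minute

// nonceMessage builds the signed payload.
// Assumption: "id:RFC3339 timestamp" is a made-up layout for this sketch.
func nonceMessage(id string, ts time.Time) []byte {
	return []byte(id + ":" + ts.UTC().Format(time.RFC3339))
}

// validateNonce rejects nonces that are too old, dated in the future
// (no clock-skew allowance in this sketch), or not signed by the server key.
func validateNonce(pub ed25519.PublicKey, id string, ts time.Time, sig []byte, maxAge time.Duration, now time.Time) error {
	age := now.Sub(ts)
	if age < 0 || age > maxAge {
		return errors.New("nonce expired or timestamp in the future")
	}
	if !ed25519.Verify(pub, nonceMessage(id, ts), sig) {
		return errors.New("invalid nonce signature")
	}
	return nil
}

// demoNonce plays the server role: it issues and signs one nonce with a throwaway key.
func demoNonce() (ed25519.PublicKey, string, time.Time, []byte) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	id, issued := "demo-nonce-id", time.Now()
	return pub, id, issued, ed25519.Sign(priv, nonceMessage(id, issued))
}

func main() {
	pub, id, issued, sig := demoNonce()
	fmt.Println("fresh:", validateNonce(pub, id, issued, sig, maxNonceAge, issued.Add(time.Minute)))
	fmt.Println("stale:", validateNonce(pub, id, issued, sig, maxNonceAge, issued.Add(10*time.Minute)))
}
```

A timestamp window on its own bounds replay to the maximum age rather than eliminating it; whether the agent also remembers nonce IDs it has already seen is not stated in this audit.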
3. Machine ID Binding
✅ What's Implemented
Server Side:
- aggregator-server/internal/api/middleware/machine_binding.go:13-99
  - Applied to all /agents/* endpoints (main.go:251)
  - Validates X-Machine-ID header (line 58)
  - Compares with database machine_id column (line 82)
  - Returns HTTP 403 on mismatch (lines 85-90)
  - Enforces minimum agent version 0.1.22+ (lines 42-54)
Agent Side:
- aggregator-agent/internal/system/machine_id.go - GetMachineID()
  - Linux: Uses /etc/machine-id or /var/lib/dbus/machine-id
  - Windows: Uses registry HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid
  - Cached in agent state
  - Sent in X-Machine-ID header on every request
Database:
- agents.machine_id column (VARCHAR(255), added in migration 016)
  - Stored during registration
  - Validated on every check-in
✅ Status: FULLY OPERATIONAL
- Machine binding prevents config file copying to different machines
- Logs security alerts: ⚠️ SECURITY ALERT: Agent ... machine ID mismatch!
⚠️ Known Issues:
- No UI visibility: Admins can't see machine ID in dashboard
- No recovery workflow: If machine ID changes (hardware swap), agent must re-register
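The server-side binding check in this section can be approximated with a small middleware. The sketch below uses net/http rather than the project's actual router so that it stays self-contained; lookupMachineID, the X-Agent-ID header, and statusFor are hypothetical stand-ins (the real middleware presumably identifies the agent from its auth token and reads machine_id from the database).

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// lookupMachineID stands in for the agents.machine_id database query (hypothetical).
type lookupMachineID func(agentID string) (string, bool)

// machineBinding rejects requests whose X-Machine-ID header does not match
// the machine ID stored for the agent at registration.
func machineBinding(lookup lookupMachineID, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Assumption: X-Agent-ID is invented for this sketch.
		stored, ok := lookup(r.Header.Get("X-Agent-ID"))
		if !ok {
			http.Error(w, "unknown agent", http.StatusUnauthorized)
			return
		}
		reported := r.Header.Get("X-Machine-ID")
		if reported == "" || subtle.ConstantTimeCompare([]byte(reported), []byte(stored)) != 1 {
			http.Error(w, "machine ID mismatch", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the middleware and returns the HTTP status.
func statusFor(machineID string) int {
	lookup := func(agentID string) (string, bool) {
		id, ok := map[string]string{"agent-1": "machine-aaa"}[agentID]
		return id, ok
	}
	handler := machineBinding(lookup, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, "/agents/agent-1/commands", nil)
	req.Header.Set("X-Agent-ID", "agent-1")
	req.Header.Set("X-Machine-ID", machineID)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("machine-aaa")) // 200: machine ID matches the stored value
	fmt.Println(statusFor("machine-bbb")) // 403: copied config on a different machine
}
```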
4. Trust-On-First-Use (TOFU) Public Key
✅ What's Implemented
Server Endpoint:
- aggregator-server/internal/api/handlers/system.go:22-32 - GetPublicKey()
  - Returns Ed25519 public key in hex format
  - Available at GET /api/v1/public-key
  - Rate limited (public_access tier)
Agent Fetching:
- aggregator-agent/cmd/agent/main.go:465-473

      log.Println("Fetching server public key...")
      if err := fetchAndCachePublicKey(cfg.ServerURL); err != nil {
          log.Printf("Warning: Failed to fetch server public key: %v", err)
          // Don't fail registration - key can be fetched later
      }

  - Fetches during registration (line 467)
  - Caches to /etc/redflag/server_public_key (Linux) or C:\ProgramData\RedFlag\server_public_key (Windows)
  - Used for all signature verification
Agent Usage:
- aggregator-agent/cmd/agent/subsystem_handlers.go:815-846 - getServerPublicKey()
  - Loads from cache
  - Used by verifyBinarySignature() (line 784)
  - Used by validateNonce() (line 867)
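The trust-on-first-use behaviour described above can be illustrated as follows. pinPublicKey and demoPin are hypothetical names, and the mismatch check on later calls is an assumption about how a pinning client could behave; the audit only says the agent caches the key once and reads it back from disk.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// pinPublicKey implements trust-on-first-use: the first key offered is written
// to cachePath as hex; later calls return the cached key and reject any other key.
func pinPublicKey(cachePath string, offered ed25519.PublicKey) (ed25519.PublicKey, error) {
	cached, err := os.ReadFile(cachePath)
	if errors.Is(err, os.ErrNotExist) {
		// First use: nothing cached yet, so trust and persist the offered key.
		if len(offered) != ed25519.PublicKeySize {
			return nil, errors.New("offered key has wrong length")
		}
		if err := os.WriteFile(cachePath, []byte(hex.EncodeToString(offered)), 0o644); err != nil {
			return nil, err
		}
		return offered, nil
	}
	if err != nil {
		return nil, err
	}
	pinned, err := hex.DecodeString(string(bytes.TrimSpace(cached)))
	if err != nil {
		return nil, fmt.Errorf("corrupt key cache: %w", err)
	}
	if !bytes.Equal(pinned, offered) {
		return nil, errors.New("offered key differs from pinned key")
	}
	return pinned, nil
}

// demoPin pins key A in a temp directory, offers A again, then offers key B.
func demoPin() (first, same, different error) {
	dir, err := os.MkdirTemp("", "tofu-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "server_public_key")

	keyA, _, errA := ed25519.GenerateKey(nil) // nil selects crypto/rand
	keyB, _, errB := ed25519.GenerateKey(nil)
	if errA != nil || errB != nil {
		panic("key generation failed")
	}
	_, first = pinPublicKey(path, keyA)
	_, same = pinPublicKey(path, keyA)
	_, different = pinPublicKey(path, keyB)
	return first, same, different
}

func main() {
	first, same, different := demoPin()
	fmt.Println("first use:    ", first)
	fmt.Println("same key:     ", same)
	fmt.Println("different key:", different)
}
```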
⚠️ What's Broken
1. Non-Blocking Fetch (Medium Severity):
- main.go:468-470: If public key fetch fails, agent registers anyway
- Agent cannot verify updates without public key
- All update commands will fail signature verification
- No retry mechanism
2. No Fingerprint Logging:
- Agent doesn't log the server's public key fingerprint during TOFU
- Admins have no way to verify correct server was contacted
- Silent MITM vulnerability if wrong server URL provided
3. No Key Rotation Support:
- Cached public key never expires
- No mechanism to update if server rotates keys
- Agent would need manual deletion of /etc/redflag/server_public_key
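One way to address the missing retry noted in item 1 is a bounded retry loop with exponential backoff around the fetch. This is only a sketch under assumptions: fetchWithRetry and flakyFetch are hypothetical, and the fetch argument stands in for a call such as fetchAndCachePublicKey(cfg.ServerURL).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fetchWithRetry calls fetch until it succeeds or attempts are exhausted,
// doubling the delay after each failure.
func fetchWithRetry(fetch func() error, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fetch(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("public key fetch failed after %d attempts: %w", attempts, err)
}

// flakyFetch returns a fetch function that fails the first `failures` calls,
// plus a pointer to its call counter (stand-in for the real network fetch).
func flakyFetch(failures int) (func() error, *int) {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("connection refused")
		}
		return nil
	}, &calls
}

func main() {
	fetch, calls := flakyFetch(2)
	err := fetchWithRetry(fetch, 5, 10*time.Millisecond)
	fmt.Println("error:", err, "calls:", *calls)

	fetch, calls = flakyFetch(10)
	err = fetchWithRetry(fetch, 3, 10*time.Millisecond)
	fmt.Println("error:", err, "calls:", *calls)
}
```

A retry at registration time does not replace a later re-fetch; an agent that exhausts its attempts would still need to try again before handling its first update command.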
5. Command Acknowledgment System
✅ What's Implemented
Agent Side:
- aggregator-agent/internal/acknowledgment/tracker.go - Acknowledgment tracker
  - Stores pending command results in pending_acks.json
  - Tracks retry count (max 10 retries)
  - Expires after 24 hours
  - Sends acknowledgments in every check-in
Server Side:
- aggregator-server/internal/database/queries/commands.go - VerifyCommandsCompleted()
  - Returns AcknowledgedIDs in check-in response
  - Agent removes acknowledged commands from pending list
Agent Main Loop:
- aggregator-agent/cmd/agent/main.go:834-843

      if response != nil && len(response.AcknowledgedIDs) > 0 {
          ackTracker.Acknowledge(response.AcknowledgedIDs)
          log.Printf("Server acknowledged %d command result(s)", len(response.AcknowledgedIDs))
      }
✅ Status: FULLY OPERATIONAL
- At-least-once delivery guarantee
- Automatic retry on network failures
- Cleanup after success or expiration
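The retry, expiry, and cleanup rules above can be modelled with a small in-memory tracker. This is a simplified sketch, not the code in tracker.go: it keeps state in a map instead of pending_acks.json, and apart from Acknowledge (which appears in the main loop excerpt) the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

const (
	maxRetries = 10             // retry cap stated in the audit
	maxAge     = 24 * time.Hour // expiry stated in the audit
)

// demoStart is a fixed reference time so the example is deterministic.
var demoStart = time.Date(2025, 1, 7, 12, 0, 0, 0, time.UTC)

type pendingAck struct {
	CreatedAt time.Time
	Retries   int
}

// tracker is an in-memory stand-in for the pending_acks.json store.
type tracker struct {
	pending map[string]*pendingAck
}

func newTracker() *tracker { return &tracker{pending: map[string]*pendingAck{}} }

// Add records a command result the server has not yet acknowledged.
func (t *tracker) Add(id string, now time.Time) {
	t.pending[id] = &pendingAck{CreatedAt: now}
}

// Due returns the IDs to send on this check-in (sorted), dropping entries
// that have expired or used up their retries.
func (t *tracker) Due(now time.Time) []string {
	var ids []string
	for id, p := range t.pending {
		if now.Sub(p.CreatedAt) > maxAge || p.Retries >= maxRetries {
			delete(t.pending, id) // deleting during range is safe in Go
			continue
		}
		p.Retries++
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// Acknowledge removes the IDs the server confirmed in its check-in response.
func (t *tracker) Acknowledge(ids []string) {
	for _, id := range ids {
		delete(t.pending, id)
	}
}

func main() {
	tr := newTracker()
	tr.Add("cmd-1", demoStart)
	tr.Add("cmd-2", demoStart)
	fmt.Println(tr.Due(demoStart))                   // both results are sent
	tr.Acknowledge([]string{"cmd-1"})                // server confirmed cmd-1
	fmt.Println(tr.Due(demoStart.Add(time.Hour)))    // only cmd-2 is resent
	fmt.Println(tr.Due(demoStart.Add(25 * time.Hour))) // cmd-2 has expired
}
```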
Critical Security Issues
Issue #1: Hardcoded Signing Key (High Severity)
Location: config/.env:24
REDFLAG_SIGNING_PRIVATE_KEY=1104a7fd7fb1a12b99e31d043fc7f4ef00bee6df19daff11ae4244606dac5bf9792d68d1c31f6c6a7820033720fb80d54bf22a8aab0382efd5deacc5122a5947
Public Key Fingerprint: 792d68d1c31f6c6a
Problem:
- Same signing key appears across multiple test server instances
- .env file is gitignored ✅ but manually copied between servers ❌
- Setup wizard generates NEW keys, but if .env already has REDFLAG_SIGNING_PRIVATE_KEY, it's reused
Impact:
- If one server is compromised, attacker can sign updates for ALL servers using this key
- No uniqueness validation on server startup
Reproduction:
# Server A
grep REDFLAG_SIGNING_PRIVATE_KEY config/.env | sha256sum
# Output: abc123...
# Server B
grep REDFLAG_SIGNING_PRIVATE_KEY config/.env | sha256sum
# Output: abc123... ← SAME KEY
Remediation:
- Delete signing key from all .env files
- Run setup wizard on each server to generate unique keys
- Add startup validation to warn if key fingerprint matches known test keys
- Document key generation in deployment guide
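The startup validation proposed in the remediation list might look roughly like this. It assumes the key is stored as 128 hex characters (a 64-byte Ed25519 private key, matching the length of the value shown above) and that the fingerprint is the first 8 bytes of the public key in hex, which agrees with the fingerprint quoted in this issue. checkSigningKey, knownTestFingerprints, and newKeyHex are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"fmt"
)

// knownTestFingerprints lists keys that must never sign production updates.
// The single entry is the fingerprint reported in this audit.
var knownTestFingerprints = map[string]bool{
	"792d68d1c31f6c6a": true,
}

// fingerprint returns the first 8 bytes of the public key as hex.
func fingerprint(pub ed25519.PublicKey) string {
	return hex.EncodeToString(pub[:8])
}

// checkSigningKey decodes a hex private key and reports whether its public
// half matches a known test key.
func checkSigningKey(privHex string) (fp string, isTestKey bool, err error) {
	raw, err := hex.DecodeString(privHex)
	if err != nil {
		return "", false, fmt.Errorf("signing key is not valid hex: %w", err)
	}
	if len(raw) != ed25519.PrivateKeySize {
		return "", false, fmt.Errorf("signing key is %d bytes, want %d", len(raw), ed25519.PrivateKeySize)
	}
	pub := ed25519.PrivateKey(raw).Public().(ed25519.PublicKey)
	fp = fingerprint(pub)
	return fp, knownTestFingerprints[fp], nil
}

// newKeyHex generates a fresh private key in the same hex form (demo helper).
func newKeyHex() string {
	_, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return hex.EncodeToString(priv)
}

func main() {
	fp, isTestKey, err := checkSigningKey(newKeyHex())
	if err != nil {
		panic(err)
	}
	if isTestKey {
		fmt.Printf("WARNING: signing key %s matches a known test key\n", fp)
	} else {
		fmt.Printf("signing key fingerprint: %s\n", fp)
	}
}
```

A denylist only catches keys that are already known to be shared; it complements, rather than replaces, generating a unique key per server.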
Issue #2: Update Signing Workflow Not Operational (Critical)
Problem:
- Zero signed packages in database
- No automation to sign binaries after build
- No UI to trigger signing
- Update commands fail with 404
Evidence:
redflag=# SELECT COUNT(*) FROM agent_update_packages;
count
-------
0
Impact:
- Agent updates are completely non-functional
- Security claims in documentation are misleading
- Admin has no way to push signed updates
Required to Fix:
- Signing Automation:
  - Add post-build hook to sign binaries
  - Store in database automatically
  - Version management (which version is "latest"?)
- Admin UI:
  - Settings page: "Manage Update Packages"
  - List signed packages with versions
  - Button: "Sign Current Binaries"
  - Show fingerprint of signing key in use
- API Endpoints:
  - GET /api/v1/updates/packages - List signed packages
  - POST /api/v1/updates/packages/sign-all - Sign all binaries in /app/binaries/
  - DELETE /api/v1/updates/packages/:id - Deactivate old package
- Docker Build Integration:

      # After building binaries, sign them
      RUN go run scripts/sign-binaries.go \
          --private-key=$REDFLAG_SIGNING_PRIVATE_KEY \
          --binaries=/app/binaries
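As a rough idea of what a post-build signing step could do, the sketch below walks a binaries directory and produces one signed record per agent binary. Everything here is hypothetical: the proposed scripts/sign-binaries.go does not exist yet, packageRecord is an invented type, and the sketch assumes binaries live in per-platform directories and have names starting with redflag-agent. Writing the records to agent_update_packages is left out.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// packageRecord is a hypothetical row for the agent_update_packages table.
type packageRecord struct {
	Platform  string // parent directory name, e.g. "linux-amd64"
	Checksum  string // hex SHA-256
	Signature string // hex Ed25519 signature over the file contents
}

// signAll walks root and signs every file whose name starts with "redflag-agent".
func signAll(root string, priv ed25519.PrivateKey) ([]packageRecord, error) {
	var records []packageRecord
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasPrefix(d.Name(), "redflag-agent") {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data)
		records = append(records, packageRecord{
			Platform:  filepath.Base(filepath.Dir(path)),
			Checksum:  hex.EncodeToString(sum[:]),
			Signature: hex.EncodeToString(ed25519.Sign(priv, data)),
		})
		return nil
	})
	return records, err
}

// demoSignAll builds a throwaway directory tree and signs it with a fresh key.
func demoSignAll() []packageRecord {
	root, err := os.MkdirTemp("", "binaries")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	files := map[string]string{
		"linux-amd64/redflag-agent":       "linux build",
		"windows-amd64/redflag-agent.exe": "windows build",
		"README.txt":                      "not a binary",
	}
	for rel, body := range files {
		p := filepath.Join(root, filepath.FromSlash(rel))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			panic(err)
		}
		if err := os.WriteFile(p, []byte(body), 0o755); err != nil {
			panic(err)
		}
	}
	_, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	records, err := signAll(root, priv)
	if err != nil {
		panic(err)
	}
	return records
}

func main() {
	for _, r := range demoSignAll() {
		fmt.Printf("%s checksum=%s... signature=%s...\n", r.Platform, r.Checksum[:16], r.Signature[:16])
	}
}
```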
Issue #3: Public Key Fetch Non-Blocking (Medium Severity)
Location: aggregator-agent/cmd/agent/main.go:468-470
Problem:
if err := fetchAndCachePublicKey(cfg.ServerURL); err != nil {
    log.Printf("Warning: Failed to fetch server public key: %v", err)
    // Don't fail registration - key can be fetched later ← PROBLEM
}
Impact:
- Agent registers successfully without public key
- Receives update commands
- All updates fail signature verification
- No automatic retry to fetch key
Remediation:
// Block update commands if no public key cached
func handleUpdateAgent(...) error {
    publicKey, err := getServerPublicKey()
    if err != nil {
        return fmt.Errorf("cannot process updates - server public key not cached: %w", err)
    }
    // ... proceed with update
}
Issue #4: No Fingerprint Verification (Medium Severity)
Problem:
- Agent performs TOFU but doesn't log server's public key fingerprint
- Admin has no visibility into which server the agent trusts
- If wrong server URL provided, agent silently trusts wrong server
Remediation:
// After fetching public key
publicKey, err := crypto.FetchAndCacheServerPublicKey(serverURL)
if err != nil {
    return err
}
fingerprint := hex.EncodeToString(publicKey[:8])
log.Printf("✅ Server public key cached successfully")
log.Printf("📌 Server fingerprint: %s", fingerprint)
log.Printf("⚠️ Verify this fingerprint matches your server's expected value")
Issue #5: No Signing Service = Silent Failure (Low Severity)
Location: aggregator-server/internal/api/handlers/agent_updates.go:90-99
Problem:
if h.signingService != nil {
    nonceSignature, err = h.signingService.SignNonce(...)
}
// Falls through - creates command with EMPTY signature
Impact:
- If REDFLAG_SIGNING_PRIVATE_KEY is not set, server still sends update commands
- Commands have empty nonce_signature field
- Agent correctly rejects them
- But admin has no visibility into why updates are failing
Remediation:
// Block update endpoints entirely if no signing service
if h.signingService == nil {
    c.JSON(http.StatusServiceUnavailable, gin.H{
        "error": "Agent updates are disabled - no signing key configured",
        "hint": "Generate Ed25519 keys in Settings → Security",
    })
    return
}
What Actually Works
✅ Components That Are Operational
- Machine ID Binding: Fully functional, prevents config copying
- Nonce Replay Protection: Fully functional, prevents command replay
- Command Acknowledgment: Fully functional, reliable delivery
- Ed25519 Signing (Code): Implementation is correct, just not wired up
- Setup Wizard Key Generation: Works, generates unique Ed25519 keypairs
❌ Components That Are Broken
- Agent Update Signing: No packages in database, updates fail
- TOFU Failure Handling: Non-blocking, no retry
- Fingerprint Verification: Agent doesn't log server fingerprint
- Key Uniqueness: No validation against key reuse
Security Posture Assessment
Current State: 🔴 Not Production Ready
Strengths:
- Well-designed architecture
- Strong cryptographic primitives (Ed25519)
- Defense-in-depth approach
- Good separation of concerns
Weaknesses:
- Critical: Agent updates completely non-functional
- High: Signing key reuse across test instances
- High: No UI/automation for signing workflow
- Medium: Public key fetch can fail silently
- Medium: No fingerprint verification for admins
Risk Analysis
If deployed to production:
| Risk | Likelihood | Impact | Severity |
|---|---|---|---|
| Cannot push agent updates | 100% | High | Critical |
| Signing key compromise affects all servers | Medium | Critical | High |
| Agent trusts wrong server (wrong URL) | Low | High | Medium |
| Agent registers without public key | Low | Medium | Low |
Recommended Actions
Before claiming security features:
- Complete update signing workflow (UI + automation)
- Test end-to-end agent update with signature verification
- Add fingerprint logging and verification
- Document key generation and unique-per-server requirements
- Add integration tests for signing workflow
Immediate fixes (can be done now):
- Block update commands if no public key cached
- Block update endpoints if no signing service configured
- Log server fingerprint during TOFU
- Add warning on server startup if signing key missing
Documentation Gaps
Missing Documentation
-
Agent Update Workflow:
- How to sign binaries
- How to push updates to agents
- How to verify signatures manually
- Rollback procedures
-
Key Management:
- How to generate unique keys per server
- How to rotate keys safely
- How to verify key uniqueness
- Backup/recovery procedures
-
Security Model:
- TOFU trust model explanation
- Attack scenarios and mitigations
- Threat model documentation
- Security assumptions
-
Operational Procedures:
- Agent registration verification
- Machine ID troubleshooting
- Signature verification debugging
- Security incident response
Conclusion
RedFlag has excellent security infrastructure code, but the operational workflow is incomplete. The signing system exists but is not connected to the update delivery system. This makes it impossible to push signed updates to agents, so the signed-update part of the security architecture is non-functional, even though nonce validation, machine ID binding, and command acknowledgment are operational.
Key Findings:
- ✅ All security primitives are correctly implemented
- ✅ Code quality is high, cryptography is sound
- ❌ No signed packages exist in database
- ❌ No UI or automation for signing workflow
- ❌ Agent updates are currently broken
Recommendation: Either complete the signing workflow implementation or remove security claims from documentation until operational.
Next Steps
Option 1: Complete Implementation
- Add signing automation (post-build hook)
- Build admin UI for package management
- Add integration tests
- Document operational procedures
- Estimated effort: 2-3 days
Option 2: Document As-Is
- Update README to clarify "security infrastructure in progress"
- Document manual signing procedure
- Add warning that updates require manual intervention
- Estimated effort: 2 hours
Option 3: Temporary Workaround
- Add script to sign all binaries on container startup
- Populate database automatically
- Document as "alpha security model"
- Estimated effort: 4 hours