RedFlag Security Architecture Audit

Date: 2025-01-07
Version: 0.1.23
Status: 🔴 Security Claims Not Fully Implemented


Executive Summary

RedFlag claims to implement a comprehensive security architecture including:

  • Ed25519 digital signatures for agent updates
  • Nonce-based replay protection
  • Machine ID binding (anti-impersonation)
  • Trust-On-First-Use (TOFU) public key distribution
  • Command acknowledgment system

Finding: The security infrastructure code exists and is well-designed, but the update signing workflow is not operational. Zero signed update packages exist in the database, so the server cannot issue update commands and no agent update can currently be delivered or verified.


Security Components - Detailed Analysis

1. Ed25519 Digital Signatures

What's Implemented (Code Level)

Server Side:

  • aggregator-server/internal/services/signing.go:45-66 - SignFile() function

    • Reads binary file
    • Computes SHA-256 checksum
    • Signs with Ed25519 private key
    • Returns signature + checksum
  • aggregator-server/internal/api/handlers/agent_updates.go:320-363 - SignUpdatePackage() endpoint

    • Receives: {version, platform, architecture, binary_path}
    • Calls SignFile()
    • Stores in agent_update_packages table

Agent Side:

  • aggregator-agent/cmd/agent/subsystem_handlers.go:782-813 - verifyBinarySignature() function
    • Loads cached server public key
    • Reads binary file
    • Verifies Ed25519 signature
    • Returns error if invalid

Update Handler:

  • aggregator-agent/cmd/agent/subsystem_handlers.go:346-495 - handleUpdateAgent()
    • Validates nonce (line 397)
    • Downloads binary (line 436)
    • Verifies checksum (line 449)
    • Verifies Ed25519 signature (line 456)
    • Installs with atomic backup/rollback

What's Missing (Workflow Level)

  1. No Signed Packages in Database:

    SELECT COUNT(*) FROM agent_update_packages;
    -- Result: 0
    
  2. No Signing Automation:

    • Agent binaries are built during docker compose build (Dockerfile:19-28)
    • Binaries exist at /app/binaries/{platform}/redflag-agent (10.8MB each)
    • But they are never signed and inserted into the database
  3. No UI for Signing:

    • Setup wizard generates Ed25519 keypair
    • No interface to sign binaries
    • No interface to view signed packages
    • No interface to manage package versions
  4. Update Flow Fails:

    Admin clicks "Update Agent"
      → POST /agents/:id/update
      → GetUpdatePackageByVersion(version, platform, arch)
      → Returns 404: "update package not found"
      → Update never starts
    

🔍 Manual Verification

To verify signing works, an admin would need to:

# 1. Get auth token
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"<password>"}' | jq -r .token)

# 2. Sign the binary
curl -X POST http://localhost:8080/api/v1/updates/packages/sign \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "0.1.23",
    "platform": "linux",
    "architecture": "amd64",
    "binary_path": "/app/binaries/linux-amd64/redflag-agent"
  }'

# 3. Verify in database
docker exec redflag-postgres psql -U redflag -d redflag \
  -c "SELECT version, platform, left(signature, 16) FROM agent_update_packages;"

Current Status: No documentation exists for this workflow.


2. Nonce-Based Replay Protection

What's Implemented

Server Side:

  • aggregator-server/internal/api/handlers/agent_updates.go:86-99
    nonceUUID := uuid.New()
    nonceTimestamp := time.Now()
    nonceSignature, err = h.signingService.SignNonce(nonceUUID, nonceTimestamp)
    
    • Generates UUID + timestamp
    • Signs with Ed25519 private key
    • Includes in command parameters

Agent Side:

  • aggregator-agent/cmd/agent/subsystem_handlers.go:848-893 - validateNonce()
    • Parses timestamp (line 851)
    • Checks age < 5 minutes (lines 857-860)
    • Verifies Ed25519 signature against cached public key (line 887)
    • Rejects expired or invalid nonces

Configuration:

  • Configurable via REDFLAG_NONCE_MAX_AGE_MINUTES (default: 5 minutes)

Status: FULLY OPERATIONAL

  • Nonces are generated for every update command
  • Validation happens before download starts
  • Prevents replay of captured update commands once their nonce is older than the maximum age

3. Machine ID Binding

What's Implemented

Server Side:

  • aggregator-server/internal/api/middleware/machine_binding.go:13-99
    • Applied to all /agents/* endpoints (main.go:251)
    • Validates X-Machine-ID header (line 58)
    • Compares with database machine_id column (line 82)
    • Returns HTTP 403 on mismatch (lines 85-90)
    • Enforces minimum agent version 0.1.22+ (lines 42-54)

Agent Side:

  • aggregator-agent/internal/system/machine_id.go - GetMachineID()
    • Linux: Uses /etc/machine-id or /var/lib/dbus/machine-id
    • Windows: Uses registry HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid
    • Cached in agent state
    • Sent in X-Machine-ID header on every request

Database:

  • agents.machine_id column (VARCHAR(255), added in migration 016)
  • Stored during registration
  • Validated on every check-in

Status: FULLY OPERATIONAL

  • Machine binding prevents config file copying to different machines
  • Logs security alerts: ⚠️ SECURITY ALERT: Agent ... machine ID mismatch!

⚠️ Known Issues:

  • No UI visibility: Admins can't see machine ID in dashboard
  • No recovery workflow: If machine ID changes (hardware swap), agent must re-register

4. Trust-On-First-Use (TOFU) Public Key

What's Implemented

Server Endpoint:

  • aggregator-server/internal/api/handlers/system.go:22-32 - GetPublicKey()
    • Returns Ed25519 public key in hex format
    • Available at GET /api/v1/public-key
    • Rate limited (public_access tier)

Agent Fetching:

  • aggregator-agent/cmd/agent/main.go:465-473
    log.Println("Fetching server public key...")
    if err := fetchAndCachePublicKey(cfg.ServerURL); err != nil {
        log.Printf("Warning: Failed to fetch server public key: %v", err)
        // Don't fail registration - key can be fetched later
    }
    
    • Fetches during registration (line 467)
    • Caches to /etc/redflag/server_public_key (Linux) or C:\ProgramData\RedFlag\server_public_key (Windows)
    • Used for all signature verification

Agent Usage:

  • aggregator-agent/cmd/agent/subsystem_handlers.go:815-846 - getServerPublicKey()
    • Loads from cache
    • Used by verifyBinarySignature() (line 784)
    • Used by validateNonce() (line 867)

⚠️ What's Broken

1. Non-Blocking Fetch (Critical):

  • main.go:468-470: If public key fetch fails, agent registers anyway
  • Agent cannot verify updates without public key
  • All update commands will fail signature verification
  • No retry mechanism

2. No Fingerprint Logging:

  • Agent doesn't log the server's public key fingerprint during TOFU
  • Admins have no way to verify correct server was contacted
  • Silent MITM vulnerability if wrong server URL provided

3. No Key Rotation Support:

  • Cached public key never expires
  • No mechanism to update if server rotates keys
  • Recovery requires manually deleting /etc/redflag/server_public_key on each agent

5. Command Acknowledgment System

What's Implemented

Agent Side:

  • aggregator-agent/internal/acknowledgment/tracker.go - Acknowledgment tracker
    • Stores pending command results in pending_acks.json
    • Tracks retry count (max 10 retries)
    • Expires after 24 hours
    • Sends acknowledgments in every check-in

Server Side:

  • aggregator-server/internal/database/queries/commands.go - VerifyCommandsCompleted()
    • Returns AcknowledgedIDs in check-in response
    • Agent removes acknowledged commands from pending list

Agent Main Loop:

  • aggregator-agent/cmd/agent/main.go:834-843
    if response != nil && len(response.AcknowledgedIDs) > 0 {
        ackTracker.Acknowledge(response.AcknowledgedIDs)
        log.Printf("Server acknowledged %d command result(s)", len(response.AcknowledgedIDs))
    }
    

Status: FULLY OPERATIONAL

  • At-least-once delivery guarantee
  • Automatic retry on network failures
  • Cleanup after success or expiration

Critical Security Issues

Issue #1: Hardcoded Signing Key (High Severity)

Location: config/.env:24

REDFLAG_SIGNING_PRIVATE_KEY=1104a7fd7fb1a12b99e31d043fc7f4ef00bee6df19daff11ae4244606dac5bf9792d68d1c31f6c6a7820033720fb80d54bf22a8aab0382efd5deacc5122a5947

Public Key Fingerprint: 792d68d1c31f6c6a

Problem:

  • Same signing key appears across multiple test server instances
  • .env file is gitignored but manually copied between servers
  • Setup wizard generates NEW keys, but if .env already has REDFLAG_SIGNING_PRIVATE_KEY, it's reused

Impact:

  • If one server is compromised, attacker can sign updates for ALL servers using this key
  • No uniqueness validation on server startup

Reproduction:

# Server A
grep REDFLAG_SIGNING_PRIVATE_KEY config/.env | sha256sum
# Output: abc123...

# Server B
grep REDFLAG_SIGNING_PRIVATE_KEY config/.env | sha256sum
# Output: abc123...  ← SAME KEY

Remediation:

  1. Delete signing key from all .env files
  2. Run setup wizard on each server to generate unique keys
  3. Add startup validation to warn if key fingerprint matches known test keys
  4. Document key generation in deployment guide

Issue #2: Update Signing Workflow Not Operational (Critical)

Problem:

  • Zero signed packages in database
  • No automation to sign binaries after build
  • No UI to trigger signing
  • Update commands fail with 404

Evidence:

redflag=# SELECT COUNT(*) FROM agent_update_packages;
 count
-------
     0

Impact:

  • Agent updates are completely non-functional
  • Security claims in documentation are misleading
  • Admin has no way to push signed updates

Required to Fix:

  1. Signing Automation:

    • Add post-build hook to sign binaries
    • Store in database automatically
    • Version management (which version is "latest"?)
  2. Admin UI:

    • Settings page: "Manage Update Packages"
    • List signed packages with versions
    • Button: "Sign Current Binaries"
    • Show fingerprint of signing key in use
  3. API Endpoints:

    • GET /api/v1/updates/packages - List signed packages
    • POST /api/v1/updates/packages/sign-all - Sign all binaries in /app/binaries/
    • DELETE /api/v1/updates/packages/:id - Deactivate old package
  4. Docker Build Integration:

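    # After building binaries, sign them.
    # scripts/sign-binaries.go is a proposed tool; it does not exist yet.
    # Pass the key as a BuildKit secret instead of a build arg or env var,
    # so it is not recorded in image layers or `docker history`.
    RUN --mount=type=secret,id=signing_key \
      go run scripts/sign-binaries.go \
      --private-key-file=/run/secrets/signing_key \
      --binaries=/app/binaries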
    

Issue #3: Public Key Fetch Non-Blocking (Medium Severity)

Location: aggregator-agent/cmd/agent/main.go:468-470

Problem:

if err := fetchAndCachePublicKey(cfg.ServerURL); err != nil {
    log.Printf("Warning: Failed to fetch server public key: %v", err)
    // Don't fail registration - key can be fetched later  ← PROBLEM
}

Impact:

  • Agent registers successfully without public key
  • Receives update commands
  • All updates fail signature verification
  • No automatic retry to fetch key

Remediation:

// Block update commands if no public key is cached
func handleUpdateAgent(...) error {
    // Only check that the key exists here; validateNonce() and
    // verifyBinarySignature() load it again when they need it.
    if _, err := getServerPublicKey(); err != nil {
        return fmt.Errorf("cannot process updates - server public key not cached: %w", err)
    }
    // ... proceed with update
}

Issue #4: No Fingerprint Verification (Medium Severity)

Problem:

  • Agent performs TOFU but doesn't log server's public key fingerprint
  • Admin has no visibility into which server the agent trusts
  • If wrong server URL provided, agent silently trusts wrong server

Remediation:

// After fetching public key
publicKey, err := crypto.FetchAndCacheServerPublicKey(serverURL)
if err != nil {
    return err
}

fingerprint := hex.EncodeToString(publicKey[:8])
log.Printf("✅ Server public key cached successfully")
log.Printf("📌 Server fingerprint: %s", fingerprint)
log.Printf("⚠️  Verify this fingerprint matches your server's expected value")

Issue #5: No Signing Service = Silent Failure (Low Severity)

Location: aggregator-server/internal/api/handlers/agent_updates.go:90-99

Problem:

if h.signingService != nil {
    nonceSignature, err = h.signingService.SignNonce(...)
}
// Falls through - creates command with EMPTY signature

Impact:

  • If REDFLAG_SIGNING_PRIVATE_KEY not set, server still sends update commands
  • Commands have empty nonce_signature field
  • Agent correctly rejects them
  • But admin has no visibility into why updates are failing

Remediation:

// Block update endpoints entirely if no signing service
if h.signingService == nil {
    c.JSON(http.StatusServiceUnavailable, gin.H{
        "error": "Agent updates are disabled - no signing key configured",
        "hint": "Generate Ed25519 keys in Settings → Security",
    })
    return
}

What Actually Works

Components That Are Operational

  1. Machine ID Binding: Fully functional, prevents config copying
  2. Nonce Replay Protection: Fully functional, prevents command replay
  3. Command Acknowledgment: Fully functional, reliable delivery
  4. Ed25519 Signing (Code): Implementation is correct, just not wired up
  5. Setup Wizard Key Generation: Works, generates unique Ed25519 keypairs

Components That Are Broken

  1. Agent Update Signing: No packages in database, updates fail
  2. TOFU Failure Handling: Non-blocking, no retry
  3. Fingerprint Verification: Agent doesn't log server fingerprint
  4. Key Uniqueness: No validation against key reuse

Security Posture Assessment

Current State: 🔴 Not Production Ready

Strengths:

  • Well-designed architecture
  • Strong cryptographic primitives (Ed25519)
  • Defense-in-depth approach
  • Good separation of concerns

Weaknesses:

  • Critical: Agent updates completely non-functional
  • Critical: Signing key reuse across test instances
  • High: No UI/automation for signing workflow
  • Medium: Public key fetch can fail silently
  • Medium: No fingerprint verification for admins

Risk Analysis

If deployed to production:

| Risk | Likelihood | Impact | Severity |
|---|---|---|---|
| Cannot push agent updates | 100% | High | Critical |
| Signing key compromise affects all servers | Medium | Critical | High |
| Agent trusts wrong server (wrong URL) | Low | High | Medium |
| Agent registers without public key | Low | Medium | Low |

Before claiming security features:

  1. Complete update signing workflow (UI + automation)
  2. Test end-to-end agent update with signature verification
  3. Add fingerprint logging and verification
  4. Document key generation and unique-per-server requirements
  5. Add integration tests for signing workflow

Immediate fixes (can be done now):

  1. Block update commands if no public key cached
  2. Block update endpoints if no signing service configured
  3. Log server fingerprint during TOFU
  4. Add warning on server startup if signing key missing

Documentation Gaps

Missing Documentation

  1. Agent Update Workflow:

    • How to sign binaries
    • How to push updates to agents
    • How to verify signatures manually
    • Rollback procedures
  2. Key Management:

    • How to generate unique keys per server
    • How to rotate keys safely
    • How to verify key uniqueness
    • Backup/recovery procedures
  3. Security Model:

    • TOFU trust model explanation
    • Attack scenarios and mitigations
    • Threat model documentation
    • Security assumptions
  4. Operational Procedures:

    • Agent registration verification
    • Machine ID troubleshooting
    • Signature verification debugging
    • Security incident response

Conclusion

RedFlag has excellent security infrastructure code, but the operational workflow is incomplete. The signing system exists but is not connected to the update delivery system. This makes it impossible to push signed updates to agents, rendering the update-signing portion of the security architecture non-functional; machine ID binding, nonce validation and command acknowledgment remain operational.

Key Findings:

  • All security primitives are correctly implemented
  • Code quality is high, cryptography is sound
  • No signed packages exist in database
  • No UI or automation for signing workflow
  • Agent updates are currently broken

Recommendation: Either complete the signing workflow implementation or remove security claims from documentation until operational.


Next Steps

Option 1: Complete Implementation

  • Add signing automation (post-build hook)
  • Build admin UI for package management
  • Add integration tests
  • Document operational procedures
  • Estimated effort: 2-3 days

Option 2: Document As-Is

  • Update README to clarify "security infrastructure in progress"
  • Document manual signing procedure
  • Add warning that updates require manual intervention
  • Estimated effort: 2 hours

Option 3: Temporary Workaround

  • Add script to sign all binaries on container startup
  • Populate database automatically
  • Document as "alpha security model"
  • Estimated effort: 4 hours