
RedFlag Binary Signing Strategy Decision Document

Date: 2025-11-10
Version: 0.1.23.4
Status: Architecture Decision Record (ADR) - In Review


1. Decision Context

1.1 Background

RedFlag implements Ed25519 digital signatures for agent binary integrity verification. The signing infrastructure (signingService.SignFile()) is operational, but the workflow integration is incomplete: the build orchestrator generates Docker deployment configs instead of signed native binaries.

The agent install script expects:

  • Native binaries (Linux: ELF, Windows: PE)
  • Ed25519 signatures for verification
  • Configuration via config.json
  • Deployment via systemd (Linux) or the Windows Service Control Manager

The current build orchestrator generates:

  • docker-compose.yml (Docker container deployment)
  • Dockerfile (multi-stage build instructions)
  • Embedded Go config (compile-time injection)

1.2 Problem Statement

Critical Gap: When an admin clicks "Update Agent" in the UI, the server looks for signed packages in the agent_update_packages table, finds none, and returns 404 Not Found.

Root Cause: The build pipeline produces unsigned generic binaries during the Docker multi-stage build, but never:

  1. Signs the binaries with the Ed25519 private key
  2. Embeds agent-specific configuration
  3. Stores signed binary metadata in the database
  4. Serves signed versions via the download endpoint

1.3 Decision Required

Question: How should the build orchestrator generate signed binaries?

Option 1: Per-Agent Signing (unique binary + signature for each agent)
Option 2: Per-Version/Platform Signing (one binary + signature per version/platform)
Option 3: Hybrid approach (per-version binary with per-agent config obfuscation)


2. Options Analysis

2.1 Option 1: Per-Agent Signing

Implementation

// For each agent:
1. Take generic binary from /app/binaries/{platform}/
2. Embed agent-specific config.json (agent_id, token, server_url)
3. Compile/repackage with embedded config
4. Sign resulting binary with Ed25519 private key
5. Store in database: agent_update_packages_{agent_id}_{version}_{platform}
6. Serve via /api/v1/downloads/{agent_id}/{platform}

Security Properties

Strengths:

  • Single-file deployment (binary includes config)
  • Config protected by binary signature
  • Slightly higher bar for config extraction
  • Per-agent unique artifacts

Weaknesses:

  • ⚠️ Config still extractable (reverse engineering or runtime memory dump)
  • ⚠️ Minimal security gain over Option 2 (see Threat Analysis below)
  • ⚠️ Config obscurity, not encryption

Operational Impact

Storage:

  • 1,000 agents × 11 MB binary = 11 GB storage
  • Each agent requires unique binary copy
  • CDN caching ineffective (unique URLs per agent)

Compute:

  • ~10ms Ed25519 sign operation per agent
  • 1,000 agents = 10 seconds CPU time
  • Serial bottleneck during mass updates
  • Parallel signing possible but adds complexity

Network:

  • Each agent downloads unique binary
  • Cannot share downloads across agents
  • Bandwidth usage scales linearly with agent count

Cache Efficiency:

  • CDN or proxy caching: Poor
  • Each agent has different URL: /downloads/{agent_id}/linux-amd64
  • No shared cache hits

Rollback Complexity:

  • Must track per-agent version in the database
  • Cannot roll back all agents simultaneously with a single version number
  • Each agent has an independent version history

Build Time:

  • Each agent's binary must be signed individually
  • Cannot pre-sign binaries before agent deployment
  • On-demand signing introduces latency

Use Cases

When this makes sense:

  • Ultra-high security environments with regulatory requirements for config-at-rest encryption
  • Small deployments (<100 agents) where storage is not a concern
  • When config secrecy is paramount and worth the operational overhead

When this is overkill:

  • Standard MSP deployments (100-10,000 agents)
  • When operational simplicity is valued
  • When config is not highly sensitive (already protected by machine binding)

2.2 Option 2: Per-Version/Platform Signing

Implementation

// Once per version/platform:
1. Take generic binary from /app/binaries/{platform}/
2. Sign generic binary with Ed25519 private key
3. Store in database: agent_update_packages_{version}_{platform}
4. Serve via /api/v1/downloads/{platform}

// Per-agent config (separate):
5. Generate config.json (agent_id, token, server_url)
6. Agent downloads binary and config.json independently
7. Agent verifies binary signature
8. Agent loads config from file

Security Properties

Strengths:

  • Same binary-integrity guarantee as Option 1 (Ed25519 signature verification)
  • Token lifetime controls (24h JWT, 90d refresh) limit exposure
  • Server-side validation (machine ID binding) prevents misuse
  • Token revocation capability

Addressing Config Protection Concerns:

Q: "But config is plaintext on disk!"
A: "Is that actually a problem?"

Current protections:

  1. File permissions: 0600 (owner read/write only)
  2. Machine ID binding: Config only works on one machine
  3. Token lifetimes: 24h JWT, 90d refresh window
  4. Revocation: Tokens can be revoked at any time
  5. Registration tokens: Single-use or multi-seat (limited)

Attack scenarios with Option 2:

Scenario 1: Attacker gains filesystem access to agent machine

Attacker actions:
- Can read /etc/redflag/config.json
- Sees: {"server_url": "...", "agent_id": "...", "token": "..."}

Questions:
Q: Can attacker use this token on another machine?
A: No - Machine ID binding (server validates X-Machine-ID header)

Q: Can attacker register new agent with this token?
A: No - Registration token used once (or multi-seat but tracked)

Q: Can attacker impersonate this agent?
A: Only from the already-compromised machine (attacker already has access)

Q: Is token exposure the biggest concern?
A: No - If attacker has filesystem access, they can execute commands as agent anyway

Scenario 2: Attacker steals disk image (offline attack)

Attacker actions:
- Clones VM disk
- Boots on different hardware
- Tries to use stolen config

Questions:
Q: Will machine ID validation fail?
A: Yes - Different hardware = different machine fingerprint

Q: Can attacker bypass machine ID check?
A: No - It's server-side validation, not client-side

Conclusion: Stolen disk useless without original hardware

Scenario 3: Malicious insider (legitimate access)

Attacker actions:
- Has root access to agent machine
- Can read config files
- Can execute commands as agent user

Questions:
Q: What additional damage can they do with tokens?
A: None - they already have agent-level access

Q: Can tokens be used elsewhere?
A: No - bound to specific machine

Conclusion: Tokens are not the attack vector - compromised machine is

Verdict: Config plaintext storage is NOT a critical vulnerability given existing protections.

Operational Impact

Storage:

  • 3 platforms × 11 MB binary = 33 MB total
  • Example platforms: linux-amd64, linux-arm64, windows-amd64
  • 99.7% storage savings vs Option 1 (1,000 agents)
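The 99.7% figure can be checked directly. With P = 3 platforms, N = 1,000 agents, and S = 11 MB per binary, the binary size cancels. The savings therefore depend only on the platform-to-agent ratio:

```latex
\text{savings} = 1 - \frac{P \cdot S}{N \cdot S} = 1 - \frac{P}{N} = 1 - \frac{3}{1000} = 0.997 = 99.7\%
```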

Compute:

  • One 10ms Ed25519 sign operation per version/platform
  • Signing once during release process
  • Can be pre-signed before deployment

Network:

  • Binary transferred from the origin roughly once per platform
  • CDN or proxy caching: Excellent
  • All agents share the same URL: /downloads/linux-amd64
  • Cache hits for subsequent agents

Cache Efficiency:

  • CDN can cache single binary for all agents
  • Corporate proxies cache effectively
  • Origin bandwidth scales sub-linearly with agent count

Rollback Complexity:

  • Simple: Change version number in database
  • All agents roll back together
  • Single point of control

Build Time:

  • Sign once during CI/CD pipeline
  • No on-demand signing latency
  • Immediate availability

Agent-Side Simplicity

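// Agent update process (download, verifyBinarySignature and atomicReplace are agent helpers;
// serverPublicKey is the agent's pinned Ed25519 public key):
func UpdateAgent() error {
    // 1. Download signed binary and its signature
    binary, sig, err := download("/api/v1/downloads/linux-amd64")
    if err != nil {
        return fmt.Errorf("download failed: %w", err)
    }

    // 2. Verify signature before touching the installed binary
    if !verifyBinarySignature(binary, sig, serverPublicKey) {
        return fmt.Errorf("signature verification failed")
    }

    // 3. Atomically replace binary
    // Note: config.json remains unchanged (tokens still valid)
    return atomicReplace(binary, "/usr/local/bin/redflag-agent")
}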

Benefits:

  • No config rewriting during updates
  • Token persistence across updates
  • Simpler state management

2.3 Option 3: Hybrid (Per-Version Binary + Config Obfuscation)

Implementation

// Combine Option 2 with lightweight config protection:
1. Sign generic binary per version/platform (Option 2)
2. Obfuscate (not encrypt) config.json
3. Use XOR or simple transformation
4. Breaks casual inspection (grep for "token")

// Config on disk:
/etc/redflag/config.dat  // Binary blob, not JSON
// or:
/etc/redflag/config.json  // Obfuscated fields

Security Assessment

Pros:

  • Slightly raises bar for casual inspection
  • Low implementation complexity
  • Fast (no crypto operations)

Cons:

  • Not real security (obfuscation ≠ encryption)
  • Easily reversed (one debugger breakpoint)
  • False sense of security

Recommendation

Skip this option. Either implement real protection (Option 2 with the kernel keyring for config) or accept that tokens are short-lived and protected by other mechanisms. Obfuscation provides minimal value.


3. Threat Analysis Model

Attack Scenario Matrix

Attack Vector | Option 1 (Per-Agent) | Option 2 (Per-Version) | Mitigation Available?
--- | --- | --- | ---
Token theft from filesystem | Config in binary (harder to extract) | Config in plaintext file (easier to read) | Yes: Machine ID binding prevents cross-machine use
Stolen disk image | Machine ID differs (fails) | Machine ID differs (fails) | Yes: Server-side validation
Network sniffing | HTTPS protects tokens | HTTPS protects tokens | Yes: TLS encryption in transit
JWT compromise | 24h window | 24h window | Yes: Short lifetime, refresh token rotation
Refresh token compromise | 90d window | 90d window | Yes: Revocation, machine binding
Registration token theft | Single-use or limited seats | Single-use or limited seats | Yes: Expiration, seat limits, revocation
Binary tampering | Caught by signature verification | Caught by signature verification | Yes: Ed25519 verification
Malicious insider | Attacker already has access | Attacker already has access | No: Physical/root access defeats both

Critical Insights

1. Config extraction is not the primary attack vector

  • If attacker has filesystem access, they can execute commands as agent
  • Tokens are secondary concern
  • Machine ID binding prevents cross-machine token reuse

2. Machine ID binding is the real protection

  • Prevents config copying to unauthorized machines
  • Server-side validation (can't bypass)
  • Hardware-rooted (difficult to spoof)

3. Token lifetimes limit damage

  • JWT: 24h max exposure
  • Refresh token: 90d max (revocable)
  • Rotation reduces window further

4. Client-side config protection is marginal

  • Attacker with root access can dump process memory
  • Attacker with physical access can extract keys
  • Obfuscation/encryption only slows down determined attacker

4. Recommendation

Primary Recommendation: Option 2 (Per-Version/Platform Signing)

Rationale:

  1. Security is sufficient

    • Tokens protected by machine ID binding
    • Short lifetimes limit exposure
    • Revocation capability exists
    • Plaintext config is not a critical vulnerability
  2. Operational efficiency

    • 99.7% storage savings (33 MB vs 11 GB for 1,000 agents)
    • Excellent CDN/proxy caching
    • Fast signing (once per version)
    • Simple rollback (single version number)
  3. Scalability

    • Works for 10 agents or 10,000 agents
    • Sub-linear bandwidth usage
    • No per-agent build complexity
  4. Implementation simplicity

    • Agent updates don't rewrite config
    • Token persistence across updates
    • Clear separation of concerns

Secondary Recommendation: Kernel Keyring Config Protection (Future Enhancement)

// For defense in depth, not immediate need:
func LoadConfig() (*Config, error) {
    // Try kernel keyring first
    if keyringConfig, err := loadFromKeyring(); err == nil {
        return keyringConfig, nil
    }

    // Fallback to file
    return loadFromFile("/etc/redflag/config.json")
}

// On token refresh:
func SaveTokens(agentID, token, refreshToken string) error {
    // Linux: kernel keyring (held in kernel memory, not persisted across reboots)
    // Windows: Credential Manager
    return saveToKeyring(agentID, token, refreshToken)
}

Why this is optional:

  • Tokens already have short lifetimes
  • Machine ID binding prevents misuse
  • File permissions already restrict access
  • Implementation complexity not justified by security gain

When to implement:

  • Regulatory requirement for config-at-rest encryption
  • High-security environment with strict compliance needs
  • After all Tier 1 security gaps are addressed

5. Implementation Plan

Phase 1: Implement Option 2 (Per-Version Signing)

Priority: 🔴 Critical (blocking updates)

Server-Side Changes

// 1. Modify build_orchestrator.go
func BuildAgentWithConfig(config *AgentConfiguration) (*BuildResult, error) {
    // Remove: docker-compose.yml generation
    // Remove: Dockerfile generation

    // Add: Generate the per-agent config.json and persist it so ConfigURL can serve it
    // (storeAgentConfig is a hypothetical helper)
    configContent, err := generateConfigJSON(config)
    if err != nil {
        return nil, err
    }
    if err := storeAgentConfig(config.AgentID, configContent); err != nil {
        return nil, err
    }

    // Add: Sign the generic binary. Under Option 2 this happens once per
    // version/platform (ideally in CI); reuse an existing package rather than
    // re-signing on every build.
    binaryPath := fmt.Sprintf("/app/binaries/%s/redflag-agent", config.Platform)
    signature, err := signingService.SignFile(binaryPath)
    if err != nil {
        return nil, err
    }

    // Add: Store as a shared package (agent_id NULL), keyed by version/platform
    packageID, err := storeSignedPackage(config.Version, config.Platform, binaryPath, signature)
    if err != nil {
        return nil, err
    }

    return &BuildResult{
        AgentID:   config.AgentID,
        Version:   config.Version,
        Platform:  config.Platform,
        BinaryURL: fmt.Sprintf("/api/v1/downloads/%s", config.Platform),
        ConfigURL: fmt.Sprintf("/api/v1/config/%s", config.AgentID),
        Signature: signature,
        PackageID: packageID,
    }, nil
}

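// 2. Update downloadHandler
func (h *DownloadHandler) DownloadAgent(c *gin.Context) {
    platform := c.Param("platform")

    // Only accept known platforms before building a file path
    // (supportedPlatforms is a hypothetical allowlist: linux-amd64, linux-arm64, windows-amd64)
    if !supportedPlatforms[platform] {
        c.AbortWithStatus(http.StatusNotFound)
        return
    }

    // Version to serve (assumed here to arrive as a ?version= query parameter)
    version := c.Query("version")

    // Check if signed package exists
    if signedPackage, err := h.packageQueries.GetSignedPackage(version, platform); err == nil {
        // Serve signed version
        c.File(signedPackage.BinaryPath)
        return
    }

    // Fallback to unsigned generic binary (an agent that verifies signatures will reject it)
    genericPath := fmt.Sprintf("/app/binaries/%s/redflag-agent", platform)
    c.File(genericPath)
}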

Files to modify:

  • aggregator-server/internal/api/handlers/build_orchestrator.go
  • aggregator-server/internal/services/agent_builder.go (remove Docker generation)
  • aggregator-server/internal/api/handlers/downloads.go (serve signed versions)
  • aggregator-server/internal/services/signing.go (integration - already working)

Database schema:

-- Already exists:
CREATE TABLE agent_update_packages (
    id UUID PRIMARY KEY,
    agent_id UUID REFERENCES agents(id),
    version VARCHAR(20) NOT NULL,
    platform VARCHAR(20) NOT NULL,
    binary_path VARCHAR(255) NOT NULL,
    signature VARCHAR(128) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ
);

-- Add index for performance:
CREATE INDEX idx_agent_updates_version_platform
ON agent_update_packages(version, platform)
WHERE agent_id IS NULL;

Testing:

# Test flow:
1. Admin creates agent
2. Admin clicks "Update Agent"
3. Build orchestrator generates signed package
4. Server stores package in database
5. Agent requests update → receives signed binary
6. Agent verifies signature → installs update
7. Verify: Package served, signature valid, agent updated

6. Future Enhancements (Post-Implementation)

Kernel Keyring Config Protection

  • Priority: Medium
  • Timeline: After the version-upgrade catch-22 is resolved
  • Rationale: Defense in depth, not critical for security
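
// Linux implementation (e.g. keyring_linux.go)
package keyring

import (
    "fmt"

    "github.com/jsipprell/keyctl"
)

// SaveAgentConfig stores both tokens in the user's kernel keyring.
// Kernel keys live in memory and do not survive a reboot, which is
// why LoadConfig keeps the file fallback.
func SaveAgentConfig(agentID string, token string, refreshToken string) error {
    keyring, err := keyctl.UserKeyring()
    if err != nil {
        return err
    }

    // Store JWT token
    tokenKey := fmt.Sprintf("redflag-agent-%s-token", agentID)
    _, err = keyring.Add(tokenKey, []byte(token))
    if err != nil {
        return err
    }

    // Store refresh token
    refreshKey := fmt.Sprintf("redflag-agent-%s-refresh", agentID)
    _, err = keyring.Add(refreshKey, []byte(refreshToken))
    return err
}

// Windows implementation (e.g. keyring_windows.go)
package keyring

import (
    "fmt"

    "github.com/danieljoos/wincred"
)

// SaveAgentConfig stores both tokens as one generic credential in Windows
// Credential Manager. The _windows.go file suffix limits this file to Windows
// builds, so both platforms can expose the same function name.
func SaveAgentConfig(agentID string, token string, refreshToken string) error {
    cred := wincred.NewGenericCredential(fmt.Sprintf("redflag-agent-%s", agentID))
    cred.CredentialBlob = []byte(fmt.Sprintf("token:%s\nrefresh:%s", token, refreshToken))
    return cred.Write()
}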

Certificate-Based Authentication (v2.0)

  • Priority: Low
  • Timeline: Future major version
  • Rationale: Sufficient security with current model

If implemented:

  • Replace JWT tokens with TLS client certificates
  • Per-agent certificate generation during registration
  • No shared secrets
  • Automatic cert rotation
  • Revocation via CRL or OCSP

Tradeoffs:

  • Pro: Stronger crypto (per-agent keys)
  • Pro: No shared secrets
  • Con: PKI management complexity
  • Con: Requires CRL/OCSP infrastructure
  • Con: Certificate renewal must be automated
  • Con: Ongoing revocation management

7. Decision Log

Date: 2025-11-10

Decision: Implement Option 2 (Per-Version/Platform Signing) as described in this document

Decision Makers: @Fimeg, @Kimi, @Grok

Rationale:

  • Sufficient security given existing protections (machine ID binding, token lifetimes, revocation)
  • Superior operational characteristics (99.7% storage savings, CDN friendly, simple rollback)
  • Scales from 10 to 10,000 agents
  • Simpler implementation and maintenance

Rejected Alternatives:

  • Option 1 (Per-Agent): Operational overhead not justified by marginal security gain
  • Option 3 (Hybrid Obfuscation): False security, minimal value

8. Open Questions & Follow-ups

8.1 Token Security Enhancement

Question: Should we implement field-level encryption for tokens in config?
Recommendation: Store tokens in the kernel keyring (Linux) or Windows Credential Manager as an optional defense-in-depth layer once the Tier 1 security issues are resolved.

8.2 Refresh Token Rotation Strategy

Question: Should we implement "true rotation" (a new token on every use) instead of the current "sliding window"?
Current State: The sliding window extends the expiry but keeps the same token.
Recommendation: Keep the sliding window for now (simpler) and implement true rotation if a security audit identifies token theft as an actual risk.

8.3 Debugging in Production

Question: How do we balance debug logging needs against security (JWT secret exposure)?
Recommendation: Implement proper logging levels (debug/info/warn/error) and require an explicit REDFLAG_DEBUG=true for sensitive logs.


9. References

Documentation

  • Status.md - Comprehensive security architecture status
  • todayupdate.md - Consolidated master documentation
  • answer.md - Token system analysis (by Grok)
  • SMART_INSTALLER_FLOW.md - Installer script documentation
  • MIGRATION_IMPLEMENTATION_STATUS.md - Migration system details

Code Locations

  • aggregator-server/internal/api/handlers/build_orchestrator.go:77-84 - Docker build instructions
  • aggregator-server/internal/services/agent_builder.go:171-245 - Docker config generation
  • aggregator-server/internal/services/signing.go - Ed25519 signing service (working)
  • aggregator-server/internal/api/handlers/downloads.go:175,244 - Binary serving
  • aggregator-server/internal/api/middleware/machine_binding.go:235-253 - Version upgrade enhancement

Database Schema

  • agent_update_packages - Signed package storage
  • registration_tokens - Multi-seat registration tokens
  • refresh_tokens - Long-lived rotating tokens (90d)

Document Version: 1.0
Last Updated: 2025-11-10
Status: Awaiting final review and approval
Next Step: Implement Option 2 per this specification