Fimeg/Redflag

Fimeg 484a7f77ce Add docs and project files - force for Culurien

2026-03-28 20:46:24 -04:00

RedFlag Binary Signing Strategy Decision Document

Date: 2025-11-10
Version: 0.1.23.4
Status: Architecture Decision Record (ADR) - In Review

1. Decision Context

1.1 Background

RedFlag implements Ed25519 digital signatures for agent binary integrity verification. The signing infrastructure (signingService.SignFile()) is operational, but the workflow integration is incomplete - the build orchestrator generates Docker deployment configs instead of signed native binaries.

The agent install script expects:

Native binaries (Linux: ELF, Windows: PE)
Ed25519 signatures for verification
Configurable via config.json
Deployed via systemd/Windows Service Manager

The current build orchestrator generates:

docker-compose.yml (Docker container deployment)
Dockerfile (multi-stage build instructions)
Embedded Go config (compile-time injection)

1.2 Problem Statement

Critical Gap: When an admin clicks "Update Agent" in the UI, the server looks for signed packages in the agent_update_packages table, finds none, and returns 404 Not Found.

Root Cause: The build pipeline produces unsigned generic binaries during the Docker multi-stage build, but never:

Signs the binaries with the Ed25519 private key
Embeds agent-specific configuration
Stores signed binary metadata in the database
Serves signed versions via the download endpoint

1.3 Decision Required

Question: How should the build orchestrator generate signed binaries?

Option 1: Per-Agent Signing (unique binary + signature for each agent)
Option 2: Per-Version/Platform Signing (one binary + signature per version/platform)
Option 3: Hybrid approach (per-version binary with per-agent config obfuscation)

2. Options Analysis

2.1 Option 1: Per-Agent Signing

Implementation

// For each agent:
1. Take generic binary from /app/binaries/{platform}/
2. Embed agent-specific config.json (agent_id, token, server_url)
3. Compile/repackage with embedded config
4. Sign resulting binary with Ed25519 private key
5. Store in database: agent_update_packages row keyed by (agent_id, version, platform)
6. Serve via /api/v1/downloads/{agent_id}/{platform}

Security Properties

Strengths:

✅ Single-file deployment (binary includes config)
✅ Config integrity protected by the binary signature (tampering is detectable)
✅ Slightly higher bar for config extraction
✅ Per-agent unique artifacts

Weaknesses:

⚠️ Config still extractable (reverse engineering or runtime memory dump)
⚠️ Minimal security gain over Option 2 (see Threat Analysis below)
⚠️ Config obscurity, not encryption

Operational Impact

Storage:

1,000 agents × 11 MB binary = 11 GB storage
Each agent requires unique binary copy
CDN caching ineffective (unique URLs per agent)

Compute:

~10ms Ed25519 sign operation per agent
1,000 agents = 10 seconds CPU time
Serial bottleneck during mass updates
Parallel signing possible but adds complexity

Network:

Each agent downloads unique binary
Cannot share downloads across agents
Bandwidth usage scales linearly with agent count

Cache Efficiency:

CDN or proxy caching: Poor
Each agent has a different URL: /api/v1/downloads/{agent_id}/linux-amd64
No shared cache hits

Rollback Complexity:

Must track per-agent version in database
Cannot roll back all agents simultaneously with single version number
Each agent has independent version history

Build Time:

Sign each agent individually
Cannot pre-sign binaries before agent deployment
On-demand signing introduces latency

Use Cases

When this makes sense:

Ultra-high security environments with regulatory requirements for config-at-rest encryption
Small deployments (<100 agents) where storage is not a concern
When config secrecy is paramount and worth the operational overhead

When this is overkill:

Standard MSP deployments (100-10,000 agents)
When operational simplicity is valued
When config is not highly sensitive (already protected by machine binding)

2.2 Option 2: Per-Version/Platform Signing

Implementation

// Once per version/platform:
1. Take generic binary from /app/binaries/{platform}/
2. Sign generic binary with Ed25519 private key
3. Store in database: agent_update_packages row keyed by (version, platform), agent_id NULL
4. Serve via /api/v1/downloads/{platform}

// Per-agent config (separate):
5. Generate config.json (agent_id, token, server_url)
6. Download binary + config.json independently
7. Agent verifies binary signature
8. Agent loads config from file

Security Properties

Strengths:

✅ All cryptographic guarantees of Option 1
✅ Token lifetime controls (24h JWT, 90d refresh) limit exposure
✅ Server-side validation (machine ID binding) prevents misuse
✅ Token revocation capability

Addressing Config Protection Concerns:

Q: "But config is plaintext on disk!"
A: "Is that actually a problem?"

Current protections:

File permissions: 0600 (owner read/write only)
Machine ID binding: Config only works on one machine
Token lifetimes: 24h JWT, 90d refresh window
Revocation: Tokens can be revoked at any time
Registration tokens: Single-use or multi-seat (limited)

Attack scenarios with Option 2:

Scenario 1: Attacker gains filesystem access to agent machine

Attacker actions:
- Can read /etc/redflag/config.json
- Sees: {"server_url": "...", "agent_id": "...", "token": "..."}

Questions:
Q: Can attacker use this token on another machine?
A: No - Machine ID binding (server validates X-Machine-ID header)

Q: Can attacker register new agent with this token?
A: No - Registration token used once (or multi-seat but tracked)

Q: Can attacker impersonate this agent?
A: Only from the already-compromised machine (attacker already has access)

Q: Is token exposure the biggest concern?
A: No - If attacker has filesystem access, they can execute commands as agent anyway

Scenario 2: Attacker steals disk image (offline attack)

Attacker actions:
- Clones VM disk
- Boots on different hardware
- Tries to use stolen config

Questions:
Q: Will machine ID validation fail?
A: Yes - Different hardware = different machine fingerprint

Q: Can attacker bypass machine ID check?
A: No - It's server-side validation, not client-side

Conclusion: A stolen disk is useless without the original hardware

Scenario 3: Malicious insider (legitimate access)

Attacker actions:
- Has root access to agent machine
- Can read config files
- Can execute commands as agent user

Questions:
Q: What additional damage can they do with tokens?
A: None - they already have agent-level access

Q: Can tokens be used elsewhere?
A: No - bound to specific machine

Conclusion: Tokens are not the attack vector; the compromised machine is

Verdict: Config plaintext storage is NOT a critical vulnerability given existing protections.

Operational Impact

Storage:

3 platforms × 11 MB binary = 33 MB total
Example platforms: linux-amd64, linux-arm64, windows-amd64
99.7% storage savings vs Option 1 (1,000 agents)
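The storage figures reduce to a simple ratio. A small helper, assuming as above one 11 MB binary per agent for Option 1 and one per platform for Option 2; the function names are hypothetical:

```go
package main

// StorageMB returns total artifact storage in MB for n copies of a binary.
func StorageMB(copies int, binaryMB float64) float64 {
	return float64(copies) * binaryMB
}

// SavingsPercent compares per-version storage against per-agent storage.
func SavingsPercent(agents, platforms int, binaryMB float64) float64 {
	perAgent := StorageMB(agents, binaryMB)      // Option 1: one copy per agent
	perVersion := StorageMB(platforms, binaryMB) // Option 2: one copy per platform
	return 100 * (1 - perVersion/perAgent)
}
```

The binary size cancels out, so the savings are 1 - platforms/agents regardless of build size: 1 - 3/1,000 = 99.7%.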

Compute:

One 10ms Ed25519 sign operation per version/platform
Signing once during release process
Can be pre-signed before deployment

Network:

Binary downloaded once per platform
CDN or proxy caching: Excellent
All agents share the same URL: /api/v1/downloads/linux-amd64
Cache hits for subsequent agents

Cache Efficiency:

CDN can cache single binary for all agents
Corporate proxies cache effectively
Origin bandwidth scales sub-linearly (repeat downloads are served from cache)

Rollback Complexity:

Simple: Change version number in database
All agents roll back together
Single point of control

Build Time:

Sign once during CI/CD pipeline
No on-demand signing latency
Immediate availability

Agent-Side Simplicity

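// Agent update process (download, verifyBinarySignature and atomicReplace
// are agent-side helpers not shown here):
func UpdateAgent() error {
    // 1. Download the signed binary and its detached signature
    binary, sig, err := download("/api/v1/downloads/linux-amd64")
    if err != nil {
        return fmt.Errorf("download failed: %w", err)
    }

    // 2. Verify the Ed25519 signature against the server's public key
    if !verifyBinarySignature(binary, sig, serverPublicKey) {
        return fmt.Errorf("signature verification failed")
    }

    // 3. Atomically replace the binary.
    // config.json is left unchanged, so existing tokens stay valid.
    return atomicReplace(binary, "/usr/local/bin/redflag-agent")
}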

Benefits:

No config rewriting during updates
Token persistence across updates
Simpler state management

2.3 Option 3: Hybrid (Per-Version Binary + Config Obfuscation)

Implementation

// Combine Option 2 with lightweight config protection:
1. Sign generic binary per version/platform (Option 2)
2. Obfuscate (not encrypt) config.json
3. Use XOR or simple transformation
4. Breaks casual inspection (grep for "token")

// Config on disk:
/etc/redflag/config.dat  // Binary blob, not JSON
// or:
/etc/redflag/config.json  // Obfuscated fields
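A short demonstration of why XOR obfuscation is easily reversed: with a repeating key, anyone who knows the first few bytes of the plaintext (a JSON config typically starts with a predictable prefix such as {"server_url") recovers the key directly. The key and helper names are hypothetical; the key length is assumed known here, though an attacker can simply try small lengths.

```go
package main

// xorBytes applies a repeating-key XOR. Applying it twice with the same key
// returns the original input, which is all "decryption" requires.
func xorBytes(data, key []byte) []byte {
	out := make([]byte, len(data))
	for i := range data {
		out[i] = data[i] ^ key[i%len(key)]
	}
	return out
}

// recoverKey derives a repeating key of length keyLen from the obfuscated blob
// and a known plaintext prefix that is at least keyLen bytes long.
func recoverKey(blob, knownPrefix []byte, keyLen int) []byte {
	key := make([]byte, keyLen)
	for i := 0; i < keyLen; i++ {
		key[i] = blob[i] ^ knownPrefix[i] // (p ^ k) ^ p == k
	}
	return key
}
```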

Security Assessment

Pros:

✅ Slightly raises bar for casual inspection
✅ Low implementation complexity
✅ Fast (no crypto operations)

Cons:

❌ Not real security (obfuscation ≠ encryption)
❌ Easily reversed (one debugger breakpoint)
❌ False sense of security

Recommendation

Skip this option. Either do proper security (Option 2 with kernel keyring for config) or accept that tokens are short-lived and protected by other mechanisms. Obfuscation provides minimal value.

3. Threat Analysis Model

Attack Scenario Matrix

Attack Vector	Option 1 (Per-Agent)	Option 2 (Per-Version)	Mitigation Available?
Token theft from filesystem	Config in binary (harder to extract)	Config in plaintext file (easier to read)	Yes: Machine ID binding prevents cross-machine use
Stolen disk image	Machine ID different (fails)	Machine ID different (fails)	Yes: Server-side validation
Network sniffing	HTTPS protects tokens	HTTPS protects tokens	Yes: TLS encryption in transit
JWT token compromise	24h window	24h window	Yes: Short lifetime, refresh token rotation
Refresh token compromise	90d window	90d window	Yes: Can revoke, machine binding
Registration token theft	Single-use or limited seats	Single-use or limited seats	Yes: Expiration, seat limits, revocation
Binary tampering	Signature verification catches	Signature verification catches	Yes: Ed25519 verification
Malicious insider	Attacker already has access	Attacker already has access	No: Physical/root access defeats both

Critical Insights

1. Config extraction is not the primary attack vector

If an attacker has filesystem access, they can already execute commands as the agent
Tokens are a secondary concern
Machine ID binding prevents cross-machine token reuse

2. Machine ID binding is the real protection

Prevents config copying to unauthorized machines
Server-side validation (can't bypass)
Hardware-rooted (difficult to spoof)

3. Token lifetimes limit damage

JWT: 24h max exposure
Refresh token: 90d max (revocable)
Rotation reduces window further

4. Client-side config protection is marginal

Attacker with root access can dump process memory
Attacker with physical access can extract keys
Obfuscation/encryption only slows down determined attacker

4. Recommendation

Primary Recommendation: Option 2 (Per-Version/Platform Signing)

Rationale:

Security is sufficient
- Tokens protected by machine ID binding
- Short lifetimes limit exposure
- Revocation capability exists
- Plaintext config is not a critical vulnerability
Operational efficiency
- 99.7% storage savings (33 MB vs 11 GB for 1,000 agents)
- Excellent CDN/proxy caching
- Fast signing (once per version)
- Simple rollback (single version number)
Scalability
- Works for 10 agents or 10,000 agents
- Sub-linear bandwidth usage
- No per-agent build complexity
Implementation simplicity
- Agent updates don't rewrite config
- Token persistence across updates
- Clear separation of concerns

Secondary Recommendation: Kernel Keyring Config Protection (Future Enhancement)

// For defense in depth, not immediate need:
func LoadConfig() (*Config, error) {
    // Try the OS credential store first
    if keyringConfig, err := loadFromKeyring(); err == nil {
        return keyringConfig, nil
    }

    // Fall back to the file (also needed after a reboot, since the
    // Linux kernel keyring is held in memory and not persisted)
    return loadFromFile("/etc/redflag/config.json")
}

// On token refresh:
func SaveTokens(agentID, token, refreshToken string) error {
    // Linux: kernel keyring (kernel memory, not written to disk)
    // Windows: Credential Manager (encrypted at rest via DPAPI)
    return saveToKeyring(agentID, token, refreshToken)
}

Why this is optional:

Tokens already have short lifetimes
Machine ID binding prevents misuse
File permissions already restrict access
Implementation complexity not justified by security gain

When to implement:

Regulatory requirement for config-at-rest encryption
High-security environment with strict compliance needs
After all Tier 1 security gaps are addressed

5. Implementation Plan

Phase 1: Implement Option 2 (Per-Version Signing)

Priority: 🔴 Critical (blocking updates)

Server-Side Changes

// 1. Modify build_orchestrator.go
// (storeAgentConfig, getSignedPackage and storeSignedPackage are illustrative helper names)
func BuildAgentWithConfig(config *AgentConfiguration) (*BuildResult, error) {
    // Remove: docker-compose.yml generation
    // Remove: Dockerfile generation

    // Add: Generate the per-agent config.json, served separately via ConfigURL
    configContent, err := generateConfigJSON(config)
    if err != nil {
        return nil, err
    }
    if err := storeAgentConfig(config.AgentID, configContent); err != nil {
        return nil, err
    }

    // Add: Sign the generic binary once per version/platform and reuse the result.
    // The package row is shared by all agents (agent_id = NULL).
    pkg, err := getSignedPackage(config.Version, config.Platform)
    if err != nil { // treated as "not signed yet" for brevity
        binaryPath := fmt.Sprintf("/app/binaries/%s/redflag-agent", config.Platform)
        signature, signErr := signingService.SignFile(binaryPath)
        if signErr != nil {
            return nil, signErr
        }
        pkg, err = storeSignedPackage(config.Version, config.Platform, binaryPath, signature)
        if err != nil {
            return nil, err
        }
    }

    return &BuildResult{
        AgentID:   config.AgentID,
        Version:   config.Version,
        Platform:  config.Platform,
        BinaryURL: fmt.Sprintf("/api/v1/downloads/%s", config.Platform),
        ConfigURL: fmt.Sprintf("/api/v1/config/%s", config.AgentID),
        Signature: pkg.Signature,
        PackageID: pkg.ID,
    }, nil
}

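// 2. Update downloadHandler
func (h *DownloadHandler) DownloadAgent(c *gin.Context) {
    platform := c.Param("platform")

    // Only accept known platforms, so the parameter cannot steer c.File() elsewhere
    allowed := map[string]bool{"linux-amd64": true, "linux-arm64": true, "windows-amd64": true}
    if !allowed[platform] {
        c.JSON(http.StatusNotFound, gin.H{"error": "unknown platform"})
        return
    }

    // Which version to serve; taking it from a query parameter is illustrative
    version := c.Query("version")

    // Check if signed package exists
    if signedPackage, err := h.packageQueries.GetSignedPackage(version, platform); err == nil {
        // Serve signed version
        c.File(signedPackage.BinaryPath)
        return
    }

    // Fallback to unsigned generic binary (agents that enforce signature
    // verification will reject this during updates)
    genericPath := fmt.Sprintf("/app/binaries/%s/redflag-agent", platform)
    c.File(genericPath)
}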

Files to modify:

aggregator-server/internal/api/handlers/build_orchestrator.go
aggregator-server/internal/services/agent_builder.go (remove Docker generation)
aggregator-server/internal/api/handlers/downloads.go (serve signed versions)
aggregator-server/internal/services/signing.go (integration - already working)

Database schema:

-- Already exists:
CREATE TABLE agent_update_packages (
    id UUID PRIMARY KEY,
    agent_id UUID REFERENCES agents(id),
    version VARCHAR(20) NOT NULL,
    platform VARCHAR(20) NOT NULL,
    binary_path VARCHAR(255) NOT NULL,
    signature VARCHAR(128) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ
);

-- Add index for performance:
CREATE INDEX idx_agent_updates_version_platform
ON agent_update_packages(version, platform)
WHERE agent_id IS NULL;

Testing:

# Test flow:
1. Admin creates agent
2. Admin clicks "Update Agent"
3. Build orchestrator generates signed package
4. Server stores package in database
5. Agent requests update → receives signed binary
6. Agent verifies signature → installs update
7. Verify: Package served, signature valid, agent updated
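Steps 3 through 7 can be exercised in-process with net/http/httptest. This is a sketch, not RedFlag's test suite: the X-Signature response header and hex encoding are assumptions about how the signature travels with the binary, and RunUpdateFlow is a hypothetical name.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"errors"
	"io"
	"net/http"
	"net/http/httptest"
)

// RunUpdateFlow signs a binary once, serves it with its signature, and has the
// "agent" verify it before install. If tamper is true, the served bytes are
// altered after signing to simulate a compromised download.
func RunUpdateFlow(tamper bool) error {
	pub, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		return err
	}
	binary := []byte("redflag-agent linux-amd64 build")
	sig := ed25519.Sign(priv, binary) // steps 3-4: sign once, store with the package

	served := binary
	if tamper {
		served = append([]byte{}, binary...)
		served[0] ^= 0xFF
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/downloads/linux-amd64", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Signature", hex.EncodeToString(sig)) // hypothetical header
		w.Write(served)
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	// Step 5: agent requests the update.
	resp, err := http.Get(srv.URL + "/api/v1/downloads/linux-amd64")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return errors.New("package not served")
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	gotSig, err := hex.DecodeString(resp.Header.Get("X-Signature"))
	if err != nil {
		return err
	}

	// Step 6: agent verifies before installing.
	if !ed25519.Verify(pub, body, gotSig) {
		return errors.New("signature verification failed")
	}
	return nil
}
```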

6. Future Enhancements (Post-Implementation)

Kernel Keyring Config Protection

Priority: Medium
Timeline: After the version upgrade catch-22 is resolved
Rationale: Defense in depth, not critical for security

// Linux implementation: keyring_linux.go (the _linux suffix restricts the build to Linux)
package keyring

import (
    "fmt"

    "github.com/jsipprell/keyctl"
)

// SaveAgentConfig stores both tokens in the user keyring. Keys live in
// kernel memory and do not survive a reboot.
func SaveAgentConfig(agentID string, token string, refreshToken string) error {
    kr, err := keyctl.UserKeyring()
    if err != nil {
        return err
    }

    // Store JWT token
    tokenKey := fmt.Sprintf("redflag-agent-%s-token", agentID)
    if _, err := kr.Add(tokenKey, []byte(token)); err != nil {
        return err
    }

    // Store refresh token
    refreshKey := fmt.Sprintf("redflag-agent-%s-refresh", agentID)
    _, err = kr.Add(refreshKey, []byte(refreshToken))
    return err
}

// Windows implementation: keyring_windows.go (same function name, selected by GOOS)
package keyring

import (
    "fmt"

    "github.com/danieljoos/wincred"
)

// SaveAgentConfig stores both tokens as one generic credential.
func SaveAgentConfig(agentID string, token string, refreshToken string) error {
    cred := wincred.NewGenericCredential(fmt.Sprintf("redflag-agent-%s", agentID))
    cred.CredentialBlob = []byte(fmt.Sprintf("token:%s\nrefresh:%s", token, refreshToken))
    return cred.Write()
}

Certificate-Based Authentication (v2.0)

Priority: Low
Timeline: Future major version
Rationale: Sufficient security with current model

If implemented:

Replace JWT tokens with TLS client certificates
Per-agent certificate generation during registration
No shared secrets
Automatic cert rotation
Revocation via CRL or OCSP

Tradeoffs:

Pros:

Stronger crypto (per-agent keys)
No shared secrets

Cons:

PKI management complexity
CRL/OCSP infrastructure
Certificate renewal automation
Revocation management

7. Decision Log

Date: 2025-11-10

Decision: Implement Option 2 (Per-Version/Platform Signing) as described in this document

Decision Makers: @Fimeg, @Kimi, @Grok

Rationale:

Sufficient security given existing protections (machine ID binding, token lifetimes, revocation)
Superior operational characteristics (99.7% storage savings, CDN friendly, simple rollback)
Scales from 10 to 10,000 agents
Simpler implementation and maintenance

Rejected Alternatives:

Option 1 (Per-Agent): Operational overhead not justified by marginal security gain
Option 3 (Hybrid Obfuscation): False security, minimal value

8. Open Questions & Follow-ups

8.1 Token Security Enhancement

Question: Should we implement field-level encryption for tokens in config?
Recommendation: Implement kernel keyring/Credential Manager storage for tokens as an optional defense-in-depth layer after Tier 1 security issues are resolved.

8.2 Refresh Token Rotation Strategy

Question: Should we implement "true rotation" (new token per use) instead of the current "sliding window"?
Current State: The sliding window extends expiry but keeps the same token.
Recommendation: Keep the sliding window for now (simpler); implement true rotation if a security audit identifies token theft as an actual risk.

8.3 Debugging in Production

Question: How do we balance debug logging needs with security (JWT secret exposure)?
Recommendation: Implement proper logging levels (debug/info/warn/error) and require an explicit REDFLAG_DEBUG=true for sensitive logs.

9. References

Documentation

Status.md - Comprehensive security architecture status
todayupdate.md - Consolidated master documentation
answer.md - Token system analysis (by Grok)
SMART_INSTALLER_FLOW.md - Installer script documentation
MIGRATION_IMPLEMENTATION_STATUS.md - Migration system details

Code Locations

aggregator-server/internal/api/handlers/build_orchestrator.go:77-84 - Docker build instructions
aggregator-server/internal/services/agent_builder.go:171-245 - Docker config generation
aggregator-server/internal/services/signing.go - Ed25519 signing service (working)
aggregator-server/internal/api/handlers/downloads.go:175,244 - Binary serving
aggregator-server/internal/api/middleware/machine_binding.go:235-253 - Version upgrade enhancement

Database Schema

agent_update_packages - Signed package storage
registration_tokens - Multi-seat registration tokens
refresh_tokens - Long-lived rotating tokens (90d)

Document Version: 1.0
Last Updated: 2025-11-10
Status: Awaiting final review and approval
Next Step: Implement Option 2 per this specification
