RedFlag Binary Signing Strategy Decision Document
Date: 2025-11-10
Version: 0.1.23.4
Status: Architecture Decision Record (ADR) - In Review
1. Decision Context
1.1 Background
RedFlag implements Ed25519 digital signatures for agent binary integrity verification. The signing infrastructure (signingService.SignFile()) is operational, but the workflow integration is incomplete - the build orchestrator generates Docker deployment configs instead of signed native binaries.
The agent install script expects:
- Native binaries (Linux: ELF, Windows: PE)
- Ed25519 signatures for verification
- Configurable via config.json
- Deployed via systemd/Windows Service Manager
The current build orchestrator generates:
- docker-compose.yml (Docker container deployment)
- Dockerfile (multi-stage build instructions)
- Embedded Go config (compile-time injection)
1.2 Problem Statement
Critical Gap: When an admin clicks "Update Agent" in the UI, the server looks for signed packages in the agent_update_packages table, finds none, and returns 404 Not Found.
Root Cause: The build pipeline produces unsigned generic binaries during the Docker multi-stage build, but never:
- Signs the binaries with the Ed25519 private key
- Embeds agent-specific configuration
- Stores signed binary metadata in database
- Serves signed versions via download endpoint
1.3 Decision Required
Question: How should the build orchestrator generate signed binaries?
Option 1: Per-Agent Signing (unique binary + signature for each agent)
Option 2: Per-Version/Platform Signing (one binary + signature per version/platform)
Option 3: Hybrid approach (per-version binary with per-agent config obfuscation)
2. Options Analysis
2.1 Option 1: Per-Agent Signing
Implementation
// For each agent:
1. Take generic binary from /app/binaries/{platform}/
2. Embed agent-specific config.json (agent_id, token, server_url)
3. Compile/repackage with embedded config
4. Sign resulting binary with Ed25519 private key
5. Store in database: agent_update_packages_{agent_id}_{version}_{platform}
6. Serve via /api/v1/downloads/{agent_id}/{platform}
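To illustrate why the steps above yield one distinct artifact per agent, here is a minimal sketch. It assumes that "embedding" can be modeled as appending the config to the generic binary; `artifactDigest` and the sample configs are hypothetical and not part of RedFlag.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// artifactDigest returns the SHA-256 digest of a generic binary with a
// per-agent config appended. Appending stands in for "binary with embedded
// config"; the real embedding mechanism is not specified in this document.
func artifactDigest(genericBinary []byte, agentConfig string) string {
	h := sha256.New()
	h.Write(genericBinary)
	h.Write([]byte(agentConfig))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	generic := []byte("generic agent binary bytes")
	a := artifactDigest(generic, `{"agent_id":"agent-a"}`)
	b := artifactDigest(generic, `{"agent_id":"agent-b"}`)

	// Different configs produce different bytes, so each agent needs its
	// own signature, its own storage slot, and its own download URL.
	fmt.Println("same artifact for both agents:", a == b)
}
```

Because every artifact differs, nothing can be signed ahead of time or shared through a cache, which is the source of the operational costs listed below.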
Security Properties
Strengths:
- ✅ Single-file deployment (binary includes config)
- ✅ Config protected by binary signature
- ✅ Slightly higher bar for config extraction
- ✅ Per-agent unique artifacts
Weaknesses:
- ⚠️ Config still extractable (reverse engineering or runtime memory dump)
- ⚠️ Minimal security gain over Option 2 (see Threat Analysis below)
- ⚠️ Config obscurity, not encryption
Operational Impact
Storage:
- 1,000 agents × 11 MB binary = 11 GB storage
- Each agent requires unique binary copy
- CDN caching ineffective (unique URLs per agent)
Compute:
- ~10ms Ed25519 sign operation per agent
- 1,000 agents = 10 seconds CPU time
- Serial bottleneck during mass updates
- Parallel signing possible but adds complexity
Network:
- Each agent downloads unique binary
- Cannot share downloads across agents
- Bandwidth usage scales linearly with agent count
Cache Efficiency:
- CDN or proxy caching: Poor
- Each agent has different URL: /downloads/{agent_id}/linux-amd64
- No shared cache hits
Rollback Complexity:
- Must track per-agent version in database
- Cannot roll back all agents simultaneously with single version number
- Each agent has independent version history
Build Time:
- Sign each agent individually
- Cannot pre-sign binaries before agent deployment
- On-demand signing introduces latency
Use Cases
When this makes sense:
- Ultra-high security environments with regulatory requirements for config-at-rest encryption
- Small deployments (<100 agents) where storage is not a concern
- When config secrecy is paramount and worth the operational overhead
When this is overkill:
- Standard MSP deployments (100-10,000 agents)
- When operational simplicity is valued
- When config is not highly sensitive (already protected by machine binding)
2.2 Option 2: Per-Version/Platform Signing
Implementation
// Once per version/platform:
1. Take generic binary from /app/binaries/{platform}/
2. Sign generic binary with Ed25519 private key
3. Store in database: agent_update_packages_{version}_{platform}
4. Serve via /api/v1/downloads/{platform}
// Per-agent config (separate):
5. Generate config.json (agent_id, token, server_url)
6. Download binary + config.json independently
7. Agent verifies binary signature
8. Agent loads config from file
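The signing step (2) and the agent-side verification step (7) can be sketched with Go's standard crypto/ed25519 package. This is an illustration only: `newSigningKey`, `signBinary` and `verifyBinary` are hypothetical names, and the actual signingService.SignFile() implementation may differ.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// newSigningKey generates a throwaway key pair for the example.
func newSigningKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signBinary signs the full binary contents; under Option 2 this runs
// once per version/platform rather than once per agent.
func signBinary(priv ed25519.PrivateKey, binary []byte) []byte {
	return ed25519.Sign(priv, binary)
}

// verifyBinary is the check an agent would run before installing an update.
func verifyBinary(pub ed25519.PublicKey, binary, sig []byte) bool {
	if len(sig) != ed25519.SignatureSize {
		return false
	}
	return ed25519.Verify(pub, binary, sig)
}

func main() {
	pub, priv := newSigningKey()
	binary := []byte("stand-in for the linux-amd64 agent binary")
	sig := signBinary(priv, binary)

	fmt.Println("untampered binary verifies:", verifyBinary(pub, binary, sig))

	// Flip one byte to simulate tampering in transit or at rest.
	tampered := append([]byte(nil), binary...)
	tampered[0] ^= 0xFF
	fmt.Println("tampered binary verifies:", verifyBinary(pub, tampered, sig))
}
```

The same signature is valid for every agent on that platform, which is what allows the binary and signature to be cached and served from a shared URL.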
Security Properties
Strengths:
- ✅ All cryptographic guarantees of Option 1
- ✅ Token lifetime controls (24h JWT, 90d refresh) limit exposure
- ✅ Server-side validation (machine ID binding) prevents misuse
- ✅ Token revocation capability
Addressing Config Protection Concerns:
Q: "But config is plaintext on disk!"
A: "Is that actually a problem?"
Current protections:
- File permissions: 0600 (owner read/write only)
- Machine ID binding: Config only works on one machine
- Token lifetimes: 24h JWT, 90d refresh window
- Revocation: Tokens can be revoked at any time
- Registration tokens: Single-use or multi-seat (limited)
Attack scenarios with Option 2:
Scenario 1: Attacker gains filesystem access to agent machine
Attacker actions:
- Can read /etc/redflag/config.json
- Sees: {"server_url": "...", "agent_id": "...", "token": "..."}
Questions:
Q: Can attacker use this token on another machine?
A: No - Machine ID binding (server validates X-Machine-ID header)
Q: Can attacker register new agent with this token?
A: No - Registration token used once (or multi-seat but tracked)
Q: Can attacker impersonate this agent?
A: Only from the already-compromised machine (attacker already has access)
Q: Is token exposure the biggest concern?
A: No - If attacker has filesystem access, they can execute commands as agent anyway
Scenario 2: Attacker steals disk image (offline attack)
Attacker actions:
- Clones VM disk
- Boots on different hardware
- Tries to use stolen config
Questions:
Q: Will machine ID validation fail?
A: Yes - Different hardware = different machine fingerprint
Q: Can attacker bypass machine ID check?
A: No - It's server-side validation, not client-side
Conclusion: Stolen disk useless without original hardware
Scenario 3: Malicious insider (legitimate access)
Attacker actions:
- Has root access to agent machine
- Can read config files
- Can execute commands as agent user
Questions:
Q: What additional damage can they do with tokens?
A: None - they already have agent-level access
Q: Can tokens be used elsewhere?
A: No - bound to specific machine
Conclusion: Tokens are not the attack vector - compromised machine is
Verdict: Config plaintext storage is NOT a critical vulnerability given existing protections.
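The server-side machine binding check that the scenarios above rely on could look roughly like the following net/http middleware. Only the X-Machine-ID header comes from this document; the X-Agent-ID header, the in-memory `boundMachines` map and the helper names are assumptions made for the sketch (the real server would take the agent identity from the validated token and the binding from the database).

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// boundMachines maps agent ID to the machine ID recorded at registration.
// Hypothetical in-memory stand-in for the server's database lookup.
var boundMachines = map[string]string{"agent-123": "machine-abc"}

// requireMachineBinding rejects requests whose X-Machine-ID header does not
// match the machine ID bound to the agent named in X-Agent-ID.
func requireMachineBinding(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		want, ok := boundMachines[r.Header.Get("X-Agent-ID")]
		got := r.Header.Get("X-Machine-ID")
		if !ok || subtle.ConstantTimeCompare([]byte(want), []byte(got)) != 1 {
			http.Error(w, "machine binding mismatch", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one test request through the middleware and returns the
// resulting HTTP status code.
func statusFor(agentID, machineID string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("X-Agent-ID", agentID)
	req.Header.Set("X-Machine-ID", machineID)
	rec := httptest.NewRecorder()
	requireMachineBinding(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("original machine:", statusFor("agent-123", "machine-abc"))
	fmt.Println("cloned config on other machine:", statusFor("agent-123", "machine-xyz"))
}
```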
Operational Impact
Storage:
- 3 platforms × 11 MB binary = 33 MB total
- Example platforms: linux-amd64, linux-arm64, windows-amd64
- 99.7% storage savings vs Option 1 (1,000 agents)
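The storage figures for both options can be reproduced with a few lines of arithmetic. The sketch below uses the 11 MB binary size, 1,000 agents and 3 platforms quoted in this document; the function names are illustrative, and the Option 2 figure applies per stored version.

```go
package main

import "fmt"

const binaryMB = 11 // binary size in MB, as quoted in this document

// perAgentStorageMB is the Option 1 footprint: one binary copy per agent.
func perAgentStorageMB(agents int) int { return agents * binaryMB }

// perPlatformStorageMB is the Option 2 footprint: one binary per platform.
func perPlatformStorageMB(platforms int) int { return platforms * binaryMB }

// savingsPercent returns how much smaller Option 2 is than Option 1.
func savingsPercent(agents, platforms int) float64 {
	o1 := float64(perAgentStorageMB(agents))
	o2 := float64(perPlatformStorageMB(platforms))
	return (1 - o2/o1) * 100
}

func main() {
	fmt.Printf("Option 1: %d MB\n", perAgentStorageMB(1000)) // 11000 MB, about 11 GB
	fmt.Printf("Option 2: %d MB\n", perPlatformStorageMB(3)) // 33 MB
	fmt.Printf("Savings:  %.1f%%\n", savingsPercent(1000, 3)) // 99.7%
}
```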
Compute:
- One 10ms Ed25519 sign operation per version/platform
- Signing once during release process
- Can be pre-signed before deployment
Network:
- Binary downloaded once per platform
- CDN or proxy caching: Excellent
- All agents share same URL: /downloads/linux-amd64
- Cache hits for subsequent agents
Cache Efficiency:
- CDN can cache single binary for all agents
- Corporate proxies cache effectively
- Bandwidth usage scales sub-linearly
Rollback Complexity:
- Simple: Change version number in database
- All agents roll back together
- Single point of control
Build Time:
- Sign once during CI/CD pipeline
- No on-demand signing latency
- Immediate availability
Agent-Side Simplicity
// Agent update process:
func UpdateAgent() error {
    // 1. Download signed binary
    binary, sig := download("/downloads/linux-amd64")
    // 2. Verify signature
    if !verifyBinarySignature(binary, sig, serverPublicKey) {
        return fmt.Errorf("signature verification failed")
    }
    // 3. Atomically replace binary.
    // config.json remains unchanged, so existing tokens stay valid.
    return atomicReplace(binary, "/usr/local/bin/redflag-agent")
}
Benefits:
- No config rewriting during updates
- Token persistence across updates
- Simpler state management
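One common way to implement the atomic replacement step is to write the new binary to a temporary file in the same directory and rename it over the target. The sketch below assumes a POSIX filesystem, where a rename within one filesystem is atomic; replacing a running executable on Windows needs a different approach. `atomicReplace` mirrors the name used above, but this body and the `demoReplace` helper are illustrative assumptions, not RedFlag's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicReplace writes data to a temp file next to target and renames it
// into place, so readers see either the old file or the complete new one.
func atomicReplace(data []byte, target string) error {
	tmp, err := os.CreateTemp(filepath.Dir(target), ".redflag-agent-*")
	if err != nil {
		return err
	}
	// Clean up the temp file on failure; after a successful rename the
	// name no longer exists and this call has nothing to remove.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(0o755); err != nil {
		tmp.Close()
		return err
	}
	// Flush to disk before the rename so a crash cannot leave a
	// truncated binary under the final name.
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), target)
}

// demoReplace replaces a file in a scratch directory and returns its
// contents afterwards.
func demoReplace() (string, error) {
	dir, err := os.MkdirTemp("", "redflag-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "redflag-agent")
	if err := os.WriteFile(target, []byte("old"), 0o755); err != nil {
		return "", err
	}
	if err := atomicReplace([]byte("new"), target); err != nil {
		return "", err
	}
	got, err := os.ReadFile(target)
	return string(got), err
}

func main() {
	got, err := demoReplace()
	fmt.Println(got, err)
}
```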
2.3 Option 3: Hybrid (Per-Version Binary + Config Obfuscation)
Implementation
// Combine Option 2 with lightweight config protection:
1. Sign generic binary per version/platform (Option 2)
2. Obfuscate (not encrypt) config.json
3. Use XOR or simple transformation
4. Breaks casual inspection (grep for "token")
// Config on disk:
/etc/redflag/config.dat // Binary blob, not JSON
// or:
/etc/redflag/config.json // Obfuscated fields
Security Assessment
Pros:
- ✅ Slightly raises bar for casual inspection
- ✅ Low implementation complexity
- ✅ Fast (no crypto operations)
Cons:
- ❌ Not real security (obfuscation ≠ encryption)
- ❌ Easily reversed (one debugger breakpoint)
- ❌ False sense of security
Recommendation
Skip this option. Either do proper security (Option 2 with kernel keyring for config) or accept that tokens are short-lived and protected by other mechanisms. Obfuscation provides minimal value.
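To make the "easily reversed" point concrete, the sketch below assumes a repeating-key XOR scheme and shows that an attacker who knows how the config begins can recover the key without the binary. The 8-byte key, the sample config and both function names are hypothetical; the known prefix is the opening of the config.json layout shown earlier in this document.

```go
package main

import "fmt"

// xorObfuscate XORs data with a repeating key. Applying it twice with the
// same key returns the original bytes.
func xorObfuscate(data, key []byte) []byte {
	out := make([]byte, len(data))
	if len(key) == 0 {
		copy(out, data)
		return out
	}
	for i, b := range data {
		out[i] = b ^ key[i%len(key)]
	}
	return out
}

// recoverKey derives the key from a known plaintext prefix: since
// obfuscated = plain XOR key, key = obfuscated XOR plain.
func recoverKey(obfuscated, knownPrefix []byte, keyLen int) []byte {
	if keyLen <= 0 || len(obfuscated) < keyLen || len(knownPrefix) < keyLen {
		return nil
	}
	key := make([]byte, keyLen)
	for i := range key {
		key[i] = obfuscated[i] ^ knownPrefix[i]
	}
	return key
}

func main() {
	key := []byte("r3dfl4g!") // hypothetical 8-byte obfuscation key
	config := []byte(`{"server_url": "https://redflag.example", "agent_id": "a1", "token": "s3cret"}`)
	blob := xorObfuscate(config, key)

	// The attacker only needs to guess how the file starts.
	guessed := recoverKey(blob, []byte(`{"server_url": "`), len(key))
	fmt.Println(string(xorObfuscate(blob, guessed)))
}
```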
3. Threat Analysis Model
Attack Scenario Matrix
| Attack Vector | Option 1 (Per-Agent) | Option 2 (Per-Version) | Mitigation Available? |
|---|---|---|---|
| Token theft from filesystem | Config in binary (harder to extract) | Config in plaintext file (easier to read) | Yes: Machine ID binding prevents cross-machine use |
| Stolen disk image | Machine ID different (fails) | Machine ID different (fails) | Yes: Server-side validation |
| Network sniffing | HTTPS protects tokens | HTTPS protects tokens | Yes: TLS encryption in transit |
| JWT token compromise | 24h window | 24h window | Yes: Short lifetime, refresh token rotation |
| Refresh token compromise | 90d window | 90d window | Yes: Can revoke, machine binding |
| Registration token theft | Single-use or limited seats | Single-use or limited seats | Yes: Expiration, seat limits, revocation |
| Binary tampering | Signature verification catches | Signature verification catches | Yes: Ed25519 verification |
| Malicious insider | Attacker already has access | Attacker already has access | No: Physical/root access defeats both |
Critical Insights
1. Config extraction is not the primary attack vector
- If attacker has filesystem access, they can execute commands as agent
- Tokens are secondary concern
- Machine ID binding prevents cross-machine token reuse
2. Machine ID binding is the real protection
- Prevents config copying to unauthorized machines
- Server-side validation (can't bypass)
- Hardware-rooted (difficult to spoof)
3. Token lifetimes limit damage
- JWT: 24h max exposure
- Refresh token: 90d max (revocable)
- Rotation reduces window further
4. Client-side config protection is marginal
- Attacker with root access can dump process memory
- Attacker with physical access can extract keys
- Obfuscation/encryption only slows down determined attacker
4. Recommendation
Primary Recommendation: Option 2 (Per-Version/Platform Signing)
Rationale:
1. Security is sufficient
- Tokens protected by machine ID binding
- Short lifetimes limit exposure
- Revocation capability exists
- Config plaintext is not critical vulnerability
2. Operational efficiency
- 99.7% storage savings (33MB vs 11GB for 1,000 agents)
- Excellent CDN/proxy caching
- Fast signing (once per version)
- Simple rollback (single version number)
3. Scalability
- Works for 10 agents or 10,000 agents
- Sub-linear bandwidth usage
- No per-agent build complexity
4. Implementation simplicity
- Agent updates don't rewrite config
- Token persistence across updates
- Clear separation of concerns
Secondary Recommendation: Kernel Keyring Config Protection (Future Enhancement)
// For defense in depth, not immediate need:
func LoadConfig() (*Config, error) {
// Try kernel keyring first
if keyringConfig, err := loadFromKeyring(); err == nil {
return keyringConfig, nil
}
// Fallback to file
return loadFromFile("/etc/redflag/config.json")
}
// On token refresh:
func SaveTokens() error {
// Store encrypted in kernel keyring (Linux)
// Or Windows Credential Manager (Windows)
return saveToKeyring(agentID, token, refreshToken)
}
Why this is optional:
- Tokens already have short lifetimes
- Machine ID binding prevents misuse
- File permissions already restrict access
- Implementation complexity not justified by security gain
When to implement:
- Regulatory requirement for config-at-rest encryption
- High-security environment with strict compliance needs
- After all Tier 1 security gaps are addressed
5. Implementation Plan
Phase 1: Implement Option 2 (Per-Version Signing)
Priority: 🔴 Critical (blocking updates)
Server-Side Changes
// 1. Modify build_orchestrator.go
func BuildAgentWithConfig(config *AgentConfiguration) (*BuildResult, error) {
    // Remove: docker-compose.yml generation
    // Remove: Dockerfile generation
    // Add: Generate config.json file
    configContent, err := generateConfigJSON(config)
    if err != nil {
        return nil, err
    }
    // TODO: persist configContent so ConfigURL can serve it; the blank
    // assignment keeps the build passing until then.
    _ = configContent
    // Add: Sign generic binary
    binaryPath := fmt.Sprintf("/app/binaries/%s/redflag-agent", config.Platform)
    signature, err := signingService.SignFile(binaryPath)
    if err != nil {
        return nil, err
    }
    // Add: Store in database.
    // Note: a per-version package (Option 2) needs agent_id stored as NULL
    // to be covered by idx_agent_updates_version_platform.
    packageID, err := storeSignedPackage(config.AgentID, config.Version, config.Platform, signature)
    if err != nil {
        return nil, err
    }
    return &BuildResult{
        AgentID:   config.AgentID,
        Version:   config.Version,
        Platform:  config.Platform,
        BinaryURL: fmt.Sprintf("/api/v1/downloads/%s", config.Platform),
        ConfigURL: fmt.Sprintf("/api/v1/config/%s", config.AgentID),
        Signature: signature,
        PackageID: packageID,
    }, nil
}
// 2. Update downloadHandler
func (h *DownloadHandler) DownloadAgent(c *gin.Context) {
    platform := c.Param("platform")
    // Assumption: the target version is passed as a query parameter.
    version := c.Query("version")
    // Check if signed package exists
    signedPackage, err := h.packageQueries.GetSignedPackage(version, platform)
    if err == nil {
        // Serve signed version
        c.File(signedPackage.BinaryPath)
        return
    }
    // Fallback to unsigned generic binary
    genericPath := fmt.Sprintf("/app/binaries/%s/redflag-agent", platform)
    c.File(genericPath)
}
Files to modify:
- aggregator-server/internal/api/handlers/build_orchestrator.go
- aggregator-server/internal/services/agent_builder.go (remove Docker generation)
- aggregator-server/internal/api/handlers/downloads.go (serve signed versions)
- aggregator-server/internal/services/signing.go (integration - already working)
Database schema:
-- Already exists:
CREATE TABLE agent_update_packages (
id UUID PRIMARY KEY,
agent_id UUID REFERENCES agents(id),
version VARCHAR(20) NOT NULL,
platform VARCHAR(20) NOT NULL,
binary_path VARCHAR(255) NOT NULL,
signature VARCHAR(128) NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
expires_at TIMESTAMPTZ
);
-- Add index for performance:
CREATE INDEX idx_agent_updates_version_platform
ON agent_update_packages(version, platform)
WHERE agent_id IS NULL;
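The signature VARCHAR(128) column is exactly wide enough for a hex-encoded Ed25519 signature (64 raw bytes become 128 hex characters). The sketch below assumes hex is the stored encoding, which this document does not state; `encodeSignature` and `signatureColumnWidth` are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"errors"
	"fmt"
)

// signatureColumnWidth mirrors the VARCHAR(128) signature column.
const signatureColumnWidth = 128

// encodeSignature hex-encodes a raw Ed25519 signature for storage and
// rejects inputs that are not exactly ed25519.SignatureSize (64) bytes.
func encodeSignature(sig []byte) (string, error) {
	if len(sig) != ed25519.SignatureSize {
		return "", errors.New("unexpected signature length")
	}
	return hex.EncodeToString(sig), nil
}

func main() {
	// A zeroed 64-byte slice stands in for a real signature here.
	encoded, err := encodeSignature(make([]byte, ed25519.SignatureSize))
	if err != nil {
		panic(err)
	}
	fmt.Println("fits column:", len(encoded) == signatureColumnWidth)
}
```

If a different encoding such as base64 is used instead, the value would be shorter and would still fit the column.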
Testing:
# Test flow:
1. Admin creates agent
2. Admin clicks "Update Agent"
3. Build orchestrator generates signed package
4. Server stores package in database
5. Agent requests update → receives signed binary
6. Agent verifies signature → installs update
7. Verify: Package served, signature valid, agent updated
6. Future Enhancements (Post-Implementation)
Kernel Keyring Config Protection
- Priority: Medium
- Timeline: After version upgrade catch-22 resolved
- Rationale: Defense in depth, not critical for security
// Linux implementation
package keyring

import (
    "fmt"

    "github.com/jsipprell/keyctl"
)

func SaveAgentConfig(agentID string, token string, refreshToken string) error {
    keyring, err := keyctl.UserKeyring()
    if err != nil {
        return err
    }
    // Store JWT token
    tokenKey := fmt.Sprintf("redflag-agent-%s-token", agentID)
    _, err = keyring.Add(tokenKey, []byte(token))
    if err != nil {
        return err
    }
    // Store refresh token
    refreshKey := fmt.Sprintf("redflag-agent-%s-refresh", agentID)
    _, err = keyring.Add(refreshKey, []byte(refreshToken))
    return err
}
// Windows implementation
package keyring

import (
    "fmt"

    "github.com/danieljoos/wincred"
)

func SaveAgentConfigWindows(agentID string, token string, refreshToken string) error {
    // One generic credential per agent, holding both tokens
    cred := wincred.NewGenericCredential(fmt.Sprintf("redflag-agent-%s", agentID))
    cred.CredentialBlob = []byte(fmt.Sprintf("token:%s\nrefresh:%s", token, refreshToken))
    return cred.Write()
}
Certificate-Based Authentication (v2.0)
- Priority: Low
- Timeline: Future major version
- Rationale: Sufficient security with current model
If implemented:
- Replace JWT tokens with TLS client certificates
- Per-agent certificate generation during registration
- No shared secrets
- Automatic cert rotation
- Revocation via CRL or OCSP
Tradeoffs:
- ✅ Stronger crypto (per-agent keys)
- ✅ No shared secrets
- ❌ PKI management complexity
- ❌ CRL/OCSP infrastructure
- ❌ Certificate renewal automation
- ❌ Revocation management
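If certificate-based authentication is pursued, the server side would need to require and verify client certificates. A minimal sketch using Go's standard crypto/tls package follows; `agentTLSConfig` and `requiresClientCert` are hypothetical names, and CA management, rotation and revocation are deliberately out of scope.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// agentTLSConfig returns a server-side TLS config that only accepts clients
// presenting a certificate signed by one of the CAs in agentCAs.
func agentTLSConfig(agentCAs *x509.CertPool) *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		ClientAuth: tls.RequireAndVerifyClientCert,
		ClientCAs:  agentCAs,
	}
}

// requiresClientCert reports whether cfg enforces verified client certificates.
func requiresClientCert(cfg *tls.Config) bool {
	return cfg.ClientAuth == tls.RequireAndVerifyClientCert
}

func main() {
	// An empty pool is used here; a real deployment would load the
	// per-agent issuing CA into it.
	cfg := agentTLSConfig(x509.NewCertPool())
	fmt.Println("client certs required:", requiresClientCert(cfg))
}
```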
7. Decision Log
Date: 2025-11-10
Decision: Implement Option 2 (Per-Version/Platform Signing) as described in this document
Decision Makers: @Fimeg, @Kimi, @Grok
Rationale:
- Sufficient security given existing protections (machine ID binding, token lifetimes, revocation)
- Superior operational characteristics (99.7% storage savings, CDN friendly, simple rollback)
- Scales from 10 to 10,000 agents
- Simpler implementation and maintenance
Rejected Alternatives:
- Option 1 (Per-Agent): Operational overhead not justified by marginal security gain
- Option 3 (Hybrid Obfuscation): False security, minimal value
8. Open Questions & Follow-ups
8.1 Token Security Enhancement
Question: Should we implement field-level encryption for tokens in config?
Recommendation: Implement kernel keyring/Credential Manager storage for tokens as optional defense-in-depth layer after Tier 1 security issues resolved.
8.2 Refresh Token Rotation Strategy
Question: Should we implement "true rotation" (new token per use) vs current "sliding window"?
Current State: Sliding window extends expiry but keeps same token
Recommendation: Keep sliding window for now (simpler), implement true rotation if security audit identifies token theft as actual risk.
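The difference between the two strategies can be sketched with an in-memory token store. This is a simplified model under stated assumptions: `tokenStore` stands in for the refresh_tokens table, and every function name here is hypothetical rather than taken from the RedFlag codebase.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

const refreshWindow = 90 * 24 * time.Hour // 90d refresh window

var errInvalidToken = errors.New("invalid or expired refresh token")

// tokenStore maps refresh token to expiry time.
type tokenStore map[string]time.Time

// issue creates a random refresh token valid for the full window.
func (s tokenStore) issue(now time.Time) string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	tok := hex.EncodeToString(b)
	s[tok] = now.Add(refreshWindow)
	return tok
}

// valid reports whether tok exists and has not expired.
func (s tokenStore) valid(tok string, now time.Time) bool {
	exp, ok := s[tok]
	return ok && !now.After(exp)
}

// slide keeps the same token and extends its expiry (sliding window).
func (s tokenStore) slide(tok string, now time.Time) (string, error) {
	if !s.valid(tok, now) {
		return "", errInvalidToken
	}
	s[tok] = now.Add(refreshWindow)
	return tok, nil
}

// rotate revokes the presented token and issues a replacement (true rotation).
func (s tokenStore) rotate(tok string, now time.Time) (string, error) {
	if !s.valid(tok, now) {
		return "", errInvalidToken
	}
	delete(s, tok)
	return s.issue(now), nil
}

// compare runs each strategy once and reports whether the originally
// issued token is still usable afterwards.
func compare() (oldValidAfterSlide, oldValidAfterRotate bool) {
	now := time.Now()
	s := tokenStore{}

	a := s.issue(now)
	if _, err := s.slide(a, now); err != nil {
		panic(err)
	}
	b := s.issue(now)
	if _, err := s.rotate(b, now); err != nil {
		panic(err)
	}
	return s.valid(a, now), s.valid(b, now)
}

func main() {
	slide, rotate := compare()
	fmt.Println("old token valid after slide:", slide)
	fmt.Println("old token valid after rotate:", rotate)
}
```

With a sliding window a stolen token stays usable for as long as it keeps being refreshed, whereas true rotation invalidates it at the next legitimate refresh.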
8.3 Debugging in Production
Question: How to balance debug logging needs with security (JWT secret exposure)?
Recommendation: Implement proper logging levels (debug/info/warn/error), require explicit REDFLAG_DEBUG=true for sensitive logs.
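A minimal sketch of the gating described above, assuming sensitive values are masked unless REDFLAG_DEBUG=true is set explicitly. The environment variable name comes from this document; `sensitiveLoggingEnabled` and `redact` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
)

// sensitiveLoggingEnabled reports whether the operator explicitly opted in
// to sensitive log output via REDFLAG_DEBUG=true.
func sensitiveLoggingEnabled() bool {
	return os.Getenv("REDFLAG_DEBUG") == "true"
}

// redact returns the secret unchanged when sensitive logging is enabled and
// a fixed placeholder otherwise.
func redact(secret string, sensitive bool) string {
	if sensitive {
		return secret
	}
	return "[REDACTED]"
}

func main() {
	token := "example-token-value"
	fmt.Println("token:", redact(token, sensitiveLoggingEnabled()))
}
```

Passing the flag as a parameter keeps `redact` easy to test and leaves the single environment lookup in one place.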
9. References
Documentation
- Status.md - Comprehensive security architecture status
- todayupdate.md - Consolidated master documentation
- answer.md - Token system analysis (by Grok)
- SMART_INSTALLER_FLOW.md - Installer script documentation
- MIGRATION_IMPLEMENTATION_STATUS.md - Migration system details
Code Locations
- aggregator-server/internal/api/handlers/build_orchestrator.go:77-84 - Docker build instructions
- aggregator-server/internal/services/agent_builder.go:171-245 - Docker config generation
- aggregator-server/internal/services/signing.go - Ed25519 signing service (working)
- aggregator-server/internal/api/handlers/downloads.go:175,244 - Binary serving
- aggregator-server/internal/api/middleware/machine_binding.go:235-253 - Version upgrade enhancement
Database Schema
- agent_update_packages - Signed package storage
- registration_tokens - Multi-seat registration tokens
- refresh_tokens - Long-lived rotating tokens (90d)
Document Version: 1.0
Last Updated: 2025-11-10
Status: Awaiting final review and approval
Next Step: Implement Option 2 per this specification