Fimeg/Redflag

Fimeg 484a7f77ce Add docs and project files - force for Culurien

2026-03-28 20:46:24 -04:00

17 KiB

RedFlag Security Architecture Audit

Date: 2025-01-07
Version: 0.1.23
Status: 🔴 Security Claims Not Fully Implemented

Executive Summary

RedFlag claims to implement a comprehensive security architecture including:

Ed25519 digital signatures for agent updates
Nonce-based replay protection
Machine ID binding (anti-impersonation)
Trust-On-First-Use (TOFU) public key distribution
Command acknowledgment system

Finding: The security infrastructure code exists and is well-designed, but the update signing workflow is not operational. Zero signed update packages exist in the database, so update commands fail with a 404 before any download or signature check happens. Agent updates currently cannot be delivered at all, let alone verified.

Security Components - Detailed Analysis

1. Ed25519 Digital Signatures

✅ What's Implemented (Code Level)

Server Side:

aggregator-server/internal/services/signing.go:45-66 - SignFile() function
- Reads binary file
- Computes SHA-256 checksum
- Signs with Ed25519 private key
- Returns signature + checksum
aggregator-server/internal/api/handlers/agent_updates.go:320-363 - SignUpdatePackage() endpoint
- Receives: {version, platform, architecture, binary_path}
- Calls SignFile()
- Stores in agent_update_packages table

Agent Side:

aggregator-agent/cmd/agent/subsystem_handlers.go:782-813 - verifyBinarySignature() function
- Loads cached server public key
- Reads binary file
- Verifies Ed25519 signature
- Returns error if invalid

Update Handler:

aggregator-agent/cmd/agent/subsystem_handlers.go:346-495 - handleUpdateAgent()
- Validates nonce (line 397)
- Downloads binary (line 436)
- Verifies checksum (line 449)
- Verifies Ed25519 signature (line 456)
- Installs with atomic backup/rollback

❌ What's Missing (Workflow Level)

No Signed Packages in Database:

```sql
SELECT COUNT(*) FROM agent_update_packages;
-- Result: 0
```

No Signing Automation:
- Agent binaries are built during docker compose build (Dockerfile:19-28)
- Binaries exist at /app/binaries/{platform}/redflag-agent (10.8 MB each)
- But they are never signed or inserted into the database

No UI for Signing:
- Setup wizard generates Ed25519 keypair ✅
- No interface to sign binaries ❌
- No interface to view signed packages ❌
- No interface to manage package versions ❌

Update Flow Fails:

```text
Admin clicks "Update Agent"
  → POST /agents/:id/update
  → GetUpdatePackageByVersion(version, platform, arch)
  → Returns 404: "update package not found"
  → Update never starts
```

🔍 Manual Verification

To verify signing works, an admin would need to:

```bash
# 1. Get auth token
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"<password>"}' | jq -r .token)

# 2. Sign the binary
curl -X POST http://localhost:8080/api/v1/updates/packages/sign \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "0.1.23",
    "platform": "linux",
    "architecture": "amd64",
    "binary_path": "/app/binaries/linux-amd64/redflag-agent"
  }'

# 3. Verify in database
docker exec redflag-postgres psql -U redflag -d redflag \
  -c "SELECT version, platform, left(signature, 16) FROM agent_update_packages;"
```

Current Status: No documentation exists for this workflow.

2. Nonce-Based Replay Protection

✅ What's Implemented

Server Side:

aggregator-server/internal/api/handlers/agent_updates.go:86-99
```go
nonceUUID := uuid.New()
nonceTimestamp := time.Now()
nonceSignature, err = h.signingService.SignNonce(nonceUUID, nonceTimestamp)
```
- Generates UUID + timestamp
- Signs with Ed25519 private key
- Includes in command parameters

Agent Side:

aggregator-agent/cmd/agent/subsystem_handlers.go:848-893 - validateNonce()
- Parses timestamp (line 851)
- Checks age < 5 minutes (lines 857-860)
- Verifies Ed25519 signature against cached public key (line 887)
- Rejects expired or invalid nonces

Configuration:

Configurable via REDFLAG_NONCE_MAX_AGE_MINUTES (default: 5 minutes)

✅ Status: FULLY OPERATIONAL

Nonces are generated for every update command
Validation happens before download starts
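Limits replay to the max-age window. As described, the checks cover only age and signature. Rejecting a second use of the same nonce inside the window would also require the agent to record the nonce UUIDs it has already seen.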

3. Machine ID Binding

✅ What's Implemented

Server Side:

aggregator-server/internal/api/middleware/machine_binding.go:13-99
- Applied to all /agents/* endpoints (main.go:251)
- Validates X-Machine-ID header (line 58)
- Compares with database machine_id column (line 82)
- Returns HTTP 403 on mismatch (lines 85-90)
- Requires agent version 0.1.22 or later (lines 42-54)

Agent Side:

aggregator-agent/internal/system/machine_id.go - GetMachineID()
- Linux: Uses /etc/machine-id or /var/lib/dbus/machine-id
- Windows: Uses registry HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid
- Cached in agent state
- Sent in X-Machine-ID header on every request

Database:

agents.machine_id column (VARCHAR(255), added in migration 016)
Stored during registration
Validated on every check-in

✅ Status: FULLY OPERATIONAL

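Machine binding stops a copied agent config from working on a different machine. The machine ID is self-reported in the X-Machine-ID header, so this defeats casual config copying. It does not stop an attacker who can also read the original machine's ID.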
Logs security alerts: ⚠️ SECURITY ALERT: Agent ... machine ID mismatch!

⚠️ Known Issues:

No UI visibility: Admins can't see machine ID in dashboard
No recovery workflow: If machine ID changes (hardware swap), agent must re-register

4. Trust-On-First-Use (TOFU) Public Key

✅ What's Implemented

Server Endpoint:

aggregator-server/internal/api/handlers/system.go:22-32 - GetPublicKey()
- Returns Ed25519 public key in hex format
- Available at GET /api/v1/public-key
- Rate limited (public_access tier)

Agent Fetching:

aggregator-agent/cmd/agent/main.go:465-473

```go
log.Println("Fetching server public key...")
if err := fetchAndCachePublicKey(cfg.ServerURL); err != nil {
    log.Printf("Warning: Failed to fetch server public key: %v", err)
    // Don't fail registration - key can be fetched later
}
```

Fetches during registration (line 467)
Caches to /etc/redflag/server_public_key (Linux) or C:\ProgramData\RedFlag\server_public_key (Windows)
Used for all signature verification

Agent Usage:

aggregator-agent/cmd/agent/subsystem_handlers.go:815-846 - getServerPublicKey()
- Loads from cache
- Used by verifyBinarySignature() (line 784)
- Used by validateNonce() (line 867)

⚠️ What's Broken

1. Non-Blocking Fetch (Medium Severity, see Issue #3):

main.go:468-470: If the public key fetch fails, the agent registers anyway
Without the key, the agent cannot verify nonces or update binaries
Every update command is rejected at nonce validation, before any download starts
No retry mechanism

2. No Fingerprint Logging:

Agent doesn't log the server's public key fingerprint during TOFU
Admins have no way to verify correct server was contacted
Silent MITM vulnerability if wrong server URL provided

3. No Key Rotation Support:

Cached public key never expires
No mechanism to update if server rotates keys
Picking up a new key requires manually deleting /etc/redflag/server_public_key (or C:\ProgramData\RedFlag\server_public_key on Windows) on each agent

5. Command Acknowledgment System

✅ What's Implemented

Agent Side:

aggregator-agent/internal/acknowledgment/tracker.go - Acknowledgment tracker
- Stores pending command results in pending_acks.json
- Tracks retry count (max 10 retries)
- Expires after 24 hours
- Sends acknowledgments in every check-in

Server Side:

aggregator-server/internal/database/queries/commands.go - VerifyCommandsCompleted()
- Returns AcknowledgedIDs in check-in response
- Agent removes acknowledged commands from pending list

Agent Main Loop:

aggregator-agent/cmd/agent/main.go:834-843

```go
if response != nil && len(response.AcknowledgedIDs) > 0 {
    ackTracker.Acknowledge(response.AcknowledgedIDs)
    log.Printf("Server acknowledged %d command result(s)", len(response.AcknowledgedIDs))
}
```

✅ Status: FULLY OPERATIONAL

At-least-once delivery guarantee
Automatic retry on network failures
Cleanup after success or expiration

Critical Security Issues

Issue #1: Hardcoded Signing Key (High Severity)

Location: config/.env:24

```bash
REDFLAG_SIGNING_PRIVATE_KEY=1104a7fd7fb1a12b99e31d043fc7f4ef00bee6df19daff11ae4244606dac5bf9792d68d1c31f6c6a7820033720fb80d54bf22a8aab0382efd5deacc5122a5947
```

Public Key Fingerprint: 792d68d1c31f6c6a

Problem:

Same signing key appears across multiple test server instances
.env file is gitignored ✅ but manually copied between servers ❌
Setup wizard generates NEW keys, but if .env already has REDFLAG_SIGNING_PRIVATE_KEY, it's reused

Impact:

If one server is compromised, attacker can sign updates for ALL servers using this key
No uniqueness validation on server startup

Reproduction:

```bash
# Server A
grep REDFLAG_SIGNING_PRIVATE_KEY config/.env | sha256sum
# Output: abc123...

# Server B
grep REDFLAG_SIGNING_PRIVATE_KEY config/.env | sha256sum
# Output: abc123...  ← SAME KEY
```

Remediation:

Delete signing key from all .env files
Run setup wizard on each server to generate unique keys
Add startup validation to warn if key fingerprint matches known test keys
Document key generation in deployment guide

Issue #2: Update Signing Workflow Not Operational (Critical)

Problem:

Zero signed packages in database
No automation to sign binaries after build
No UI to trigger signing
Update commands fail with 404

Evidence:

```text
redflag=# SELECT COUNT(*) FROM agent_update_packages;
 count
-------
     0
```

Impact:

Agent updates are completely non-functional
Security claims in documentation are misleading
Admin has no way to push signed updates

Required to Fix:

Signing Automation:
- Add post-build hook to sign binaries
- Store in database automatically
- Version management (which version is "latest"?)
Admin UI:
- Settings page: "Manage Update Packages"
- List signed packages with versions
- Button: "Sign Current Binaries"
- Show fingerprint of signing key in use
API Endpoints:
- GET /api/v1/updates/packages - List signed packages
- POST /api/v1/updates/packages/sign-all - Sign all binaries in /app/binaries/
- DELETE /api/v1/updates/packages/:id - Deactivate old package

Docker Build Integration:

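```dockerfile
# After building binaries, sign them with the proposed scripts/sign-binaries.go.
# Mount the key as a BuildKit secret: passing it as an ARG or ENV would leave it
# in the image metadata. Build with: docker build --secret id=redflag_signing_key,src=<key file> .
RUN --mount=type=secret,id=redflag_signing_key \
  go run scripts/sign-binaries.go \
  --private-key="$(cat /run/secrets/redflag_signing_key)" \
  --binaries=/app/binaries
```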

Issue #3: Public Key Fetch Non-Blocking (Medium Severity)

Location: aggregator-agent/cmd/agent/main.go:468-470

Problem:

```go
if err := fetchAndCachePublicKey(cfg.ServerURL); err != nil {
    log.Printf("Warning: Failed to fetch server public key: %v", err)
    // Don't fail registration - key can be fetched later  ← PROBLEM
}
```

Impact:

Agent registers successfully without public key
Receives update commands
All updates fail signature verification
No automatic retry to fetch key

Remediation:

```go
// Block update commands if no public key cached
func handleUpdateAgent(...) error {
    publicKey, err := getServerPublicKey()
    if err != nil {
        return fmt.Errorf("cannot process updates - server public key not cached: %w", err)
    }
    // ... proceed with update
}
```

Issue #4: No Fingerprint Verification (Medium Severity)

Problem:

Agent performs TOFU but doesn't log server's public key fingerprint
Admin has no visibility into which server the agent trusts
If wrong server URL provided, agent silently trusts wrong server

Remediation:

```go
// After fetching public key
publicKey, err := crypto.FetchAndCacheServerPublicKey(serverURL)
if err != nil {
    return err
}

fingerprint := hex.EncodeToString(publicKey[:8])
log.Printf("✅ Server public key cached successfully")
log.Printf("📌 Server fingerprint: %s", fingerprint)
log.Printf("⚠️  Verify this fingerprint matches your server's expected value")
```

Issue #5: No Signing Service = Silent Failure (Low Severity)

Location: aggregator-server/internal/api/handlers/agent_updates.go:90-99

Problem:

```go
if h.signingService != nil {
    nonceSignature, err = h.signingService.SignNonce(...)
}
// Falls through - creates command with EMPTY signature
```

Impact:

If REDFLAG_SIGNING_PRIVATE_KEY is not set, the server still sends update commands
Commands have empty nonce_signature field
Agent correctly rejects them
But admin has no visibility into why updates are failing

Remediation:

```go
// Block update endpoints entirely if no signing service
if h.signingService == nil {
    c.JSON(http.StatusServiceUnavailable, gin.H{
        "error": "Agent updates are disabled - no signing key configured",
        "hint": "Generate Ed25519 keys in Settings → Security",
    })
    return
}
```

What Actually Works

✅ Components That Are Operational

Machine ID Binding: Fully functional, prevents config copying
Nonce Replay Protection: Fully functional, prevents command replay
Command Acknowledgment: Fully functional, reliable delivery
Ed25519 Signing (Code): Implementation is correct, just not wired up
Setup Wizard Key Generation: Works, generates unique Ed25519 keypairs

❌ Components That Are Broken

Agent Update Signing: No packages in database, updates fail
TOFU Failure Handling: Non-blocking, no retry
Fingerprint Verification: Agent doesn't log server fingerprint
Key Uniqueness: No validation against key reuse

Security Posture Assessment

Current State: 🔴 Not Production Ready

Strengths:

Well-designed architecture
Strong cryptographic primitives (Ed25519)
Defense-in-depth approach
Good separation of concerns

Weaknesses:

Critical: Agent updates completely non-functional
High: Signing key reuse across test instances
High: No UI/automation for signing workflow
Medium: Public key fetch can fail silently
Medium: No fingerprint verification for admins

Risk Analysis

If deployed to production:

| Risk | Likelihood | Impact | Severity |
|---|---|---|---|
| Cannot push agent updates | 100% | High | Critical |
| Signing key compromise affects all servers | Medium | Critical | High |
| Agent trusts wrong server (wrong URL) | Low | High | Medium |
| Agent registers without public key | Low | Medium | Low |

Recommended Actions

Before claiming security features:

Complete update signing workflow (UI + automation)
Test end-to-end agent update with signature verification
Add fingerprint logging and verification
Document key generation and unique-per-server requirements
Add integration tests for signing workflow

Immediate fixes (can be done now):

Block update commands if no public key cached
Block update endpoints if no signing service configured
Log server fingerprint during TOFU
Add warning on server startup if signing key missing

Documentation Gaps

Missing Documentation

Agent Update Workflow:
- How to sign binaries
- How to push updates to agents
- How to verify signatures manually
- Rollback procedures
Key Management:
- How to generate unique keys per server
- How to rotate keys safely
- How to verify key uniqueness
- Backup/recovery procedures
Security Model:
- TOFU trust model explanation
- Attack scenarios and mitigations
- Threat model documentation
- Security assumptions
Operational Procedures:
- Agent registration verification
- Machine ID troubleshooting
- Signature verification debugging
- Security incident response

Conclusion

RedFlag has excellent security infrastructure code, but the operational workflow is incomplete. The signing system exists but is not connected to the update delivery system. This makes it impossible to push signed updates to agents. As a result, the signed-update guarantees are non-functional, even though machine binding, nonce replay protection, and command acknowledgment are operational.

Key Findings:

✅ All security primitives are correctly implemented
✅ Code quality is high, cryptography is sound
❌ No signed packages exist in database
❌ No UI or automation for signing workflow
❌ Agent updates are currently broken

Recommendation: Either complete the signing workflow implementation or remove security claims from documentation until operational.

Next Steps

Option 1: Complete Implementation

Add signing automation (post-build hook)
Build admin UI for package management
Add integration tests
Document operational procedures
Estimated effort: 2-3 days

Option 2: Document As-Is

Update README to clarify "security infrastructure in progress"
Document manual signing procedure
Add warning that updates require manual intervention
Estimated effort: 2 hours

Option 3: Temporary Workaround

Add script to sign all binaries on container startup
Populate database automatically
Document as "alpha security model"
Estimated effort: 4 hours
