# RedFlag Security Architecture

This document outlines the security architecture and implementation details for RedFlag's Ed25519-based cryptographic update system.

## Overview

RedFlag implements a defense-in-depth security model for agent updates using:
- **Ed25519 Digital Signatures** for binary authenticity
- **Runtime Public Key Distribution** via Trust-On-First-Use (TOFU)
- **Nonce-based Replay Protection** so that commands older than 5 minutes are rejected
- **Atomic Update Process** with automatic rollback and watchdog

## Architecture Overview

```mermaid
graph TB
    A[Server Signs Package] --> B[Ed25519 Signature]
    B --> C[Package Distribution]
    C --> D[Agent Downloads]
    D --> E[Signature Verification]
    E --> F[AES-256-GCM Decryption]
    F --> G[Checksum Validation]
    G --> H[Atomic Installation]
    H --> I[Service Restart]
    I --> J[Update Confirmation]

    subgraph "Security Layers"
        K[Nonce Validation]
        L[Signature Verification]
        M[Encryption]
        N[Checksum Validation]
    end
```

## Threat Model

### Protected Against
- **Package Tampering**: Ed25519 signatures prevent unauthorized modifications
- **Replay Attacks**: Nonce-based validation ensures command freshness (< 5 minutes)
- **Eavesdropping**: AES-256-GCM encryption protects packages in transit
- **Downgrade Attacks**: Version-based updates prevent rollback to older releases
- **Privilege Escalation**: Atomic updates with proper file permissions

### Assumptions
- Server private key is securely stored and protected
- Agent system has basic file system protections
- Network transport uses HTTPS/TLS
- Initial agent registration is secure

## Cryptographic Operations

### Key Generation (Server Setup)

```bash
# Generate Ed25519 key pair for RedFlag
go run scripts/generate-keypair.go

# Output:
# REDFLAG_SIGNING_PRIVATE_KEY=c038751ba992c9335501a0853b83e93190021075...
# REDFLAG_PUBLIC_KEY=37f6d2a4ffe0f83bcb91d0ee2eb266833f766e8180866d31...

# Add the private key to server environment
# (Public key is distributed to agents automatically via API)
```

### Package Signing Flow

```mermaid
sequenceDiagram
    participant S as Server
    participant PKG as Update Package
    participant A as Agent

    S->>PKG: 1. Generate Package
    S->>PKG: 2. Calculate SHA-256 Checksum
    S->>PKG: 3. Sign with Ed25519 Private Key
    S->>PKG: 4. Add Metadata (version, platform, etc.)
    S->>PKG: 5. Encrypt with AES-256-GCM (optional)
    PKG->>A: 6. Distribute Package

    A->>A: 7. Validate Nonce (< 5min)
    A->>A: 8. Verify Signature
    A->>A: 9. Decrypt Package (if encrypted)
    A->>A: 10. Verify Checksum
    A->>A: 11. Atomic Installation
    A->>S: 12. Update Confirmation
```

## Implementation Details

### 1. Ed25519 Signature System

#### Server-side (signing.go)
```go
// SignFile creates an Ed25519 signature and SHA-256 checksum for an update package
func (s *SigningService) SignFile(filePath string) (*models.AgentUpdatePackage, error) {
    content, err := os.ReadFile(filePath)
    if err != nil {
        return nil, fmt.Errorf("read package: %w", err)
    }
    hash := sha256.Sum256(content)
    signature := ed25519.Sign(s.privateKey, content)

    return &models.AgentUpdatePackage{
        Signature: hex.EncodeToString(signature),
        Checksum:  hex.EncodeToString(hash[:]),
        // ... other metadata
    }, nil
}

// VerifySignature validates package authenticity
func (s *SigningService) VerifySignature(content []byte, signatureHex string) (bool, error) {
    signature, err := hex.DecodeString(signatureHex)
    if err != nil {
        return false, fmt.Errorf("decode signature: %w", err)
    }
    return ed25519.Verify(s.publicKey, content, signature), nil
}
```

#### Agent-side (subsystem_handlers.go)
```go
// Fetch and cache public key at agent startup
publicKey, err := crypto.FetchAndCacheServerPublicKey(serverURL)
// Cached to /etc/aggregator/server_public_key

// Signature verification during update
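signatureHex, ok := params["signature"].(string)
if !ok {
    return fmt.Errorf("missing or non-string signature parameter")
}
signature, err := hex.DecodeString(signatureHex)
if err != nil {
    return fmt.Errorf("decode package signature: %w", err)
}
if !ed25519.Verify(publicKey, packageContent, signature) {
    return fmt.Errorf("invalid package signature")
}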
```

### Public Key Distribution (TOFU Model)

#### Server provides public key via API
`GET /api/v1/public-key` (no authentication required) returns:

```json
{
  "public_key": "37f6d2a4ffe0f83bcb91d0ee2eb266833f766e8180866d31...",
  "fingerprint": "37f6d2a4ffe0f83b",
  "algorithm": "ed25519",
  "key_size": 32
}
```

#### Agent fetches and caches at startup
```go
// During agent registration
publicKey, err := crypto.FetchAndCacheServerPublicKey(serverURL)
// Cached to /etc/aggregator/server_public_key for future use
```

**Security Model**: Trust-On-First-Use (TOFU)
- Like SSH host keys: the agent trusts the key it receives on first contact
- The endpoint is unauthenticated, so the initial fetch relies on HTTPS/TLS for integrity
- Cached locally for all future verifications
- Optional: Manual fingerprint verification (out-of-band)

### 2. Nonce-Based Replay Protection

#### Server-side Nonce Generation
```go
// Generate and sign nonce for update command
func (s *SigningService) SignNonce(nonceUUID uuid.UUID, timestamp time.Time) (string, error) {
    nonceData := fmt.Sprintf("%s:%d", nonceUUID.String(), timestamp.Unix())
    signature := ed25519.Sign(s.privateKey, []byte(nonceData))
    return hex.EncodeToString(signature), nil
}

// Verify nonce freshness and signature
func (s *SigningService) VerifyNonce(nonceUUID uuid.UUID, timestamp time.Time,
    signatureHex string, maxAge time.Duration) (bool, error) {
    if time.Since(timestamp) > maxAge {
        return false, fmt.Errorf("nonce expired")
    }
    // ... signature verification
}
```

#### Agent-side Validation
```go
// Extract nonce parameters from command
nonceUUIDStr := params["nonce_uuid"].(string)
nonceTimestampStr := params["nonce_timestamp"].(string)
nonceSignature := params["nonce_signature"].(string)

// TODO: Implement full validation
// - Parse timestamp
// - Verify < 5min freshness
// - Verify Ed25519 signature
// - Prevent replay attacks
```

### 3. AES-256-GCM Encryption

#### Key Derivation
```go
// Derive AES-256 key from nonce
func deriveKeyFromNonce(nonce string) []byte {
    hash := sha256.Sum256([]byte(nonce))
    return hash[:] // 32 bytes for AES-256
}
```

#### Decryption Process
```go
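// Decrypt update package with AES-256-GCM
func decryptAES256GCM(encryptedData, nonce string) ([]byte, error) {
    key := deriveKeyFromNonce(nonce)
    data, err := hex.DecodeString(encryptedData)
    if err != nil {
        return nil, fmt.Errorf("decode ciphertext: %w", err)
    }

    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    // Split the GCM nonce prefix from the ciphertext
    nonceSize := gcm.NonceSize()
    if len(data) < nonceSize {
        return nil, fmt.Errorf("ciphertext too short")
    }
    nonceBytes, ciphertext := data[:nonceSize], data[nonceSize:]

    // Decrypt and verify the authentication tag
    return gcm.Open(nil, nonceBytes, ciphertext, nil)
}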
```
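
The decryption routine expects the GCM nonce to be prepended to the ciphertext and the whole blob to be hex-encoded. A matching encryption side could look like the sketch below; `sealAES256GCM` and `openAES256GCM` are hypothetical names, and the SHA-256 key derivation simply mirrors `deriveKeyFromNonce` rather than recommending it.

```go
package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "crypto/sha256"
    "encoding/hex"
    "errors"
)

// newGCM derives a 32-byte key from keyMaterial and returns an AES-256-GCM AEAD.
func newGCM(keyMaterial string) (cipher.AEAD, error) {
    key := sha256.Sum256([]byte(keyMaterial))
    block, err := aes.NewCipher(key[:])
    if err != nil {
        return nil, err
    }
    return cipher.NewGCM(block)
}

// sealAES256GCM returns hex(gcmNonce || ciphertext || tag).
func sealAES256GCM(plaintext []byte, keyMaterial string) (string, error) {
    gcm, err := newGCM(keyMaterial)
    if err != nil {
        return "", err
    }
    gcmNonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(gcmNonce); err != nil {
        return "", err
    }
    // Seal appends the ciphertext and tag to its first argument.
    return hex.EncodeToString(gcm.Seal(gcmNonce, gcmNonce, plaintext, nil)), nil
}

// openAES256GCM reverses sealAES256GCM and fails if the data was modified.
func openAES256GCM(encryptedHex, keyMaterial string) ([]byte, error) {
    data, err := hex.DecodeString(encryptedHex)
    if err != nil {
        return nil, err
    }
    gcm, err := newGCM(keyMaterial)
    if err != nil {
        return nil, err
    }
    if len(data) < gcm.NonceSize() {
        return nil, errors.New("ciphertext too short")
    }
    return gcm.Open(nil, data[:gcm.NonceSize()], data[gcm.NonceSize():], nil)
}
```

A fresh random GCM nonce is generated for every call, since reusing a GCM nonce under the same key breaks both confidentiality and authenticity.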

## Update Process Flow

### 1. Server Startup
1. **Load Private Key**: From `REDFLAG_SIGNING_PRIVATE_KEY` environment variable
2. **Initialize Signing Service**: Ed25519 operations ready
3. **Serve Public Key**: Available at `GET /api/v1/public-key`

### 2. Agent Installation (One-Liner)
```bash
curl -sSL https://redflag.example/install.sh | bash
```
1. **Download Agent**: Pre-built binary from server
2. **Start Agent**: Automatic startup
3. **Register**: Agent ↔ Server authentication
4. **Fetch Public Key**: From `GET /api/v1/public-key`
5. **Cache Key**: Saved to `/etc/aggregator/server_public_key`

### 3. Package Preparation (Server)
1. **Build**: Compile agent binary for target platform
2. **Sign**: Create Ed25519 signature using server private key
3. **Store**: Persist package with signature + metadata in database

### 4. Command Distribution (Server → Agent)
1. **Generate Nonce**: Create UUID + timestamp for freshness (<5min)
2. **Sign Nonce**: Ed25519 sign nonce for authenticity
3. **Create Command**: Bundle update parameters with signed nonce
4. **Distribute**: Send command to target agents

### 5. Package Reception (Agent)
1. **Validate Nonce**: Check that the timestamp is less than 5 minutes old and verify the Ed25519 signature
2. **Download**: Fetch package from secure URL
3. **Verify Signature**: Validate Ed25519 signature against cached public key
4. **Verify Checksum**: SHA-256 integrity check

### 6. Atomic Installation (Agent)
1. **Backup**: Copy current binary to `.bak`
2. **Install**: Atomically replace with new binary
3. **Restart**: Restart agent service (systemd/service/Windows service)
4. **Watchdog**: Poll server every 15s for version confirmation (5min timeout)
5. **Confirm or Rollback**:
   - ✓ Success → cleanup backup
   - ✗ Timeout/Failure → automatic rollback from backup

## Security Best Practices

### Server Operations
- ✅ Private key stored in secure environment (hardware security module recommended)
- ✅ Regular key rotation (planned; see Key Rotation Strategy and the TODO in signing.go)
- ✅ Audit logging of all signing operations
- ✅ Network access controls for signing endpoints

### Agent Operations
- ✅ Public key fetched via TOFU (Trust-On-First-Use)
- ✅ Nonce validation rejects commands older than 5 minutes (agent-side validation is still a TODO; see Agent-side Validation)
- ✅ Signature verification prevents tampering
- ✅ Watchdog polls server for version confirmation
- ✅ Atomic updates prevent partial installations
- ✅ Automatic rollback on watchdog timeout/failure

### Network Security
- ✅ HTTPS/TLS for all communications
- ✅ Package integrity verification
- ✅ Timeout controls for downloads
- ✅ Rate limiting on update endpoints

## Key Rotation Strategy

### Planned Implementation (TODO)

```mermaid
graph LR
    A[Key v1 Active] --> B[Generate Key v2]
    B --> C[Dual-Key Period]
    C --> D[Sign with v1+v2]
    D --> E[Phase out v1]
    E --> F[Key v2 Active]
```

### Rotation Steps
1. **Generate**: Create new Ed25519 key pair (v2)
2. **Distribute**: Add v2 public key to agents
3. **Transition**: Sign packages with both v1 and v2
4. **Verify**: Agents accept signatures from either key
5. **Phase-out**: Gradually retire v1
6. **Cleanup**: Remove v1 from agent trust store

### Migration Considerations
- Backward compatibility during transition
- Graceful period for key rotation (30 days recommended)
- Monitoring for rotation completion
- Emergency rollback procedures

## Vulnerability Management

### Known Mitigations
- **Supply Chain**: Ed25519 signatures prevent package tampering
- **Replay Attacks**: Nonce validation ensures freshness
- **Privilege Escalation**: Atomic updates with proper permissions
- **Information Disclosure**: AES-256-GCM encryption for transit

### Security Monitoring
- Monitor for failed signature verifications
- Alert on nonce replay attempts
- Track update success/failure rates
- Audit signing service access logs

### Incident Response
1. **Compromise Detection**: Monitor for signature verification failures
2. **Key Rotation**: Immediate rotation if private key compromised
3. **Agent Update**: Force security updates to all agents
4. **Investigation**: Audit logs for unauthorized access

## Compliance Considerations

- **Cryptography**: Uses NIST-approved algorithms (Ed25519 per FIPS 186-5, AES-256-GCM per FIPS 197 and SP 800-38D, SHA-256 per FIPS 180-4)
- **Audit Trail**: Complete logging of all signing and update operations
- **Access Control**: Role-based access to signing infrastructure
- **Data Protection**: Encryption in transit and at rest

## Future Enhancements

### Planned Security Features
- [ ] Hardware Security Module (HSM) integration for private key protection
- [ ] Certificate-based agent authentication
- [ ] Mutual TLS for server-agent communication
- [ ] Package reputation scoring
- [ ] Zero-knowledge proof-based update verification

### Performance Optimizations
- [ ] Parallel signature verification
- [ ] Cached public key validation
- [ ] Optimized crypto operations
- [ ] Delta update support

## Testing and Validation

### Security Testing
- **Unit Tests**: 80% coverage for crypto operations
- **Integration Tests**: Full update cycle simulation
- **Penetration Testing**: Regular third-party security assessments
- **Fuzz Testing**: Cryptographic input validation

### Test Scenarios
1. **Valid Update**: Normal successful update flow
2. **Invalid Signature**: Tampered package rejection
3. **Expired Nonce**: Replay attack prevention
4. **Corrupted Package**: Checksum validation
5. **Service Failure**: Automatic rollback
6. **Network Issues**: Timeout and retry handling

## References

- [Ed25519 Specification](https://tools.ietf.org/html/rfc8032)
- [AES-GCM Specification](https://tools.ietf.org/html/rfc5116)
- [NIST Cryptographic Standards](https://csrc.nist.gov/projects/cryptographic-standards-and-guidelines)

## Reporting Security Issues

Please report security vulnerabilities responsibly:
- Email: security@redflag-project.org
- PGP Key: Available on request
- Response time: Within 48 hours

---

*Last updated: v0.1.21*
*Security classification: Internal use*