
# RedFlag System Architecture

## 1. Overview

RedFlag is a cross-platform update management system designed for homelabs and self-hosters. It provides centralized visibility and control over software updates across multiple machines and platforms through a secure, resilient, pull-based architecture.

## 2. System Architecture Diagram

(Diagram sourced from `docs/days/October/ARCHITECTURE.md`, as it remains accurate)

```
┌─────────────────┐
│  Web Dashboard  │  React + TypeScript
│  Port: 3000     │
└────────┬────────┘
         │ HTTPS + JWT Auth
┌────────▼────────┐
│  Server (Go)    │  PostgreSQL
│  Port: 8080     │
└────────┬────────┘
         │ Pull-based (agents check in every 5 min)
    ┌────┴────┬────────┐
    │         │        │
┌───▼──┐  ┌───▼───┐ ┌──▼───┐
│Linux │  │Windows│ │Linux │
│Agent │  │Agent  │ │Agent │
└──────┘  └───────┘ └──────┘
```

## 3. Core Components

### 3.1. Server (`redflag-server`)

* **Framework**: Go + Gin HTTP framework.
* **Database**: PostgreSQL.
* **Authentication**: Multi-tier token system (Registration Tokens, JWT Access Tokens, Refresh Tokens).
* **Security**: Enforces Machine ID Binding, Nonce Protection, and Ed25519 Binary Signing.
* **Scheduler**: A priority-queue scheduler (not cron) manages agent tasks with backpressure detection.
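
The scheduler idea above can be illustrated with Go's standard `container/heap`. This is a minimal sketch, not RedFlag's actual scheduler: the `task` and `scheduler` types, their fields, and the `maxInFlight` threshold used as a stand-in for backpressure detection are all assumptions made for the example.

```go
package main

import (
	"container/heap"
	"fmt"
)

// task is a hypothetical unit of scheduled work; a lower Priority runs first.
type task struct {
	AgentID  string
	Priority int
}

// taskQueue implements heap.Interface as a min-heap ordered by Priority.
type taskQueue []task

func (q taskQueue) Len() int           { return len(q) }
func (q taskQueue) Less(i, j int) bool { return q[i].Priority < q[j].Priority }
func (q taskQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *taskQueue) Push(x any)        { *q = append(*q, x.(task)) }
func (q *taskQueue) Pop() any {
	old := *q
	n := len(old)
	t := old[n-1]
	*q = old[:n-1]
	return t
}

// scheduler dispatches tasks in priority order and stops dispatching
// while too many tasks are in flight (a simple form of backpressure).
type scheduler struct {
	queue       taskQueue
	inFlight    int
	maxInFlight int // assumed threshold, not a RedFlag setting
}

// Add queues a task.
func (s *scheduler) Add(t task) { heap.Push(&s.queue, t) }

// Next returns the most urgent task, or false if the queue is empty
// or the in-flight limit has been reached.
func (s *scheduler) Next() (task, bool) {
	if s.queue.Len() == 0 || s.inFlight >= s.maxInFlight {
		return task{}, false
	}
	s.inFlight++
	return heap.Pop(&s.queue).(task), true
}

// Done marks one dispatched task as finished, freeing capacity.
func (s *scheduler) Done() { s.inFlight-- }

func main() {
	s := &scheduler{maxInFlight: 2}
	s.Add(task{"agent-c", 3})
	s.Add(task{"agent-a", 1})
	s.Add(task{"agent-b", 2})

	for {
		t, ok := s.Next()
		if !ok {
			break
		}
		fmt.Println("dispatch", t.AgentID)
	}
	fmt.Println("still queued under backpressure:", s.queue.Len())
}
```

Unlike a cron-style timer, a heap lets the server always pick the most urgent task next and hold back the rest when capacity runs out.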

### 3.2. Agent (`redflag-agent`)

* **Language**: Go (single binary, cross-platform).
* **Services**: Deploys as a native service (`systemd` on Linux, Windows Services on Windows).
* **Paths (Linux):**
    * **Config:** `/etc/redflag/config.json`
    * **State:** `/var/lib/redflag/`
    * **Binary:** `/usr/local/bin/redflag-agent`
* **Resilience:**
    * Uses a **Circuit Breaker** to prevent cascading failures from individual scanners.
    * Uses a **Command Acknowledgment System** (`pending_acks.json`) to guarantee at-least-once delivery of results, even if the agent restarts.
    * Designed with a **Retry/Backoff Architecture** to handle server errors (such as HTTP 502) and network failures.
* **Scanners:**
    * Linux: APT, DNF, Docker
    * Windows: Windows Update, Winget
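
To make the circuit-breaker idea from the resilience list concrete, here is a minimal sketch of a per-scanner breaker. It is illustrative only: the `breaker` type, the consecutive-failure threshold, and the cooldown are assumptions, not the agent's actual implementation, and the sketch is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errOpen = errors.New("circuit open: scanner skipped")

// breaker is a hypothetical per-scanner circuit breaker.
type breaker struct {
	threshold int           // consecutive failures before the circuit opens
	cooldown  time.Duration // how long the circuit stays open
	failures  int
	openedAt  time.Time
}

func newBreaker(threshold, cooldownSeconds int) *breaker {
	return &breaker{
		threshold: threshold,
		cooldown:  time.Duration(cooldownSeconds) * time.Second,
	}
}

// Call runs scan unless the circuit is open. Once the cooldown has
// elapsed, a trial call is let through: success closes the circuit,
// failure re-opens it for another cooldown period.
func (b *breaker) Call(scan func() error) error {
	if b.failures >= b.threshold && time.Since(b.openedAt) < b.cooldown {
		return errOpen
	}
	if err := scan(); err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}

func main() {
	apt := newBreaker(2, 60)
	failing := func() error { return errors.New("apt: lock held") }

	// The first two attempts reach the scanner and fail; later attempts
	// are rejected by the breaker without invoking the scanner at all.
	for i := 1; i <= 4; i++ {
		fmt.Printf("attempt %d: %v\n", i, apt.Call(failing))
	}
}
```

Giving each scanner its own breaker means a broken Docker socket, for example, cannot stall or crash the APT scan running alongside it.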

### 3.3. Web Dashboard (`aggregator-web`)

* **Framework**: React with TypeScript.
* **Function**: Provides the "single pane of glass" for viewing agents, approving updates, and monitoring system health.
* **Security**: Authenticates to the server with a JWT; sessions are managed through `HttpOnly` cookies.

## 4. Core Workflows

### 4.1. Agent Installation & Migration

The installer script is **idempotent**.

1.  **New Install:** A `curl` or `iwr` one-liner is run with a `registration_token`. The script downloads the `redflag-agent` binary, creates the `redflag-agent` user, sets up the native service, and registers with the server, consuming one "seat" from the token.
2.  **Upgrade/Re-install:** If the installer script is re-run, it detects an *existing* `config.json`. It skips registration, preserving the agent's ID and history. It then stops the service, atomically replaces the binary, and restarts the service.
3.  **Automatic Migration:** On first start, the agent runs a **MigrationExecutor** to detect old installations (e.g., from `/etc/aggregator/`). It creates a backup, moves files to the new `/etc/redflag/` paths, and automatically enables new security features like machine binding.
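
Two ideas from the steps above are the "skip registration if a config already exists" check and the atomic binary swap. The real installer is a script run via `curl` or `iwr`; the Go sketch below only illustrates the logic. The function names and the `agent_id` field are hypothetical, and the rename-based swap is atomic only on POSIX systems when the temp file and the target are on the same filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// needsRegistration reports whether no file exists at configPath,
// in which case the installer would register a new agent.
func needsRegistration(configPath string) bool {
	_, err := os.Stat(configPath)
	return os.IsNotExist(err)
}

// replaceAtomically writes data to a temp file next to target and then
// renames it into place, so readers see either the old or the new file.
func replaceAtomically(target string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(target), ".agent-update-*")
	if err != nil {
		return err
	}
	// Cleans up on failure; fails harmlessly after a successful rename.
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before the swap
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), 0o755); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), target)
}

func main() {
	// A throwaway directory stands in for /etc/redflag and /usr/local/bin.
	dir, err := os.MkdirTemp("", "redflag-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	config := filepath.Join(dir, "config.json")
	binary := filepath.Join(dir, "redflag-agent")

	fmt.Println("first run, register:", needsRegistration(config))
	if err := os.WriteFile(config, []byte(`{"agent_id":"example"}`), 0o600); err != nil {
		panic(err)
	}
	fmt.Println("re-run, register:", needsRegistration(config))

	if err := replaceAtomically(binary, []byte("new binary bytes")); err != nil {
		panic(err)
	}
	fmt.Println("binary replaced")
}
```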

### 4.2. Agent Check-in & Command Loop (Pull-Only)

1.  **Check-in:** The agent checks in every 5 minutes (with jitter) by calling `GET /agents/:id/commands`.
2.  **Metrics:** This check-in *piggybacks* lightweight metrics (CPU/Mem/Disk) and any pending command acknowledgments.
3.  **Commands:** The server returns any pending commands (e.g., `scan_updates`, `enable_heartbeat`).
4.  **Execute:** The agent executes the commands.
5.  **Report:** The agent reports results back to the server. The **Command Acknowledgment System** ensures this result is delivered, even if the agent crashes or restarts.
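
The at-least-once guarantee in step 5 depends on writing pending acknowledgments to disk before sending them and removing them only once the server confirms receipt. The sketch below shows that pattern under stated assumptions: the `ackStore` type is hypothetical, and the on-disk format (a JSON array of command IDs) is a guess, since the real layout of `pending_acks.json` is not described here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ackStore persists the IDs of commands whose results the server has not
// yet confirmed. The JSON-array file format is an assumption.
type ackStore struct {
	path    string
	pending []string
}

// openAckStore loads any acknowledgments left over from a previous run.
func openAckStore(path string) (*ackStore, error) {
	s := &ackStore{path: path}
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return s, nil // first run: nothing pending
	}
	if err != nil {
		return nil, err
	}
	return s, json.Unmarshal(data, &s.pending)
}

func (s *ackStore) save() error {
	data, err := json.Marshal(s.pending)
	if err != nil {
		return err
	}
	return os.WriteFile(s.path, data, 0o600)
}

// Add records a finished command on disk before its result is sent.
func (s *ackStore) Add(id string) error {
	s.pending = append(s.pending, id)
	return s.save()
}

// Confirm drops the IDs the server has acknowledged and keeps the rest.
func (s *ackStore) Confirm(acked ...string) error {
	done := make(map[string]bool, len(acked))
	for _, id := range acked {
		done[id] = true
	}
	kept := s.pending[:0]
	for _, id := range s.pending {
		if !done[id] {
			kept = append(kept, id)
		}
	}
	s.pending = kept
	return s.save()
}

func main() {
	dir, err := os.MkdirTemp("", "redflag-acks")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "pending_acks.json")

	// Error handling is omitted below to keep the demo short.
	s, _ := openAckStore(path)
	s.Add("cmd-1")
	s.Add("cmd-2")

	// Simulate an agent restart: a fresh store reloads what was on disk.
	restarted, _ := openAckStore(path)
	fmt.Println("pending after restart:", restarted.pending)

	restarted.Confirm("cmd-1")
	fmt.Println("pending after server ack:", restarted.pending)
}
```

Because an entry is removed only after confirmation, a crash between "send" and "confirm" causes a resend on the next check-in, so the server may see a result twice and should treat acknowledgments as idempotent.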

### 4.3. Secure Agent Update (The "SSoT" Workflow)

1.  **Build (Server):** The server maintains a set of generic, *unsigned* agent binaries for each platform (linux/amd64, etc.).
2.  **Sign (Server):** When an update is triggered, the **Build Orchestrator** signs the generic binary *once per version/platform* using its Ed25519 private key. The signed package's metadata is stored in the `agent_update_packages` table.
3.  **Authorize (Server):** The server generates a one-time-use, time-limited (valid for less than 5 minutes), Ed25519-signed **Nonce** and sends it to the agent as part of the `update_agent` command.
4.  **Verify (Agent):** The agent receives the command and:
    a. Validates the **Nonce** (signature and timestamp) to prevent replay attacks.
    b. Downloads the new binary.
    c. Validates the **Binary's Signature** against the server public key it cached during its first registration (TOFU model).
5.  **Install (Agent):** If all checks pass, the agent atomically replaces its old binary and restarts.
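
The agent-side checks in step 4 can be sketched with Go's standard `crypto/ed25519` package. This is an illustration under assumptions, not RedFlag's wire format: the `signedNonce` struct, the payload layout (random bytes followed by a Unix timestamp), and all function names are hypothetical. One-time use would also require the agent to remember consumed nonces, which the sketch omits.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

const nonceTTL = 5 * time.Minute

// signedNonce is a hypothetical format: random bytes plus an issue time,
// signed together by the server.
type signedNonce struct {
	Value    []byte
	IssuedAt time.Time
	Sig      []byte
}

// noncePayload builds the exact bytes that are signed and verified.
func noncePayload(value []byte, issuedAt time.Time) []byte {
	ts := make([]byte, 8)
	binary.BigEndian.PutUint64(ts, uint64(issuedAt.Unix()))
	return append(append([]byte{}, value...), ts...)
}

// issueNonce plays the server's role in the Authorize step.
func issueNonce(priv ed25519.PrivateKey, now time.Time) (signedNonce, error) {
	value := make([]byte, 16)
	if _, err := rand.Read(value); err != nil {
		return signedNonce{}, err
	}
	return signedNonce{value, now, ed25519.Sign(priv, noncePayload(value, now))}, nil
}

// verifyUpdate plays the agent's role: it checks the nonce signature, the
// nonce age, and the binary signature against the cached public key.
func verifyUpdate(pub ed25519.PublicKey, n signedNonce, bin, binSig []byte, now time.Time) error {
	if !ed25519.Verify(pub, noncePayload(n.Value, n.IssuedAt), n.Sig) {
		return errors.New("nonce signature invalid")
	}
	if age := now.Sub(n.IssuedAt); age < 0 || age >= nonceTTL {
		return errors.New("nonce expired or not yet valid")
	}
	if !ed25519.Verify(pub, bin, binSig) {
		return errors.New("binary signature invalid")
	}
	return nil
}

// demo signs as the server, then verifies as the agent in three scenarios.
func demo() (ok, stale, tampered error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return err, err, err
	}
	now := time.Now()
	bin := []byte("agent binary bytes")
	sig := ed25519.Sign(priv, bin)
	n, err := issueNonce(priv, now)
	if err != nil {
		return err, err, err
	}
	ok = verifyUpdate(pub, n, bin, sig, now.Add(time.Minute))
	stale = verifyUpdate(pub, n, bin, sig, now.Add(10*time.Minute))
	tampered = verifyUpdate(pub, n, append(bin, 'x'), sig, now.Add(time.Minute))
	return ok, stale, tampered
}

func main() {
	ok, stale, tampered := demo()
	fmt.Println("fresh nonce, intact binary:", ok)
	fmt.Println("stale nonce:", stale)
	fmt.Println("tampered binary:", tampered)
}
```

Checking the nonce before downloading anything lets the agent reject replayed or expired commands cheaply, and checking the binary against the key pinned at registration means a compromised download path alone cannot push a malicious update.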