# RedFlag System Architecture

## 1. Overview

RedFlag is a cross-platform update management system designed for homelabs and self-hosters. It provides centralized visibility and control over software updates across multiple machines and platforms through a secure, resilient, pull-based architecture.
## 2. System Architecture Diagram

(Diagram sourced from `docs/days/October/ARCHITECTURE.md`, as it remains accurate.)

```
     ┌─────────────────┐
     │  Web Dashboard  │  React + TypeScript
     │   Port: 3000    │
     └────────┬────────┘
              │ HTTPS + JWT Auth
     ┌────────▼────────┐
     │   Server (Go)   │  PostgreSQL
     │   Port: 8080    │
     └────────┬────────┘
              │ Pull-based (agents check in every 5 min)
    ┌─────────┼─────────┐
    │         │         │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│ Linux │ │Windows│ │ Linux │
│ Agent │ │ Agent │ │ Agent │
└───────┘ └───────┘ └───────┘
```
## 3. Core Components

### 3.1. Server (`redflag-server`)

* **Framework**: Go with the Gin HTTP framework.
* **Database**: PostgreSQL.
* **Authentication**: Multi-tier token system (Registration Tokens, JWT Access Tokens, Refresh Tokens).
* **Security**: Enforces Machine ID Binding, Nonce Protection, and Ed25519 Binary Signing.
* **Scheduler**: A priority-queue scheduler (not cron) manages agent tasks with backpressure detection.
### 3.2. Agent (`redflag-agent`)

* **Language**: Go (single cross-platform binary).
* **Service**: Runs as a native service (`systemd` on Linux, Windows Services on Windows).
* **Paths (Linux)**:
    * **Config**: `/etc/redflag/config.json`
    * **State**: `/var/lib/redflag/`
    * **Binary**: `/usr/local/bin/redflag-agent`
* **Resilience**:
    * A **Circuit Breaker** isolates each scanner, so a failing scanner cannot cascade into failures elsewhere in the agent.
    * A **Command Acknowledgment System** (`pending_acks.json`) guarantees at-least-once delivery of results, even across agent restarts.
    * A **Retry/Backoff Architecture** handles server errors (e.g., HTTP 502) and network failures.
* **Scanners**:
    * Linux: APT, DNF, Docker
    * Windows: Windows Update, Winget
### 3.3. Web Dashboard (`aggregator-web`)

* **Framework**: React with TypeScript.
* **Function**: Provides a "single pane of glass" for viewing agents, approving updates, and monitoring system health.
* **Security**: Authenticates to the server with a JWT; the browser session is held in an `HttpOnly` cookie.
## 4. Core Workflows

### 4.1. Agent Installation & Migration

The installer script is **idempotent**: re-running it on a machine that already has the agent upgrades that agent instead of registering a new one.

1. **New Install:** A `curl` (Linux) or `iwr` (Windows PowerShell) one-liner is run with a `registration_token`. The script downloads the `redflag-agent` binary, creates the `redflag-agent` user, sets up the native service, and registers with the server, consuming one "seat" from the token.
2. **Upgrade/Re-install:** If the script is re-run, it detects the *existing* `config.json` and skips registration, preserving the agent's ID and history. It then stops the service, atomically replaces the binary, and restarts the service.
3. **Automatic Migration:** On first start, the agent runs a **MigrationExecutor** that detects old installations (e.g., under `/etc/aggregator/`). It creates a backup, moves the files to the new `/etc/redflag/` paths, and automatically enables new security features such as machine binding.
### 4.2. Agent Check-in & Command Loop (Pull-Only)

1. **Check-in:** Every 5 minutes (with jitter), the agent calls `GET /agents/:id/commands`.
2. **Metrics:** Each check-in *piggybacks* lightweight metrics (CPU/memory/disk) and any pending command acknowledgments.
3. **Commands:** The server responds with any pending commands (e.g., `scan_updates`, `enable_heartbeat`).
4. **Execute:** The agent executes the commands.
5. **Report:** The agent reports the results to the server. The **Command Acknowledgment System** ensures each result is delivered, even if the agent crashes or restarts.
### 4.3. Secure Agent Update (The "SSoT" Workflow)

1. **Build (Server):** The server maintains a set of generic, *unsigned* agent binaries, one per platform (e.g., linux/amd64).
2. **Sign (Server):** When an update is triggered, the **Build Orchestrator** signs the generic binary *once per version/platform* with its Ed25519 private key. The signed package metadata is stored in the `agent_update_packages` table.
3. **Authorize (Server):** The server generates a one-time-use, short-lived (under 5 minutes) **Ed25519-signed nonce** and sends it to the agent as part of the `update_agent` command.
4. **Verify (Agent):** On receiving the command, the agent:
    1. Validates the **nonce** (signature and timestamp) to prevent replay attacks.
    2. Downloads the new binary.
    3. Validates the **binary's signature** against the server public key it cached at first registration (trust on first use, TOFU).
5. **Install (Agent):** If all checks pass, the agent atomically replaces its old binary and restarts.