Redflag/docs/4_LOG/October_2025/Architecture-Documentation/PROXMOX_INTEGRATION_SPEC.md

🔧 Proxmox Integration Specification

Status: Planning / Specification
Priority: HIGH (KILLER FEATURE)
Target Session: Session 9
Estimated Effort: 8-12 hours


📋 Overview

Proxmox integration enables RedFlag to automatically discover and manage LXC containers across Proxmox clusters, providing hierarchical update management for the complete homelab stack: Proxmox hosts → LXC containers → Docker containers.

User Problem

Current Pain:

User has: 2 Proxmox clusters
  → 10+ LXC containers
  → 20+ Docker containers inside LXCs
  → Manual SSH into each LXC to check updates
  → No centralized view
  → Time-consuming, error-prone

RedFlag Solution:

1. Add Proxmox API credentials to RedFlag
2. Auto-discover all LXCs across clusters
3. Auto-install agent in each LXC
4. Hierarchical dashboard: see everything at once
5. Bulk operations: "Update all LXCs on node01"

🎯 Core Features

1. Proxmox Cluster Discovery

User Flow:

  1. User navigates to Settings → Proxmox Integration
  2. Clicks "Add Proxmox Cluster"
  3. Enters:
    • Cluster name (e.g., "Homelab Cluster 1")
    • API URL (e.g., https://192.168.1.10:8006)
    • API Token ID (e.g., root@pam!redflag)
    • API Token Secret
  4. Clicks "Test Connection" → validates credentials
  5. Clicks "Save & Discover"
  6. RedFlag queries Proxmox API:
    • Lists all nodes in cluster
    • Lists all LXCs on each node
    • Displays summary: "Found 2 nodes, 10 LXCs"
  7. User reviews discovered LXCs
  8. Clicks "Install Agents" → automated deployment

2. LXC Auto-Discovery

Proxmox API Endpoints:

# List all nodes
GET /api2/json/nodes

# List LXCs on a node
GET /api2/json/nodes/{node}/lxc

# Get LXC details
GET /api2/json/nodes/{node}/lxc/{vmid}/status/current

# Execute a command in an LXC — note: the PVE REST API has no exec endpoint
# for LXC containers; run `pct exec` on the node instead (e.g. over SSH)
pct exec <vmid> -- <command>

Data to Collect:

{
  "vmid": 100,
  "name": "ubuntu-docker-01",
  "node": "pve1",
  "status": "running",
  "maxmem": 2147483648,
  "maxdisk": 8589934592,
  "uptime": 123456,
  "ostemplate": "ubuntu-22.04-standard",
  "ip_address": "192.168.1.100",
  "hostname": "ubuntu-docker-01.local"
}

3. Automated Agent Installation

Installation Flow:

# 1. Generate agent install script for this LXC
/tmp/redflag-agent-install.sh

# 2. Upload script to LXC (pct commands run on the Proxmox node, e.g. via SSH)
pct push <vmid> /tmp/redflag-agent-install.sh /tmp/install.sh

# 3. Execute installation
pct exec <vmid> -- bash /tmp/install.sh

# Script contents:
#!/bin/bash
set -euo pipefail
# Download agent binary
curl -fsSL https://redflag-server:8080/agent/download -o /usr/local/bin/redflag-agent

# Make executable
chmod +x /usr/local/bin/redflag-agent

# Register with server
/usr/local/bin/redflag-agent --register \
  --server https://redflag-server:8080 \
  --proxmox-cluster "Homelab Cluster 1" \
  --lxc-vmid 100 \
  --lxc-node pve1

# Create systemd service
cat > /etc/systemd/system/redflag-agent.service <<'EOF'
[Unit]
Description=RedFlag Update Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/redflag-agent
Restart=always

[Install]
WantedBy=multi-user.target
EOF

# Enable and start
systemctl daemon-reload
systemctl enable --now redflag-agent

# Remove the install script so it is not left behind (see Security Considerations)
rm -f "$0"

4. Hierarchical Dashboard View

Dashboard Structure:

Proxmox Integration
├── Homelab Cluster 1 (192.168.1.10)
│   ├── Node: pve1
│   │   ├── LXC 100: ubuntu-docker-01 [✓ Online] [3 updates]
│   │   │   ├── APT Packages: 2 updates
│   │   │   └── Docker Images: 1 update
│   │   │       └── nginx:latest → sha256:abc123
│   │   ├── LXC 101: debian-pihole [✓ Online] [1 update]
│   │   └── LXC 102: ubuntu-dev [✗ Offline]
│   └── Node: pve2
│       ├── LXC 200: nextcloud [✓ Online] [5 updates]
│       └── LXC 201: mariadb [✓ Online] [0 updates]
└── Homelab Cluster 2 (192.168.2.10)
    └── Node: pve3
        └── LXC 300: monitoring [✓ Online] [2 updates]

Actions:
[Scan All]  [Update All]  [View by Update Type]

5. Bulk Operations

Supported Operations:

  • By Cluster: "Scan all LXCs in Homelab Cluster 1"
  • By Node: "Update all LXCs on pve1"
  • By Type: "Update all Docker images across all LXCs"
  • By Severity: "Install all critical security updates"

UI Flow:

1. User selects hierarchy level (cluster/node/LXC)
2. Right-click → Context menu
3. Options:
   - Scan for updates
   - Approve all updates
   - Install all updates
   - View detailed status
   - Restart all agents

🗄️ Database Schema

New Tables

-- Proxmox cluster configuration
CREATE TABLE proxmox_clusters (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name VARCHAR(255) NOT NULL,
    api_url VARCHAR(255) NOT NULL,
    api_token_id VARCHAR(255) NOT NULL,
    api_token_secret_encrypted TEXT NOT NULL, -- Encrypted with server key
    last_discovered TIMESTAMP,
    status VARCHAR(50) DEFAULT 'active', -- active, error, disabled
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Proxmox nodes (hosts)
CREATE TABLE proxmox_nodes (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    cluster_id UUID REFERENCES proxmox_clusters(id) ON DELETE CASCADE,
    node_name VARCHAR(255) NOT NULL,
    status VARCHAR(50), -- online, offline, unknown
    cpu_count INTEGER,
    memory_total BIGINT,
    uptime BIGINT,
    pve_version VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(cluster_id, node_name)
);

-- LXC containers
CREATE TABLE lxc_containers (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    node_id UUID REFERENCES proxmox_nodes(id) ON DELETE CASCADE,
    agent_id UUID REFERENCES agents(id) ON DELETE SET NULL,
    vmid INTEGER NOT NULL,
    container_name VARCHAR(255),
    hostname VARCHAR(255),
    ip_address INET,
    os_template VARCHAR(255),
    status VARCHAR(50), -- running, stopped, unknown
    memory_max BIGINT,
    disk_max BIGINT,
    uptime BIGINT,
    agent_installed BOOLEAN DEFAULT FALSE,
    last_seen TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(node_id, vmid)
);

-- Discovery log
CREATE TABLE proxmox_discovery_log (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    cluster_id UUID REFERENCES proxmox_clusters(id) ON DELETE CASCADE,
    discovered_at TIMESTAMP DEFAULT NOW(),
    nodes_found INTEGER,
    lxcs_found INTEGER,
    new_lxcs INTEGER,
    errors TEXT,
    duration_seconds INTEGER
);

-- Indexes
CREATE INDEX idx_lxc_containers_agent_id ON lxc_containers(agent_id);
CREATE INDEX idx_lxc_containers_node_id ON lxc_containers(node_id);
CREATE INDEX idx_proxmox_nodes_cluster_id ON proxmox_nodes(cluster_id);

Schema Relationships

proxmox_clusters (1) → (N) proxmox_nodes
proxmox_nodes (1) → (N) lxc_containers
lxc_containers (1) → (1) agents
agents (1) → (N) update_packages
lxc_containers (1) → (N) docker_containers (via agents)
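A query sketch for the hierarchy view, assuming the tables above (PostgreSQL syntax):

```sql
-- One row per LXC with its node and cluster, ready to group into the tree view
SELECT pc.name        AS cluster,
       pn.node_name   AS node,
       lc.vmid,
       lc.container_name,
       lc.status,
       lc.agent_installed
FROM proxmox_clusters pc
JOIN proxmox_nodes   pn ON pn.cluster_id = pc.id
JOIN lxc_containers  lc ON lc.node_id = pn.id
ORDER BY pc.name, pn.node_name, lc.vmid;
```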

🔧 Implementation Plan

Phase 1: API Client (Session 9a - 3 hours)

File: aggregator-server/internal/integrations/proxmox/client.go

package proxmox

import (
    "context"
    "crypto/tls"
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

type Client struct {
    baseURL     string
    tokenID     string
    tokenSecret string
    httpClient  *http.Client
}

// NewClient creates a Proxmox API client
func NewClient(apiURL, tokenID, tokenSecret string, skipTLS bool) *Client {
    transport := &http.Transport{
        // skipTLS should only be enabled for self-signed lab certificates
        TLSClientConfig: &tls.Config{InsecureSkipVerify: skipTLS},
    }

    return &Client{
        baseURL:     apiURL,
        tokenID:     tokenID,
        tokenSecret: tokenSecret,
        httpClient: &http.Client{
            Transport: transport,
            Timeout:   30 * time.Second,
        },
    }
}

// TestConnection verifies API credentials
func (c *Client) TestConnection(ctx context.Context) error {
    // GET /api2/json/version
    // Returns Proxmox VE version info
}

// ListNodes returns all nodes in the cluster
func (c *Client) ListNodes(ctx context.Context) ([]Node, error) {
    // GET /api2/json/nodes
}

// ListLXCs returns all LXC containers on a node
func (c *Client) ListLXCs(ctx context.Context, nodeName string) ([]LXC, error) {
    // GET /api2/json/nodes/{node}/lxc
}

// GetLXCStatus returns detailed status of an LXC
func (c *Client) GetLXCStatus(ctx context.Context, nodeName string, vmid int) (*LXCStatus, error) {
    // GET /api2/json/nodes/{node}/lxc/{vmid}/status/current
}

// ExecInLXC executes a command in an LXC container
func (c *Client) ExecInLXC(ctx context.Context, nodeName string, vmid int, command string) (string, error) {
    // Note: the PVE REST API has no exec endpoint for LXC containers.
    // Run `pct exec` on the node (e.g. over SSH), then poll the resulting
    // task via /nodes/{node}/tasks/{upid}/status for completion
}

// UploadFileToLXC uploads a file to an LXC
func (c *Client) UploadFileToLXC(ctx context.Context, nodeName string, vmid int, localPath, remotePath string) error {
    // Uses `pct push` on the node (e.g. over SSH)
}

Phase 2: Discovery Service (Session 9b - 3 hours)

File: aggregator-server/internal/services/proxmox_discovery.go

package services

type ProxmoxDiscoveryService struct {
    db *database.DB
    proxmoxClients map[string]*proxmox.Client
}

// DiscoverCluster discovers all nodes and LXCs in a Proxmox cluster
func (s *ProxmoxDiscoveryService) DiscoverCluster(ctx context.Context, clusterID uuid.UUID) (*DiscoveryResult, error) {
    // 1. Get cluster config from database
    // 2. Create Proxmox API client
    // 3. List all nodes
    // 4. For each node: list LXCs
    // 5. Store in database
    // 6. Return summary
}

// InstallAgentInLXC installs RedFlag agent in an LXC container
func (s *ProxmoxDiscoveryService) InstallAgentInLXC(ctx context.Context, lxcID uuid.UUID) error {
    // 1. Get LXC details from database
    // 2. Generate install script with pre-registration
    // 3. Upload script to LXC
    // 4. Execute script
    // 5. Wait for agent to register
    // 6. Update database
}

// SyncClusterStatus syncs real-time status from Proxmox API
func (s *ProxmoxDiscoveryService) SyncClusterStatus(ctx context.Context, clusterID uuid.UUID) error {
    // Background job: runs every 5 minutes
    // Updates node/LXC status, IP addresses, etc.
}

Phase 3: API Endpoints (Session 9c - 2 hours)

File: aggregator-server/internal/api/handlers/proxmox.go

// POST /api/v1/proxmox/clusters
// Add a new Proxmox cluster

// GET /api/v1/proxmox/clusters
// List all Proxmox clusters

// GET /api/v1/proxmox/clusters/:id
// Get cluster details with hierarchy

// POST /api/v1/proxmox/clusters/:id/discover
// Trigger discovery of nodes and LXCs

// POST /api/v1/proxmox/lxcs/:id/install-agent
// Install agent in specific LXC

// POST /api/v1/proxmox/clusters/:id/bulk-install
// Install agents in all LXCs in cluster

// GET /api/v1/proxmox/clusters/:id/hierarchy
// Get hierarchical tree view (cluster → nodes → LXCs → Docker)

// POST /api/v1/proxmox/clusters/:id/bulk-scan
// Trigger scan on all agents in cluster

// POST /api/v1/proxmox/nodes/:id/bulk-update
// Approve all updates for all LXCs on a node

Phase 4: Dashboard Integration (Session 9d - 4 hours)

Component: aggregator-web/src/pages/Proxmox.tsx

// Proxmox Integration page with:
// - List of clusters
// - Add cluster dialog
// - Hierarchical tree view
// - Bulk operation buttons
// - Status indicators
// - Discovery logs

🔐 Security Considerations

API Token Storage

  • Store token secrets encrypted in database
  • Use server-side encryption key (from environment)
  • Never expose tokens in API responses
  • Rotate tokens regularly

LXC Access

  • Only use API tokens with minimal permissions
  • Don't store root passwords
  • Use Proxmox's built-in permission system
  • Log all remote command executions

Agent Installation

  • Verify LXC is running before installation
  • Use HTTPS for agent download
  • Validate agent binary checksum
  • Don't leave install scripts on LXC after installation
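The checksum validation step could be sketched as below. The `/agent/sha256` endpoint is an assumption for illustration, not an existing RedFlag API.

```shell
# Sketch of the post-download checksum check the install script could run.
verify_checksum() {
    local file="$1" expected="$2"
    local actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# In the install script (hypothetical checksum endpoint):
#   EXPECTED="$(curl -fsSL https://redflag-server:8080/agent/sha256)"
#   verify_checksum /usr/local/bin/redflag-agent "$EXPECTED" || exit 1
```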

🧪 Testing Plan

Manual Testing

  1. Set up test Proxmox VE instance
  2. Create 3-4 LXC containers
  3. Test cluster discovery
  4. Test agent installation
  5. Test hierarchical view
  6. Test bulk operations

Edge Cases

  • LXC is stopped during installation
  • Network interruption during discovery
  • Invalid API credentials
  • LXC without internet access
  • Multiple Proxmox clusters with same LXC names
  • Agent already installed (re-installation scenario)

📚 Proxmox API Documentation

Official Docs: https://pve.proxmox.com/wiki/Proxmox_VE_API

Key Endpoints:

GET  /api2/json/version                                 # Version info
GET  /api2/json/nodes                                   # List nodes
GET  /api2/json/nodes/{node}/lxc                        # List LXCs
GET  /api2/json/nodes/{node}/lxc/{vmid}/status/current  # LXC status
GET  /api2/json/nodes/{node}/tasks/{upid}/status        # Task status

# Note: the REST API does not expose command execution inside LXC containers;
# use `pct exec` on the node (e.g. over SSH) instead.

Authentication:

# Create API token in Proxmox:
# Datacenter → Permissions → API Tokens → Add

# Use in requests:
Authorization: PVEAPIToken=root@pam!redflag=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
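From the CLI, the token header can be assembled and used as in this sketch; host, token ID, and secret below are placeholders for your own cluster.

```shell
# Placeholders — substitute your own cluster address and API token.
PVE_HOST="192.168.1.10"
TOKEN_ID='root@pam!redflag'
TOKEN_SECRET="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

# Header format: PVEAPIToken=<token-id>=<secret>
AUTH_HEADER="Authorization: PVEAPIToken=${TOKEN_ID}=${TOKEN_SECRET}"

# List cluster nodes (-k only if the cluster uses a self-signed certificate):
# curl -ks -H "$AUTH_HEADER" "https://${PVE_HOST}:8006/api2/json/nodes"
echo "$AUTH_HEADER"
```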

🎯 Success Criteria

User Can:

  1. Add Proxmox cluster in <2 minutes
  2. Auto-discover all LXCs in <1 minute
  3. Install agents in all LXCs in <5 minutes
  4. See hierarchical dashboard view
  5. Perform bulk scan across entire cluster
  6. Approve updates by node/cluster
  7. View update history per LXC
  8. Track which Docker containers run in which LXCs

Technical Metrics:

  • API response time < 500ms
  • Discovery time < 10s per node
  • Agent installation success rate > 95%
  • Real-time status updates within 30s
  • Support for 10+ clusters, 100+ LXCs

🚀 Future Enhancements

Phase 2 Features (Post-MVP):

  • VM Support: Extend beyond LXCs to full VMs
  • Automated Scheduling: "Update all LXCs on Node 1 every Sunday at 3am"
  • Snapshot Integration: Take LXC snapshot before updates
  • Rollback Support: Restore LXC snapshot if update fails
  • Proxmox Host Updates: Manage Proxmox VE host OS updates
  • HA Cluster Awareness: Respect Proxmox HA groups
  • Resource Monitoring: Track CPU/RAM/disk usage per LXC
  • Cost Tracking: Calculate resource usage and "cost" per LXC

Advanced Features:

  • Template Management: Auto-discover LXC templates, track which template each LXC uses
  • Backup Integration: Coordinate with Proxmox Backup Server
  • Migration Awareness: Detect LXC migrations between nodes
  • Cluster Health: Monitor Proxmox cluster health
  • Alerting: Email/Slack notifications for LXC issues

📊 Estimated Impact

For Users with Proxmox:

  • Time Saved: 90% reduction in update management time
    • Before: 20 minutes per day checking updates
    • After: 2 minutes per day reviewing dashboard
  • Visibility: 100% visibility across entire infrastructure
  • Control: Centralized control, no more SSH marathon
  • Automation: One-click bulk operations

For RedFlag Project:

  • Differentiation: MAJOR competitive advantage
  • Target Market: Directly addresses homelab use case
  • Adoption: Proxmox users will love this
  • Word of Mouth: "You HAVE to try RedFlag if you use Proxmox"

Priority: This is THE killer feature for the homelab market. Combined with Docker-first design and local CLI, RedFlag becomes the obvious choice for Proxmox homelabbers.


Last Updated: 2025-10-13 (Post-Session 3)
Target Implementation: Session 9