# RedFlag Heartbeat System Documentation

**Version**: v0.1.14 (Architecture Separation) ✅ **COMPLETED**
**Status**: Fully functional with automatic UI updates
**Last Updated**: 2025-10-28

## Overview

The RedFlag Heartbeat System lets agents switch from normal polling (5-minute intervals) to rapid polling (10-second intervals) for real-time monitoring and operations. It is essential for live operations, updates, and other time-sensitive tasks that require immediate agent responsiveness.

Heartbeat is a **temporary, on-demand rapid polling mechanism**: it is enabled for a limited duration during active operations and provides near real-time feedback for commands.

## Architecture (v0.1.14+)

### Separation of Concerns

**Core Design Principle**: Heartbeat is fast-changing data; general agent metadata is slow-changing. They should be treated separately, each with an appropriate caching strategy.

### Data Flow

```
User clicks heartbeat button
    ↓
Heartbeat command created in database
    ↓
Agent processes command
    ↓
Agent sends immediate check-in with heartbeat metadata
    ↓
Server processes heartbeat metadata → Updates database
    ↓
UI gets heartbeat data via dedicated endpoint (5s cache)
    ↓
Buttons update automatically
```

### New Architecture Components

#### 1. Server-side Endpoints

**GET `/api/v1/agents/{id}/heartbeat`** (NEW - v0.1.14)

```json
{
  "enabled": boolean,         // Heartbeat enabled by user
  "until": "timestamp",       // When heartbeat expires
  "active": boolean,          // Currently active (not expired)
  "duration_minutes": number  // Configured duration
}
```

**POST `/api/v1/agents/{id}/heartbeat`** (Existing)

```json
{
  "enabled": true,
  "duration_minutes": 10
}
```

#### 2. Client-side Architecture

**`useHeartbeatStatus(agentId)` Hook (NEW - v0.1.14)**

- **Smart Polling**: Only polls when heartbeat is active
- **5-second cache**: Appropriate for real-time data
- **Auto-stops**: Stops polling when heartbeat expires
- **No rate limiting**: Minimal server impact

**Data Sources**:

- **Heartbeat UI**: Uses dedicated endpoint (`/agents/{id}/heartbeat`)
- **General Agent UI**: Uses existing endpoint (`/agents/{id}`)
- **System Information**: Uses existing endpoint with 2-5 minute cache
- **History**: Uses existing endpoint with 5-minute cache

### Smart Polling Logic

```typescript
refetchInterval: (query) => {
  const data = query.state.data as HeartbeatStatus;

  // Only poll when heartbeat is enabled and still active
  if (data?.enabled && data?.active) {
    return 5000; // 5 seconds
  }

  return false; // No polling when inactive
}
```

## Legacy Systems Removed (v0.1.14)

### ❌ Removed Components

1. **Circular Sync Logic** (agent/main.go lines 353-365)
   - Problem: Config ↔ Client bidirectional sync causing inconsistent state
   - Removed in v0.1.13

2. **Startup Config→Client Sync** (agent/main.go lines 289-291)
   - Problem: Unnecessary sync that could override heartbeat state
   - Removed in v0.1.13

3. **Server-driven Heartbeat** (`EnableRapidPollingMode()`)
   - Problem: Bypassed command system, created inconsistency
   - Replaced with command-based approach in v0.1.13

4. **Mixed Data Sources** (v0.1.14)
   - Problem: Heartbeat state mixed with general agent metadata
   - Separated into dedicated endpoint in v0.1.14

### ✅ Retained Components

1. **Command-based Architecture** (v0.1.12+)
   - Heartbeat commands go through the same system as other commands
   - Full audit trail in history
   - Proper error handling and retry logic

2. **Config Persistence** (v0.1.13+)
   - `cfg.Save()` calls ensure heartbeat settings survive restarts
   - Agent remembers heartbeat state across reboots

3. **Stale Heartbeat Detection** (v0.1.13+)
   - Server detects when agent restarts without heartbeat
   - Creates audit command: "Heartbeat cleared - agent restarted without active heartbeat mode"

## Cache Strategy

| Data Type | Endpoint | Cache Time | Polling Interval | Rationale |
|-----------|----------|------------|------------------|-----------|
| **Heartbeat Status** | `/agents/{id}/heartbeat` | 5 seconds | 5 seconds (when active) | Real-time feedback needed |
| **Agent Status** | `/agents/{id}` | 2-5 minutes | None | Slow-changing data |
| **System Information** | `/agents/{id}` | 2-5 minutes | None | Static most of the time |
| **History Data** | `/agents/{id}/commands` | 5 minutes | None | Historical data |
| **Active Commands** | `/commands/active` | 0 | 5 seconds | Command tracking |

## Usage Patterns

### 1. Manual Heartbeat Activation

User clicks "Enable Heartbeat" → 10-minute default → Agent polls every 10 seconds → Auto-disable after 10 minutes

### 2. Duration Selection

Quick Actions dropdown: 10min, 30min, 1hr, Permanent → Configured duration applies → Auto-disable when it expires

### 3. Command-triggered Heartbeat

Update/Install commands → Heartbeat enabled automatically (10min) → Command completes → Auto-disable after 10min

### 4. Stale State Detection

Agent restarts with heartbeat active → Server detects mismatch → Creates audit command → Clears stale state

## Performance Impact

### Minimal Server Load

- **Smart Polling**: Only polls when heartbeat is active
- **Dedicated Endpoint**: Small JSON response (heartbeat data only)
- **5-second Cache**: Prevents excessive API calls
- **Auto-stop**: Polling stops when heartbeat expires

### Network Efficiency

- **Separate Caches**: Fast data updates without affecting slow data
- **No Global Refresh**: Only heartbeat components update frequently
- **Conditional Polling**: No polling when heartbeat is inactive

## Debugging and Monitoring

### Server Logs

```text
[Heartbeat] Agent {id} heartbeat status: enabled={bool}, until={timestamp}, active={bool}
[Heartbeat] Stale heartbeat detected for agent {id} - server expected active until {timestamp}, but agent not reporting heartbeat (likely restarted)
[Heartbeat] Cleared stale heartbeat state for agent {id}
[Heartbeat] Created audit trail for stale heartbeat cleanup (agent {id})
```

### Client Console Logs

```text
[Heartbeat UI] Tracking command {command_id} for completion
[Heartbeat UI] Command {command_id} completed with status: {status}
[Heartbeat UI] Monitoring for completion of command {command_id}
```

### Common Issues

1. **Buttons Not Updating**: Check that the component uses the dedicated `useHeartbeatStatus()` hook
2. **Constant Polling**: Verify the `active` property in the heartbeat response
3. **Stale State**: Look for "stale heartbeat detected" logs
4. **Missing Data**: Ensure the `/agents/{id}/heartbeat` endpoint is registered

## Migration Notes

### From v0.1.13 to v0.1.14

- ✅ **No Breaking Changes**: Existing endpoints preserved
- ✅ **Improved UX**: Real-time heartbeat button updates
- ✅ **Better Performance**: Smart polling reduces server load
- ✅ **Clean Architecture**: Separated fast/slow data concerns

### Data Compatibility

- Existing agent metadata format preserved
- New heartbeat endpoint extracts its data from existing metadata
- Backward compatibility maintained for legacy clients

## Future Enhancements

### Potential Improvements

1. **WebSocket Support**: Push updates instead of polling (v0.1.15+)
2. **Batch Heartbeat**: Multiple agents in a single operation
3. **Global Heartbeat**: Enable/disable for all agents
4. **Scheduled Heartbeat**: Time-based activation
5. **Performance Metrics**: Track heartbeat efficiency

### Deprecation Timeline

- **v0.1.13**: Command-based heartbeat (released)
- **v0.1.14**: Architecture separation (current)
- **v0.1.15**: WebSocket consideration
- **v0.1.16**: Legacy metadata deprecation consideration

## Testing

### Functional Tests

1. **Manual Activation**: Click enable/disable buttons
2. **Duration Selection**: Test 10min/30min/1hr/permanent
3. **Auto-expiration**: Verify heartbeat stops when time expires
4. **Command Integration**: Confirm heartbeat auto-enables before updates
5. **Stale Detection**: Test agent restart scenarios

### Performance Tests

1. **Polling Behavior**: Verify smart polling (only when active)
2. **Cache Efficiency**: Confirm 5-second cache prevents excessive calls
3. **Multiple Agents**: Test concurrent heartbeat sessions
4. **Server Load**: Monitor during heavy heartbeat usage

---

**Related Files**:

- `aggregator-server/internal/api/handlers/agents.go`: New `GetHeartbeatStatus()` function
- `aggregator-web/src/hooks/useHeartbeat.ts`: Smart polling hook
- `aggregator-web/src/pages/Agents.tsx`: Updated UI components
- `aggregator-web/src/lib/api.ts`: New `getHeartbeatStatus()` function
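
## Appendix: Heartbeat Status Sketch

The heartbeat endpoint's `active` flag and the smart polling rule from the Architecture section can be read together as two small pure functions. The following is a minimal sketch, not the actual RedFlag implementation. It assumes `until` is an ISO 8601 timestamp string. The helper names `isHeartbeatActive` and `heartbeatRefetchInterval` are hypothetical and do not appear in the codebase. In the real system the server computes `active`; the client only reads it.

```typescript
// Response shape documented for GET /api/v1/agents/{id}/heartbeat.
interface HeartbeatStatus {
  enabled: boolean;         // Heartbeat enabled by user
  until: string;            // Expiry time; ISO 8601 is an assumption of this sketch
  active: boolean;          // Currently active (not expired)
  duration_minutes: number; // Configured duration
}

// Hypothetical helper: one plausible way to derive `active`.
// Heartbeat counts as active only if it is enabled and has not yet expired.
function isHeartbeatActive(
  enabled: boolean,
  until: string,
  now: Date = new Date(),
): boolean {
  const expiry = Date.parse(until);
  if (Number.isNaN(expiry)) {
    return false; // Treat an unparseable expiry as inactive
  }
  return enabled && expiry > now.getTime();
}

// Hypothetical helper mirroring the documented refetchInterval rule:
// poll every 5 seconds while heartbeat is enabled and active, otherwise stop.
function heartbeatRefetchInterval(
  status: HeartbeatStatus | undefined,
): number | false {
  return status?.enabled && status?.active ? 5000 : false;
}
```

Keeping both rules as pure functions makes the expiry and polling decisions testable without a running server or a mounted React component. How a "Permanent" duration is represented in `until` is not specified here, so this sketch does not handle it.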