RedFlag Heartbeat System Documentation
Version: v0.1.14 (Architecture Separation) ✅ COMPLETED
Status: Fully functional with automatic UI updates
Last Updated: 2025-10-28
Overview
The RedFlag Heartbeat System lets an agent switch temporarily from normal polling (5-minute intervals) to rapid polling (10-second intervals). It is an on-demand mechanism for live operations, updates, and other time-sensitive tasks where immediate agent responsiveness is required.
While heartbeat is active, the agent checks in every 10 seconds instead of every 5 minutes, which gives near real-time feedback for commands and operations.
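To put the two polling rates in perspective, the arithmetic below counts how many check-ins an agent makes during a 10-minute window at each interval. The constant and function names are illustrative and are not taken from the RedFlag codebase.

```typescript
// Intervals come from the overview above; the identifiers are hypothetical.
const NORMAL_POLL_SECONDS = 5 * 60; // normal polling: every 5 minutes
const RAPID_POLL_SECONDS = 10;      // heartbeat polling: every 10 seconds

// Number of check-ins that fit into a window of the given length.
function checkInsDuring(windowMinutes: number, intervalSeconds: number): number {
  return Math.floor((windowMinutes * 60) / intervalSeconds);
}

const normalCheckIns = checkInsDuring(10, NORMAL_POLL_SECONDS); // 2
const rapidCheckIns = checkInsDuring(10, RAPID_POLL_SECONDS);   // 60
```

Over the default 10-minute heartbeat window this is 60 check-ins instead of 2, which is why the mode is kept temporary.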
Architecture (v0.1.14+)
Separation of Concerns
Core Design Principle: Heartbeat state is fast-changing data, while general agent metadata is slow-changing. The two are handled separately, each with a caching strategy suited to its rate of change.
Data Flow
User clicks heartbeat button
↓
Heartbeat command created in database
↓
Agent processes command
↓
Agent sends immediate check-in with heartbeat metadata
↓
Server processes heartbeat metadata → Updates database
↓
UI gets heartbeat data via dedicated endpoint (5s cache)
↓
Buttons update automatically
New Architecture Components
1. Server-side Endpoints
GET /api/v1/agents/{id}/heartbeat (NEW - v0.1.14)
{
  "enabled": boolean,          // Heartbeat enabled by user
  "until": "timestamp",        // When heartbeat expires
  "active": boolean,           // Currently active (not expired)
  "duration_minutes": number   // Configured duration
}
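On the client, the response shape above can be written as a TypeScript interface with a runtime check before the data is trusted. This is a minimal sketch: the guard function `isHeartbeatStatus` is hypothetical, and it assumes `until` arrives as a string, since the exact timestamp format is not stated here.

```typescript
// Shape of the GET response as documented above.
interface HeartbeatStatus {
  enabled: boolean;         // heartbeat enabled by user
  until: string;            // when heartbeat expires (string format assumed)
  active: boolean;          // currently active (not expired)
  duration_minutes: number; // configured duration
}

// Hypothetical guard: narrows an unknown JSON payload to HeartbeatStatus.
function isHeartbeatStatus(value: unknown): value is HeartbeatStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["enabled"] === "boolean" &&
    typeof v["until"] === "string" &&
    typeof v["active"] === "boolean" &&
    typeof v["duration_minutes"] === "number"
  );
}
```

A caller would run the guard on the parsed response body and treat a failed check the same way as a failed request.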
POST /api/v1/agents/{id}/heartbeat (Existing)
{
  "enabled": true,
  "duration_minutes": 10
}
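A request for this endpoint could be assembled as in the sketch below. The helper `buildHeartbeatToggle` is hypothetical and performs no network I/O; it only returns the method, path, and serialized body so that any HTTP client can send it. Rejecting non-positive durations is an assumption, since this document does not say how the "Permanent" option is encoded.

```typescript
// Body of the POST request as documented above.
interface HeartbeatToggleBody {
  enabled: boolean;
  duration_minutes: number;
}

interface HeartbeatToggleRequest {
  method: "POST";
  path: string;
  body: string;
}

// Hypothetical helper: builds the request without sending it.
function buildHeartbeatToggle(
  agentId: string,
  enabled: boolean,
  durationMinutes: number = 10 // 10 minutes matches the documented default
): HeartbeatToggleRequest {
  // Assumption: only finite, positive durations are valid in this sketch.
  if (!isFinite(durationMinutes) || durationMinutes <= 0) {
    throw new RangeError("durationMinutes must be a positive number");
  }
  const payload: HeartbeatToggleBody = { enabled: enabled, duration_minutes: durationMinutes };
  return {
    method: "POST",
    path: "/api/v1/agents/" + encodeURIComponent(agentId) + "/heartbeat",
    body: JSON.stringify(payload),
  };
}
```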
2. Client-side Architecture
useHeartbeatStatus(agentId) Hook (NEW - v0.1.14)
- Smart Polling: Only polls when heartbeat is active
- 5-second cache: Appropriate for real-time data
- Auto-stops: Stops polling when heartbeat expires
- No rate limiting: Minimal server impact
Data Sources:
- Heartbeat UI: Uses dedicated endpoint (/agents/{id}/heartbeat)
- General Agent UI: Uses existing endpoint (/agents/{id})
- System Information: Uses existing endpoint with 2-5 minute cache
- History: Uses existing endpoint with 5-minute cache
Smart Polling Logic
refetchInterval: (query) => {
  const data = query.state.data as HeartbeatStatus;
  // Only poll when heartbeat is enabled and still active
  if (data?.enabled && data?.active) {
    return 5000; // 5 seconds
  }
  return false; // No polling when inactive
}
Legacy Systems Removed (v0.1.14)
❌ Removed Components
- Circular Sync Logic (agent/main.go lines 353-365)
  - Problem: Config ↔ Client bidirectional sync causing inconsistent state
  - Removed in v0.1.13
- Startup Config→Client Sync (agent/main.go lines 289-291)
  - Problem: Unnecessary sync that could override heartbeat state
  - Removed in v0.1.13
- Server-driven Heartbeat (EnableRapidPollingMode())
  - Problem: Bypassed command system, created inconsistency
  - Replaced with command-based approach in v0.1.13
- Mixed Data Sources (v0.1.14)
  - Problem: Heartbeat state mixed with general agent metadata
  - Separated into dedicated endpoint in v0.1.14
✅ Retained Components
- Command-based Architecture (v0.1.12+)
  - Heartbeat commands go through same system as other commands
  - Full audit trail in history
  - Proper error handling and retry logic
- Config Persistence (v0.1.13+)
  - cfg.Save() calls ensure heartbeat settings survive restarts
  - Agent remembers heartbeat state across reboots
- Stale Heartbeat Detection (v0.1.13+)
  - Server detects when agent restarts without heartbeat
  - Creates audit command: "Heartbeat cleared - agent restarted without active heartbeat mode"
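The stale-heartbeat check can be sketched as a comparison between what the server has stored and what the agent reports at check-in. The actual server logic lives in Go and is not shown in this document; the types and field names below are hypothetical, and only the audit message is taken from the text above.

```typescript
// What the server has stored for the agent (hypothetical field names).
interface StoredHeartbeat {
  enabled: boolean;
  until: Date;
}

// What the agent reported in its latest check-in (hypothetical field name).
interface CheckInReport {
  rapidPollingEnabled: boolean;
}

// Audit message quoted from the documentation above.
const STALE_AUDIT_MESSAGE =
  "Heartbeat cleared - agent restarted without active heartbeat mode";

// Stale means: the server still expects an active heartbeat, but the agent
// is not reporting one (for example after a restart).
function isStaleHeartbeat(stored: StoredHeartbeat, report: CheckInReport, now: Date): boolean {
  const serverExpectsActive = stored.enabled && stored.until.getTime() > now.getTime();
  return serverExpectsActive && !report.rapidPollingEnabled;
}
```

When the check returns true, the server would clear its stored state and record `STALE_AUDIT_MESSAGE` as an audit command.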
Cache Strategy
| Data Type | Endpoint | Cache Time | Polling Interval | Rationale |
|---|---|---|---|---|
| Heartbeat Status | /agents/{id}/heartbeat | 5 seconds | 5 seconds (when active) | Real-time feedback needed |
| Agent Status | /agents/{id} | 2-5 minutes | None | Slow-changing data |
| System Information | /agents/{id} | 2-5 minutes | None | Static most of the time |
| History Data | /agents/{id}/commands | 5 minutes | None | Historical data |
| Active Commands | /commands/active | 0 | 5 seconds | Command tracking |
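One way to keep these values in a single place on the client is a typed policy map, sketched below. The type and field names are illustrative rather than taken from the RedFlag web client, and where the table gives a range (2-5 minutes) the lower bound is used as an assumption.

```typescript
type DataKind =
  | "heartbeatStatus"
  | "agentStatus"
  | "systemInformation"
  | "historyData"
  | "activeCommands";

interface CachePolicy {
  endpoint: string;
  cacheMs: number;               // how long a cached response counts as fresh
  pollIntervalMs: number | null; // null means no polling
}

const SECOND_MS = 1000;
const MINUTE_MS = 60 * SECOND_MS;

// Values mirror the cache strategy table; 2 minutes is the assumed lower bound
// of the documented 2-5 minute range.
const cachePolicies: Record<DataKind, CachePolicy> = {
  // Polls every 5 seconds only while the heartbeat is active.
  heartbeatStatus: { endpoint: "/agents/{id}/heartbeat", cacheMs: 5 * SECOND_MS, pollIntervalMs: 5 * SECOND_MS },
  agentStatus: { endpoint: "/agents/{id}", cacheMs: 2 * MINUTE_MS, pollIntervalMs: null },
  systemInformation: { endpoint: "/agents/{id}", cacheMs: 2 * MINUTE_MS, pollIntervalMs: null },
  historyData: { endpoint: "/agents/{id}/commands", cacheMs: 5 * MINUTE_MS, pollIntervalMs: null },
  activeCommands: { endpoint: "/commands/active", cacheMs: 0, pollIntervalMs: 5 * SECOND_MS },
};
```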
Usage Patterns
1. Manual Heartbeat Activation
User clicks "Enable Heartbeat" → 10-minute default → Agent polls every 10 seconds → Auto-disable after 10 minutes
2. Duration Selection
Quick Actions dropdown: 10min, 30min, 1hr, Permanent → Configured duration applies → Auto-disable when expires
3. Command-triggered Heartbeat
Update/Install commands → Heartbeat enabled automatically (10min) → Command completes → Auto-disable after 10min
4. Stale State Detection
Agent restarts with heartbeat active → Server detects mismatch → Creates audit command → Clears stale state
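The duration options in pattern 2 can be mapped to an expiry time as in the sketch below. The option labels follow the Quick Actions dropdown; the identifiers are hypothetical, and representing "Permanent" as `null` (no expiry) is an assumption, since this document does not describe how the server encodes it.

```typescript
type DurationOption = "10min" | "30min" | "1hr" | "permanent";

// Minutes per option; null stands for "no expiry" (assumed representation).
const DURATION_MINUTES: Record<DurationOption, number | null> = {
  "10min": 10,
  "30min": 30,
  "1hr": 60,
  permanent: null,
};

// Returns when the heartbeat should auto-disable, or null if it never expires.
function heartbeatExpiry(option: DurationOption, startedAt: Date): Date | null {
  const minutes = DURATION_MINUTES[option];
  if (minutes === null) return null;
  return new Date(startedAt.getTime() + minutes * 60 * 1000);
}
```

For example, a heartbeat enabled at 12:00 with the "30min" option would expire at 12:30, after which polling returns to the normal interval.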
Performance Impact
Minimal Server Load
- Smart Polling: Only polls when heartbeat is active
- Dedicated Endpoint: Small JSON response (heartbeat data only)
- 5-second Cache: Prevents excessive API calls
- Auto-stop: Polling stops when heartbeat expires
Network Efficiency
- Separate Caches: Fast data updates without affecting slow data
- No Global Refresh: Only heartbeat components update frequently
- Conditional Polling: No polling when heartbeat is inactive
Debugging and Monitoring
Server Logs
[Heartbeat] Agent <id> heartbeat status: enabled=<bool>, until=<timestamp>, active=<bool>
[Heartbeat] Stale heartbeat detected for agent <id> - server expected active until <timestamp>, but agent not reporting heartbeat (likely restarted)
[Heartbeat] Cleared stale heartbeat state for agent <id>
[Heartbeat] Created audit trail for stale heartbeat cleanup (agent <id>)
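When debugging from captured logs, the status line can be parsed with a small regular expression. This is a sketch under assumptions: the parser name is hypothetical, booleans are assumed to be printed as `true`/`false`, and the pattern is left unanchored at the start so that a logger-added prefix (such as a date) does not break the match.

```typescript
interface HeartbeatLogEntry {
  agentId: string;
  enabled: boolean;
  until: string; // kept as text; the timestamp format is not specified here
  active: boolean;
}

// Matches: [Heartbeat] Agent <id> heartbeat status: enabled=<bool>, until=<timestamp>, active=<bool>
const STATUS_LINE =
  /\[Heartbeat\] Agent (\S+) heartbeat status: enabled=(true|false), until=(.+), active=(true|false)$/;

// Returns the parsed entry, or null if the line is not a heartbeat status line.
function parseHeartbeatStatusLine(line: string): HeartbeatLogEntry | null {
  const match = STATUS_LINE.exec(line.trim());
  if (match === null) return null;
  const [, agentId = "", enabledRaw = "", until = "", activeRaw = ""] = match;
  return {
    agentId: agentId,
    enabled: enabledRaw === "true",
    until: until,
    active: activeRaw === "true",
  };
}
```

A line where `enabled` is true but `active` is false indicates an expired heartbeat that has not yet been cleared.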
Client Console Logs
[Heartbeat UI] Tracking command <command-id> for completion
[Heartbeat UI] Command <command-id> completed with status: <status>
[Heartbeat UI] Monitoring for completion of command <command-id>
Common Issues
- Buttons Not Updating: Check if using dedicated useHeartbeatStatus() hook
- Constant Polling: Verify active property in heartbeat response
- Stale State: Look for "stale heartbeat detected" logs
- Missing Data: Ensure /agents/{id}/heartbeat endpoint is registered
Migration Notes
From v0.1.13 to v0.1.14
- ✅ No Breaking Changes: Existing endpoints preserved
- ✅ Improved UX: Real-time heartbeat button updates
- ✅ Better Performance: Smart polling reduces server load
- ✅ Clean Architecture: Separated fast/slow data concerns
Data Compatibility
- Existing agent metadata format preserved
- New heartbeat endpoint extracts from existing metadata
- Backward compatibility maintained for legacy clients
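Extracting heartbeat status from existing agent metadata might look like the sketch below. The metadata keys (`heartbeat_enabled`, `heartbeat_until`, `heartbeat_duration_minutes`) are placeholders chosen for illustration, not the keys RedFlag stores; the point is that `active` is derived from `enabled` and `until` at read time rather than stored.

```typescript
interface HeartbeatStatus {
  enabled: boolean;
  until: string;
  active: boolean;
  duration_minutes: number;
}

// Hypothetical extraction; the metadata key names are assumptions.
function heartbeatFromMetadata(metadata: Record<string, unknown>, now: Date): HeartbeatStatus {
  const enabled = metadata["heartbeat_enabled"] === true;
  const untilRaw = metadata["heartbeat_until"];
  const until = typeof untilRaw === "string" ? untilRaw : "";
  const untilMs = Date.parse(until); // NaN when missing or unparseable
  const durationRaw = metadata["heartbeat_duration_minutes"];
  return {
    enabled: enabled,
    until: until,
    // Active only while enabled and the expiry lies in the future.
    active: enabled && !isNaN(untilMs) && untilMs > now.getTime(),
    duration_minutes: typeof durationRaw === "number" ? durationRaw : 0,
  };
}
```

Because the function only reads the metadata, clients that still consume the full agent record are unaffected.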
Future Enhancements
Potential Improvements
- WebSocket Support: Push updates instead of polling (v0.1.15+)
- Batch Heartbeat: Multiple agents in single operation
- Global Heartbeat: Enable/disable for all agents
- Scheduled Heartbeat: Time-based activation
- Performance Metrics: Track heartbeat efficiency
Deprecation Timeline
- v0.1.13: Command-based heartbeat (previous)
- v0.1.14: Architecture separation (current)
- v0.1.15: WebSocket consideration
- v0.1.16: Legacy metadata deprecation consideration
Testing
Functional Tests
- Manual Activation: Click enable/disable buttons
- Duration Selection: Test 10min/30min/1hr/permanent
- Auto-expiration: Verify heartbeat stops when time expires
- Command Integration: Confirm heartbeat auto-enables before updates
- Stale Detection: Test agent restart scenarios
Performance Tests
- Polling Behavior: Verify smart polling (only when active)
- Cache Efficiency: Confirm 5-second cache prevents excessive calls
- Multiple Agents: Test concurrent heartbeat sessions
- Server Load: Monitor during heavy heartbeat usage
Related Files:
- aggregator-server/internal/api/handlers/agents.go: New GetHeartbeatStatus() function
- aggregator-web/src/hooks/useHeartbeat.ts: Smart polling hook
- aggregator-web/src/pages/Agents.tsx: Updated UI components
- aggregator-web/src/lib/api.ts: New getHeartbeatStatus() function