Add docs and project files - force for Culurien

Fimeg
2026-03-28 20:46:24 -04:00
parent dc61797423
commit 484a7f77ce
343 changed files with 119530 additions and 0 deletions


@@ -0,0 +1,233 @@
# RedFlag Heartbeat System Documentation
**Version**: v0.1.14 (Architecture Separation) ✅ **COMPLETED**
**Status**: Fully functional with automatic UI updates
**Last Updated**: 2025-10-28
## Overview
The RedFlag Heartbeat System lets agents switch temporarily from normal polling (5-minute intervals) to rapid polling (10-second intervals). It is intended for live operations, updates, and other time-sensitive tasks that need immediate agent responsiveness.
Heartbeat is a **temporary, on-demand** mechanism: it is enabled for a set duration during active operations, gives near real-time feedback on commands, and then expires so the agent returns to normal polling.
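The interval switch described above can be illustrated with a small sketch. The agent is a separate codebase, so this TypeScript is only a model of the decision, not its actual implementation; `AgentHeartbeatState` and `nextCheckInDelayMs` are hypothetical names, and treating `until: null` as "no expiry" is an assumption.

```typescript
// Intervals taken from the overview above.
const NORMAL_INTERVAL_MS = 5 * 60 * 1000; // 5 minutes
const RAPID_INTERVAL_MS = 10 * 1000; // 10 seconds

// Hypothetical shape for the agent's local heartbeat state.
interface AgentHeartbeatState {
  enabled: boolean;
  until: Date | null; // assumption: null means no expiry is set
}

// Returns how long to wait before the next check-in.
function nextCheckInDelayMs(state: AgentHeartbeatState, now: Date): number {
  const unexpired =
    state.until === null || state.until.getTime() > now.getTime();
  return state.enabled && unexpired ? RAPID_INTERVAL_MS : NORMAL_INTERVAL_MS;
}
```

An expired or disabled heartbeat falls back to the 5-minute interval, which matches the auto-disable behavior described later in this document.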
## Architecture (v0.1.14+)
### Separation of Concerns
**Core Design Principle**: Heartbeat state changes quickly, while general agent metadata changes slowly. The two are served separately, each with a caching strategy suited to its rate of change.
### Data Flow
```
User clicks heartbeat button
Heartbeat command created in database
Agent processes command
Agent sends immediate check-in with heartbeat metadata
Server processes heartbeat metadata → Updates database
UI gets heartbeat data via dedicated endpoint (5s cache)
Buttons update automatically
```
### New Architecture Components
#### 1. Server-side Endpoints
**GET `/api/v1/agents/{id}/heartbeat`** (NEW - v0.1.14)
```json
{
  "enabled": boolean,         // Heartbeat enabled by user
  "until": "timestamp",       // When heartbeat expires
  "active": boolean,          // Currently active (not expired)
  "duration_minutes": number  // Configured duration
}
```
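On the client, the response shape above could be typed and checked at runtime along these lines. This is a minimal sketch: the `HeartbeatStatus` name matches the type used in the polling snippet later in this document, but the field-by-field guard `isHeartbeatStatus` is hypothetical, and the exact timestamp format of `until` is not specified here.

```typescript
// Response shape documented for GET /api/v1/agents/{id}/heartbeat.
interface HeartbeatStatus {
  enabled: boolean;
  until: string; // timestamp; the exact format is not specified in this document
  active: boolean;
  duration_minutes: number;
}

// Hypothetical runtime guard for an untyped JSON payload.
function isHeartbeatStatus(value: unknown): value is HeartbeatStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["enabled"] === "boolean" &&
    typeof v["until"] === "string" &&
    typeof v["active"] === "boolean" &&
    typeof v["duration_minutes"] === "number"
  );
}
```

A guard like this lets the UI ignore malformed responses instead of rendering a wrong button state.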
**POST `/api/v1/agents/{id}/heartbeat`** (Existing)
```json
{
  "enabled": true,
  "duration_minutes": 10
}
```
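As an illustration, a client might assemble that POST request as follows. This is a sketch under assumptions: `buildHeartbeatRequest` and `PreparedRequest` are hypothetical names, and authentication headers are omitted because this document does not describe them.

```typescript
// Body documented for POST /api/v1/agents/{id}/heartbeat.
interface HeartbeatRequestBody {
  enabled: boolean;
  duration_minutes: number;
}

// Hypothetical container for a request that has not been sent yet.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it, so it can be inspected or tested.
function buildHeartbeatRequest(
  agentId: string,
  enabled: boolean,
  durationMinutes: number
): PreparedRequest {
  const body: HeartbeatRequestBody = {
    enabled,
    duration_minutes: durationMinutes,
  };
  return {
    url: `/api/v1/agents/${encodeURIComponent(agentId)}/heartbeat`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}
```

The returned object can be passed to `fetch(req.url, req)` or to whatever HTTP wrapper the web client already uses.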
#### 2. Client-side Architecture
**`useHeartbeatStatus(agentId)` Hook (NEW - v0.1.14)**
- **Smart Polling**: Only polls when heartbeat is active
- **5-second cache**: Appropriate for real-time data
- **Auto-stops**: Stops polling when heartbeat expires
- **No rate-limiting concerns**: Request volume stays low, so server impact is minimal
**Data Sources**:
- **Heartbeat UI**: Uses dedicated endpoint (`/agents/{id}/heartbeat`)
- **General Agent UI**: Uses existing endpoint (`/agents/{id}`)
- **System Information**: Uses existing endpoint with 2-5 minute cache
- **History**: Uses existing endpoint with 5-minute cache
### Smart Polling Logic
```typescript
refetchInterval: (query) => {
  // query.state.data is undefined until the first response arrives
  const data = query.state.data as HeartbeatStatus | undefined;
  // Only poll when heartbeat is enabled and still active
  if (data?.enabled && data?.active) {
    return 5000; // 5 seconds
  }
  return false; // No polling when inactive
}
```
## Legacy Systems Removed (v0.1.14)
### ❌ Removed Components
1. **Circular Sync Logic** (agent/main.go lines 353-365)
- Problem: Config ↔ Client bidirectional sync causing inconsistent state
- Removed in v0.1.13
2. **Startup Config→Client Sync** (agent/main.go lines 289-291)
- Problem: Unnecessary sync that could override heartbeat state
- Removed in v0.1.13
3. **Server-driven Heartbeat** (`EnableRapidPollingMode()`)
- Problem: Bypassed command system, created inconsistency
- Replaced with command-based approach in v0.1.13
4. **Mixed Data Sources** (v0.1.14)
- Problem: Heartbeat state mixed with general agent metadata
- Separated into dedicated endpoint in v0.1.14
### ✅ Retained Components
1. **Command-based Architecture** (v0.1.12+)
- Heartbeat commands go through same system as other commands
- Full audit trail in history
- Proper error handling and retry logic
2. **Config Persistence** (v0.1.13+)
- `cfg.Save()` calls ensure heartbeat settings survive restarts
- Agent remembers heartbeat state across reboots
3. **Stale Heartbeat Detection** (v0.1.13+)
- Server detects when agent restarts without heartbeat
- Creates audit command: "Heartbeat cleared - agent restarted without active heartbeat mode"
## Cache Strategy
| Data Type | Endpoint | Cache Time | Polling Interval | Rationale |
|------------|----------|------------|------------------|-----------|
| **Heartbeat Status** | `/agents/{id}/heartbeat` | 5 seconds | 5 seconds (when active) | Real-time feedback needed |
| **Agent Status** | `/agents/{id}` | 2-5 minutes | None | Slow-changing data |
| **System Information** | `/agents/{id}` | 2-5 minutes | None | Static most of the time |
| **History Data** | `/agents/{id}/commands` | 5 minutes | None | Historical data |
| **Active Commands** | `/commands/active` | 0 | 5 seconds | Command tracking |
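The table above could be captured in a single configuration object so that each query reads its timings from one place. This is a sketch only: `CACHE_POLICIES` and `CachePolicy` are hypothetical names, and where the table gives a 2-5 minute range the lower bound is used here as an arbitrary choice.

```typescript
type DataType =
  | "heartbeatStatus"
  | "agentStatus"
  | "systemInformation"
  | "historyData"
  | "activeCommands";

interface CachePolicy {
  endpoint: string;
  cacheMs: number; // how long a response is treated as fresh
  pollMs: number | false; // false means no background polling
}

const SECOND = 1000;
const MINUTE = 60 * SECOND;

// Values mirror the cache strategy table; 2 minutes is picked from "2-5 minutes".
const CACHE_POLICIES: Record<DataType, CachePolicy> = {
  // Polling applies only while the heartbeat is active.
  heartbeatStatus: { endpoint: "/agents/{id}/heartbeat", cacheMs: 5 * SECOND, pollMs: 5 * SECOND },
  agentStatus: { endpoint: "/agents/{id}", cacheMs: 2 * MINUTE, pollMs: false },
  systemInformation: { endpoint: "/agents/{id}", cacheMs: 2 * MINUTE, pollMs: false },
  historyData: { endpoint: "/agents/{id}/commands", cacheMs: 5 * MINUTE, pollMs: false },
  activeCommands: { endpoint: "/commands/active", cacheMs: 0, pollMs: 5 * SECOND },
};
```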
## Usage Patterns
### 1. Manual Heartbeat Activation
User clicks "Enable Heartbeat" → 10-minute default → Agent polls every 10 seconds (UI refreshes heartbeat status every 5 seconds) → Auto-disable after 10 minutes
### 2. Duration Selection
Quick Actions dropdown: 10min, 30min, 1hr, Permanent → Configured duration applies → Auto-disable when expires
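A minimal sketch of how the dropdown options might map to an expiry time follows. The option labels come from the list above; `heartbeatExpiry` is a hypothetical helper, and representing "Permanent" as `null` is an assumption, since this document does not say how a permanent heartbeat is stored.

```typescript
type DurationOption = "10min" | "30min" | "1hr" | "permanent";

// Minutes for each timed option in the Quick Actions dropdown.
const DURATION_MINUTES: Record<Exclude<DurationOption, "permanent">, number> = {
  "10min": 10,
  "30min": 30,
  "1hr": 60,
};

// Returns the moment the heartbeat should expire, or null for "permanent"
// (assumption: the real system may encode permanence differently).
function heartbeatExpiry(option: DurationOption, now: Date): Date | null {
  if (option === "permanent") return null;
  return new Date(now.getTime() + DURATION_MINUTES[option] * 60_000);
}
```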
### 3. Command-triggered Heartbeat
Update/Install commands → Heartbeat enabled automatically (10min) → Command completes → Auto-disable after 10min
### 4. Stale State Detection
Agent restarts while the server still records an active heartbeat → Server detects the mismatch → Creates audit command → Clears stale state
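The mismatch check can be modeled as a small predicate. The real check runs on the server, so this TypeScript is only an illustration of the condition described in the server logs ("server expected active until ..., but agent not reporting heartbeat"); `ServerHeartbeatRecord`, `AgentCheckIn`, and `isStaleHeartbeat` are hypothetical names.

```typescript
// Hypothetical view of what the server has stored for an agent.
interface ServerHeartbeatRecord {
  enabled: boolean;
  until: Date; // when the server expects the heartbeat to expire
}

// Hypothetical summary of the metadata an agent sends at check-in.
interface AgentCheckIn {
  heartbeatReported: boolean; // true if the agent says rapid polling is on
}

// Stale means: the server still expects an active heartbeat,
// but the agent's check-in does not report one.
function isStaleHeartbeat(
  server: ServerHeartbeatRecord,
  checkIn: AgentCheckIn,
  now: Date
): boolean {
  const serverExpectsActive =
    server.enabled && server.until.getTime() > now.getTime();
  return serverExpectsActive && !checkIn.heartbeatReported;
}
```

A heartbeat that has simply expired is not stale under this definition, so normal auto-expiry would not produce an audit command.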
## Performance Impact
### Minimal Server Load
- **Smart Polling**: Only polls when heartbeat is active
- **Dedicated Endpoint**: Small JSON response (heartbeat data only)
- **5-second Cache**: Prevents excessive API calls
- **Auto-stop**: Polling stops when heartbeat expires
### Network Efficiency
- **Separate Caches**: Fast data updates without affecting slow data
- **No Global Refresh**: Only heartbeat components update frequently
- **Conditional Polling**: No polling when heartbeat is inactive
## Debugging and Monitoring
### Server Logs
```text
[Heartbeat] Agent <id> heartbeat status: enabled=<bool>, until=<timestamp>, active=<bool>
[Heartbeat] Stale heartbeat detected for agent <id> - server expected active until <timestamp>, but agent not reporting heartbeat (likely restarted)
[Heartbeat] Cleared stale heartbeat state for agent <id>
[Heartbeat] Created audit trail for stale heartbeat cleanup (agent <id>)
```
### Client Console Logs
```text
[Heartbeat UI] Tracking command <command-id> for completion
[Heartbeat UI] Command <command-id> completed with status: <status>
[Heartbeat UI] Monitoring for completion of command <command-id>
```
### Common Issues
1. **Buttons Not Updating**: Check that the component uses the dedicated `useHeartbeatStatus()` hook
2. **Constant Polling**: Verify that the `active` property in the heartbeat response becomes `false` once the heartbeat expires
3. **Stale State**: Look for "Stale heartbeat detected" entries in the server logs
4. **Missing Data**: Ensure the `/agents/{id}/heartbeat` endpoint is registered
## Migration Notes
### From v0.1.13 to v0.1.14
- **No Breaking Changes**: Existing endpoints preserved
- **Improved UX**: Real-time heartbeat button updates
- **Better Performance**: Smart polling reduces server load
- **Clean Architecture**: Separated fast/slow data concerns
### Data Compatibility
- Existing agent metadata format preserved
- New heartbeat endpoint extracts from existing metadata
- Backward compatibility maintained for legacy clients
## Future Enhancements
### Potential Improvements
1. **WebSocket Support**: Push updates instead of polling (v0.1.15+)
2. **Batch Heartbeat**: Multiple agents in single operation
3. **Global Heartbeat**: Enable/disable for all agents
4. **Scheduled Heartbeat**: Time-based activation
5. **Performance Metrics**: Track heartbeat efficiency
### Deprecation Timeline
- **v0.1.13**: Command-based heartbeat (released)
- **v0.1.14**: Architecture separation (current)
- **v0.1.15**: WebSocket consideration
- **v0.1.16**: Legacy metadata deprecation consideration
## Testing
### Functional Tests
1. **Manual Activation**: Click enable/disable buttons
2. **Duration Selection**: Test 10min/30min/1hr/permanent
3. **Auto-expiration**: Verify heartbeat stops when time expires
4. **Command Integration**: Confirm heartbeat auto-enables before updates
5. **Stale Detection**: Test agent restart scenarios
### Performance Tests
1. **Polling Behavior**: Verify smart polling (only when active)
2. **Cache Efficiency**: Confirm 5-second cache prevents excessive calls
3. **Multiple Agents**: Test concurrent heartbeat sessions
4. **Server Load**: Monitor during heavy heartbeat usage
---
**Related Files**:
- `aggregator-server/internal/api/handlers/agents.go`: New `GetHeartbeatStatus()` function
- `aggregator-web/src/hooks/useHeartbeat.ts`: Smart polling hook
- `aggregator-web/src/pages/Agents.tsx`: Updated UI components
- `aggregator-web/src/lib/api.ts`: New `getHeartbeatStatus()` function