# Heartbeat Fix - Implementation Complete

## Summary

Fixed the heartbeat UI refresh issue by implementing smart polling with a `recentlyTriggered` state.

## What Was Fixed

### Problem

When users clicked "Enable Heartbeat", the UI showed "Sending..." but never updated to show the heartbeat badge. Users had to manually refresh the page to see changes.

### Root Cause

The polling interval was 2 minutes when heartbeat was inactive. After clicking the button, users had to wait up to 2 minutes for the next poll to see the agent's response.

### Solution Implemented

#### 1. `useHeartbeat.ts` - Added Smart Polling
```typescript
import { useEffect, useState } from 'react';
// Assumed package name; older setups import useQuery from 'react-query'
import { useQuery } from '@tanstack/react-query';

export const useHeartbeatStatus = (agentId: string, enabled: boolean = true) => {
  const [recentlyTriggered, setRecentlyTriggered] = useState(false);

  const query = useQuery({
    queryKey: ['heartbeat', agentId],
    // queryFn and other options are omitted from this excerpt
    refetchInterval: (data) => {
      // Fast polling (5s) while waiting for the agent's response
      if (recentlyTriggered) return 5000;

      // Medium polling (10s) when heartbeat is active
      if (data?.active) return 10000;

      // Slow polling (2min) when idle
      return 120000;
    },
  });

  // Auto-clear the flag once the agent confirms.
  // Done in an effect so that state is not set during render.
  useEffect(() => {
    if (recentlyTriggered && query.data?.active) {
      setRecentlyTriggered(false);
    }
  }, [recentlyTriggered, query.data?.active]);

  return { ...query, recentlyTriggered, setRecentlyTriggered };
};
```
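
The three polling tiers can also be expressed as a pure function, which makes them easy to unit test without React. This is a minimal sketch rather than the project's code: the name `pickRefetchInterval` and the `HeartbeatStatus` shape are assumptions made for illustration.

```typescript
// Hypothetical shape; the real status object may carry more fields.
interface HeartbeatStatus {
  active: boolean;
}

const FAST_MS = 5000;    // waiting for the agent to confirm
const ACTIVE_MS = 10000; // heartbeat is active
const IDLE_MS = 120000;  // nothing is happening

// Hypothetical helper mirroring the refetchInterval callback in the hook.
function pickRefetchInterval(
  recentlyTriggered: boolean,
  data: HeartbeatStatus | undefined,
): number {
  if (recentlyTriggered) return FAST_MS;
  if (data?.active) return ACTIVE_MS;
  return IDLE_MS;
}
```

With a helper like this, the hook's callback would reduce to `(data) => pickRefetchInterval(recentlyTriggered, data)`.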

#### 2. `Agents.tsx` - Trigger Fast Polling on Button Click
```typescript
const { data: heartbeatStatus, recentlyTriggered, setRecentlyTriggered } = useHeartbeatStatus(...);

const handleRapidPollingToggle = async (agentId: string, enabled: boolean) => {
  // ... API call ...

  // Trigger 5-second polling for up to 15 seconds
  // (the hook clears the flag earlier if the agent confirms)
  setRecentlyTriggered(true);
  setTimeout(() => setRecentlyTriggered(false), 15000);
};
```
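
One thing to watch with a bare `setTimeout`: if the button is clicked twice within 15 seconds, the first timer keeps running and can switch fast polling off early. The sketch below shows one way to make the window re-armable. `createFastPollWindow` is a hypothetical helper, not part of the codebase, and it takes the setter as a plain callback so it can be tried outside React.

```typescript
// Hypothetical helper: keeps at most one pending "turn off" timer.
function createFastPollWindow(
  setRecentlyTriggered: (value: boolean) => void,
  windowMs: number = 15000,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;

  return {
    // Start (or restart) the fast-polling window.
    trigger(): void {
      if (timer !== undefined) clearTimeout(timer);
      setRecentlyTriggered(true);
      timer = setTimeout(() => {
        timer = undefined;
        setRecentlyTriggered(false);
      }, windowMs);
    },
    // Drop the pending timer, for example when the component unmounts.
    cancel(): void {
      if (timer !== undefined) clearTimeout(timer);
      timer = undefined;
    },
  };
}
```

In a component, the returned object would typically be kept in a ref, with `cancel()` called from an effect cleanup.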

## How It Works Now

1. **User clicks "Enable Heartbeat"**
   - Button shows "Sending..."
   - `recentlyTriggered` is set to `true`
   - Polling interval drops from 2 minutes to 5 seconds

2. **Agent processes command (2-3 seconds)**
   - Agent receives command
   - Agent enables rapid polling
   - Agent sends immediate check-in with heartbeat metadata

3. **Next poll catches update (within 5 seconds)**
   - Polling every 5 seconds catches agent's response
   - UI updates to show RED/BLUE badge
   - `recentlyTriggered` auto-clears when `active` is `true`

4. **Total wait time: 5-8 seconds** (down from up to 2 minutes)
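
The 5-8 second figure is simple arithmetic on the agent's response time and the poll interval. The sketch below is illustrative only: both function names are hypothetical, and the first one assumes that polls fire at fixed multiples of the interval counted from the button click, which may not match the query library's exact timer behaviour.

```typescript
// Time of the first poll that can see the agent's response, assuming polls
// fire at fixed multiples of the interval counted from the button click.
function firstPollAfterResponse(agentDelayMs: number, pollIntervalMs: number): number {
  const ticks = Math.max(1, Math.ceil(agentDelayMs / pollIntervalMs));
  return ticks * pollIntervalMs;
}

// Upper bound when poll timing is not aligned with the click: the response
// can land just after a poll, which costs one full extra interval.
function worstCaseWait(agentDelayMs: number, pollIntervalMs: number): number {
  return agentDelayMs + pollIntervalMs;
}

console.log(firstPollAfterResponse(2500, 5000)); // 5000
console.log(worstCaseWait(3000, 5000)); // 8000
```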

## Files Modified

1. `/aggregator-web/src/hooks/useHeartbeat.ts` - Added `recentlyTriggered` state and smart polling logic
2. `/aggregator-web/src/pages/Agents.tsx` - Updated to use new hook API and trigger fast polling

## Performance Impact

- **When idle**: 1 API call per 2 minutes (about 96% fewer calls than polling every 5 seconds)
- **After button click**: 1 API call per 5 seconds for up to 15 seconds
- **During active heartbeat**: 1 API call per 10 seconds
- **Window focus**: Instant refresh (`refetchOnWindowFocus: true`)
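
To compare the polling tiers on a common scale, the intervals can be converted to calls per hour. This is a small illustrative sketch; the function names are hypothetical and the numbers are the intervals listed above.

```typescript
// Number of polls in one hour at a fixed interval.
function callsPerHour(intervalMs: number): number {
  return 3600000 / intervalMs;
}

// Fraction of calls saved when moving from a shorter interval to a longer one.
function callReduction(fromIntervalMs: number, toIntervalMs: number): number {
  return 1 - fromIntervalMs / toIntervalMs;
}

console.log(callsPerHour(120000)); // 30  (idle)
console.log(callsPerHour(10000));  // 360 (active heartbeat)
console.log(callsPerHour(5000));   // 720 (fast window, if it were sustained)
```

`callReduction(a, b)` gives the relative saving between any two intervals, for example between the fast and idle tiers.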

## Testing Checklist

- ✅ Click "Enable Heartbeat" - badge appears within 5-8 seconds
- ✅ Badge shows RED for manual heartbeat
- ✅ Badge shows BLUE for system heartbeat (trigger DNF update)
- ✅ Switch tabs and return - state refreshes correctly
- ✅ No manual page refresh needed
- ✅ Polling slows down after 15 seconds

## Additional Notes

- The fix respects the agent as the source of truth (no optimistic UI updates)
- Server doesn't need to report "success" before agent confirms
- The 15-second fast-polling window (5-second interval) gives the agent time to report (typically 2-3 seconds)
- After 15 seconds, polling returns to normal speed (2 minutes when idle, 10 seconds while heartbeat is active)

## Related to Other Pages

### History vs Agents Overview - Unified Command Display

**Current State**:
- **History page** (`/home/casey/Projects/RedFlag/aggregator-web/src/pages/History.tsx`): Full timeline, all agents, detailed with logs
- **Agents Overview tab** (`/home/casey/Projects/RedFlag/aggregator-web/src/pages/Agents.tsx:590-750`): Compact view, single agent, max 3-4 entries

**Problems Identified**:
1. **Display inconsistency**: Same command type shows differently in History vs Overview
2. **Hard-coded mappings**: Each page has its own command type → display name logic
3. **No shared utilities**: "scan_storage" displays as "Storage Scan" in one place, "scan storage" in another

**Recommendation**: Create shared command display utilities

**File**: `aggregator-web/src/lib/command-display.ts` (NEW - 1 hour)
```typescript
export interface CommandDisplay {
  action: string;
  verb: string;
  noun: string;
  icon: string;
}

export const getCommandDisplay = (commandType: string): CommandDisplay => {
  // Typed as a Record so that lookups by an arbitrary string compile in strict mode
  const map: Record<string, CommandDisplay> = {
    'scan_storage': { action: 'Storage Scan', verb: 'Scan', noun: 'Disk', icon: 'HardDrive' },
    'scan_system': { action: 'System Scan', verb: 'Scan', noun: 'Metrics', icon: 'Cpu' },
    'scan_docker': { action: 'Docker Scan', verb: 'Scan', noun: 'Images', icon: 'Container' },
    // ... all platform-specific scans
  };
  // Unknown command types fall back to a generic display using the raw type as the label
  return map[commandType] ?? { action: commandType, verb: 'Operation', noun: 'Unknown', icon: 'Activity' };
};
```

**Why**: Single source of truth, both pages use same mappings

### Command Display Consolidation

**Current Command Display Locations**:
1. **History page**: Full timeline with logs, syntax highlighting, pagination
2. **Agents Overview**: Compact list (3-4 entries), agent-specific, real-time
3. **Updates page**: Recent commands (50 limit), all agents

**Are they too similar?**:
- **Similar**: All show `command_type`, status, timestamp, icons
- **Different**: History shows full logs, Overview is compact, Updates has retry feature

**Architectural Decision: PARTIAL CONSOLIDATION** (not full)

**Recommended**:
1. **Extract shared display logic** (1 hour)
   - Same command → same name, icon, color everywhere
2. **Keep specialized components** (don't over-engineer)
   - History = full timeline with all features
   - Overview = compact window (3-4 entries max)
   - Updates = full list with retry

**What NOT to do**: Don't create abstract "CommandComponent" that tries to be all three (over-engineering)

**What TO do**: Extract utility functions into shared lib, keep components focused on their job
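
As a rough illustration of partial consolidation, the sketch below has two page-specific formatters that share one lookup. It is self-contained, so it carries a trimmed copy of the mapping. `overviewLabel` and `historyLabel` and their output formats are hypothetical, invented here only to show separate consumers of the same utility.

```typescript
interface CommandDisplay {
  action: string;
  verb: string;
  noun: string;
  icon: string;
}

// Trimmed copy of the shared mapping so this sketch runs on its own.
const displayMap: Record<string, CommandDisplay> = {
  scan_storage: { action: 'Storage Scan', verb: 'Scan', noun: 'Disk', icon: 'HardDrive' },
  scan_docker: { action: 'Docker Scan', verb: 'Scan', noun: 'Images', icon: 'Container' },
};

function getCommandDisplay(commandType: string): CommandDisplay {
  return displayMap[commandType]
    ?? { action: commandType, verb: 'Operation', noun: 'Unknown', icon: 'Activity' };
}

// Hypothetical compact formatter for the Agents Overview tab.
function overviewLabel(commandType: string): string {
  return getCommandDisplay(commandType).action;
}

// Hypothetical detailed formatter for the History timeline.
function historyLabel(commandType: string, status: string): string {
  const display = getCommandDisplay(commandType);
  return `${display.action} (${display.verb} ${display.noun}) - ${status}`;
}
```

Each page keeps its own layout and features, while the name and icon for a given command type come from a single place.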

### Technical Debt: Too Many TODO Files

**Current State**: Created 30+ MD files in 3 days, most of which have TODO sections

**Violation**: ETHOS Section 5 - "NEVER use banned words..." and Section 1 - "Errors are History"

**Problem**: Files that won't be completed = documentation debt

**Why this happens**:
1. We create files during planning (good intention)
2. Code changes faster than docs get updated (reality)
3. Docs fall out of sync (technical debt)

**Solution**:
- Stop creating new MD files with TODOs
- Put implementation details in JSDoc above functions
- Completed features get a brief "# Completed" section in main README
- Unfinished work stays in git branch until done

**Recommendation**: No new MD files unless feature is 100% complete and merged
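
For the JSDoc suggestion, a comment along the following lines would keep the implementation notes next to the code they describe. This is only a sketch: `shouldClearTriggered` is a hypothetical helper that restates the auto-clear rule from the heartbeat fix, not a function from the codebase.

```typescript
/**
 * Decides whether the `recentlyTriggered` flag can be cleared.
 *
 * Implementation notes (kept here instead of in a separate MD file):
 * - The agent is the source of truth, so the flag is cleared only after
 *   the polled status reports `active`.
 * - A separate 15-second timeout covers the case where the agent never confirms.
 *
 * @param recentlyTriggered - true while the UI is waiting for the agent
 * @param active - latest `active` value from the heartbeat status, if any
 * @returns true when the flag should be reset to false
 */
function shouldClearTriggered(
  recentlyTriggered: boolean,
  active: boolean | undefined,
): boolean {
  return recentlyTriggered && active === true;
}
```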