# Heartbeat Fix - Implementation Complete
## Summary
Fixed the heartbeat UI refresh issue by implementing smart polling with a recentlyTriggered state.
## What Was Fixed
### Problem
When users clicked "Enable Heartbeat", the UI showed "Sending..." but never updated to show the heartbeat badge. Users had to manually refresh the page to see changes.
### Root Cause
The polling interval was 2 minutes when heartbeat was inactive. After clicking the button, users had to wait up to 2 minutes for the next poll to see the agent's response.
### Solution Implemented
#### 1. `useHeartbeat.ts` - Added Smart Polling
```typescript
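import { useState } from 'react';
// Assumption: TanStack Query v4. With react-query v3 the import path is 'react-query'.
import { useQuery } from '@tanstack/react-query';

export const useHeartbeatStatus = (agentId: string, enabled: boolean = true) => {
  // True for a short window after the user clicks the heartbeat button
  const [recentlyTriggered, setRecentlyTriggered] = useState(false);

  const query = useQuery({
    queryKey: ['heartbeat', agentId],
    // (queryFn is not shown in this excerpt)
    refetchInterval: (data) => {
      // Fast polling (5s) while waiting for the agent's response
      if (recentlyTriggered) return 5000;
      // Medium polling (10s) when heartbeat is active
      if (data?.active) return 10000;
      // Slow polling (2min) when idle
      return 120000;
    },
  });

  // Auto-clear the flag once the agent confirms the heartbeat is active
  if (recentlyTriggered && query.data?.active) {
    setRecentlyTriggered(false);
  }

  return { ...query, recentlyTriggered, setRecentlyTriggered };
};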
```
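The interval selection inside `refetchInterval` can also be expressed as a pure function, which makes the three polling tiers easy to unit-test without React. This is only a sketch: `pollIntervalMs`, `PollState`, and the constant names are hypothetical and do not appear in the original hook.

```typescript
// Hypothetical helper, not part of the original hook.
interface PollState {
  recentlyTriggered: boolean;  // set by the button click handler
  active: boolean | undefined; // from the latest heartbeat response, if any
}

const FAST_POLL_MS = 5000;    // waiting for the agent's response
const ACTIVE_POLL_MS = 10000; // heartbeat is active
const IDLE_POLL_MS = 120000;  // nothing going on

function pollIntervalMs({ recentlyTriggered, active }: PollState): number {
  // The click flag takes priority over the reported heartbeat state
  if (recentlyTriggered) return FAST_POLL_MS;
  if (active) return ACTIVE_POLL_MS;
  return IDLE_POLL_MS;
}
```

Under this sketch, the hook's `refetchInterval` callback would reduce to a single call such as `pollIntervalMs({ recentlyTriggered, active: data?.active })`.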
#### 2. `Agents.tsx` - Trigger Fast Polling on Button Click
```typescript
const { data: heartbeatStatus, recentlyTriggered, setRecentlyTriggered } = useHeartbeatStatus(...);

const handleRapidPollingToggle = async (agentId: string, enabled: boolean) => {
  // ... API call ...

  // Poll every 5 seconds for the next 15 seconds
  setRecentlyTriggered(true);
  setTimeout(() => setRecentlyTriggered(false), 15000);
};
```
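One thing the handler above does not cover is repeated clicks: each click schedules its own 15-second timer, so an earlier timer can clear the flag while a later window is still meant to be open. A possible way to handle this, sketched under the assumption that restarting the window on every click is the desired behavior, is a small helper that cancels the pending timer before scheduling a new one. `createFastPollWindow` and the injectable `schedule` parameter are hypothetical names introduced for illustration.

```typescript
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Default scheduler backed by the global timer functions
const defaultSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

// Hypothetical helper: keeps at most one pending "turn fast polling off" timer.
function createFastPollWindow(
  setFlag: (value: boolean) => void,
  durationMs: number = 15000,
  schedule: Schedule = defaultSchedule,
) {
  let cancelPending: Cancel | null = null;

  return {
    // Open (or restart) the fast-polling window
    start(): void {
      cancelPending?.();
      setFlag(true);
      cancelPending = schedule(() => {
        cancelPending = null;
        setFlag(false);
      }, durationMs);
    },
    // Close the window early, e.g. when the component unmounts
    stop(): void {
      cancelPending?.();
      cancelPending = null;
      setFlag(false);
    },
  };
}
```

In a component, `start()` would take the place of the `setRecentlyTriggered(true)` / `setTimeout(...)` pair, with `setRecentlyTriggered` passed as `setFlag`.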
## How It Works Now
1. **User clicks "Enable Heartbeat"**
   - Button shows "Sending..."
   - `recentlyTriggered` is set to `true`
   - Polling interval drops from 2 minutes to 5 seconds
2. **Agent processes command (2-3 seconds)**
   - Agent receives command
   - Agent enables rapid polling
   - Agent sends immediate check-in with heartbeat metadata
3. **Next poll catches update (within 5 seconds)**
   - Polling every 5 seconds catches agent's response
   - UI updates to show RED/BLUE badge
   - `recentlyTriggered` auto-clears when `active` is `true`
4. **Total wait time: 5-8 seconds** (down from up to 2 minutes)
## Files Modified
1. `/aggregator-web/src/hooks/useHeartbeat.ts` - Added recentlyTriggered state and smart polling logic
2. `/aggregator-web/src/pages/Agents.tsx` - Updated to use new hook API and trigger fast polling
## Performance Impact
- **When idle**: 1 API call per 2 minutes (about 96% fewer calls than constant 5-second polling, which would make 24 calls in the same period)
- **After button click**: 1 API call per 5 seconds for 15 seconds
- **During active heartbeat**: 1 API call per 10 seconds
- **Window focus**: Instant refresh (refetchOnWindowFocus: true)
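
The rates listed above can be restated as calls per hour, which makes the three tiers easier to compare. The sketch below is plain arithmetic on the intervals from this document; the function names are hypothetical, and the window count assumes the first fast poll fires one full interval after the window opens.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Steady-state request rate for a fixed polling interval
function callsPerHour(intervalMs: number): number {
  return MS_PER_HOUR / intervalMs;
}

// Number of polls that fit into a bounded window
function callsInWindow(windowMs: number, intervalMs: number): number {
  return Math.floor(windowMs / intervalMs);
}

const summary = {
  idlePerHour: callsPerHour(120000),           // 30
  activePerHour: callsPerHour(10000),          // 360
  fastWindowCalls: callsInWindow(15000, 5000), // 3
};

console.log(summary);
```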
## Testing Checklist
- ✅ Click "Enable Heartbeat" - badge appears within 5-8 seconds
- ✅ Badge shows RED for manual heartbeat
- ✅ Badge shows BLUE for system heartbeat (trigger DNF update)
- ✅ Switch tabs and return - state refreshes correctly
- ✅ No manual page refresh needed
- ✅ Polling slows down after 15 seconds
## Additional Notes
- The fix respects the agent as the source of truth (no optimistic UI updates)
- Server doesn't need to report "success" before agent confirms
- Polling every 5 seconds during the 15-second window gives the agent time to report (typically 2-3 seconds)
- After 15 seconds, polling returns to normal speed (2 minutes when idle)
## Related to Other Pages
### History vs Agents Overview - Unified Command Display
**Current State**:
- **History page** (`/home/casey/Projects/RedFlag/aggregator-web/src/pages/History.tsx`): Full timeline, all agents, detailed with logs
- **Agents Overview tab** (`/home/casey/Projects/RedFlag/aggregator-web/src/pages/Agents.tsx:590-750`): Compact view, single agent, max 3-4 entries
**Problems Identified**:
1. **Display inconsistency**: Same command type shows differently in History vs Overview
2. **Hard-coded mappings**: Each page has its own command type → display name logic
3. **No shared utilities**: "scan_storage" displays as "Storage Scan" in one place, "scan storage" in another
**Recommendation**: Create shared command display utilities
**File**: `aggregator-web/src/lib/command-display.ts` (NEW - 1 hour)
```typescript
export interface CommandDisplay {
  action: string;
  verb: string;
  noun: string;
  icon: string;
}

// Typed as Record<string, CommandDisplay> so that lookups by an arbitrary
// command type string compile under strict mode.
const COMMAND_DISPLAY_MAP: Record<string, CommandDisplay> = {
  'scan_storage': { action: 'Storage Scan', verb: 'Scan', noun: 'Disk', icon: 'HardDrive' },
  'scan_system': { action: 'System Scan', verb: 'Scan', noun: 'Metrics', icon: 'Cpu' },
  'scan_docker': { action: 'Docker Scan', verb: 'Scan', noun: 'Images', icon: 'Container' },
  // ... all platform-specific scans
};

export const getCommandDisplay = (commandType: string): CommandDisplay =>
  // Unknown command types fall back to a generic display entry
  COMMAND_DISPLAY_MAP[commandType] ??
  { action: commandType, verb: 'Operation', noun: 'Unknown', icon: 'Activity' };
```
**Why**: Single source of truth, both pages use same mappings
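
To illustrate the single-source-of-truth idea, the sketch below shows two page-level formatters that both derive their label from the same lookup, so a command type cannot render differently between pages. The mapping is a reduced copy repeated here only to keep the example self-contained; `historyRowTitle` and `overviewRowTitle` are hypothetical names, not functions from the existing pages.

```typescript
interface CommandDisplay {
  action: string;
  verb: string;
  noun: string;
  icon: string;
}

// Reduced copy of the shared mapping (one entry is enough for the illustration)
const COMMAND_DISPLAY: Record<string, CommandDisplay> = {
  scan_storage: { action: 'Storage Scan', verb: 'Scan', noun: 'Disk', icon: 'HardDrive' },
};

const getCommandDisplay = (commandType: string): CommandDisplay =>
  COMMAND_DISPLAY[commandType] ??
  { action: commandType, verb: 'Operation', noun: 'Unknown', icon: 'Activity' };

// Hypothetical History formatter: label plus status
const historyRowTitle = (commandType: string, status: string): string =>
  `${getCommandDisplay(commandType).action} (${status})`;

// Hypothetical Overview formatter: compact, label only
const overviewRowTitle = (commandType: string): string =>
  getCommandDisplay(commandType).action;

console.log(historyRowTitle('scan_storage', 'completed'));
console.log(overviewRowTitle('scan_storage'));
```

Each page keeps its own layout, and only the name and icon lookup is shared.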
### Command Display Consolidation
**Current Command Display Locations**:
1. **History page**: Full timeline with logs, syntax highlighting, pagination
2. **Agents Overview**: Compact list (3-4 entries), agent-specific, real-time
3. **Updates page**: Recent commands (50 limit), all agents
**Are they too similar?**:
- **Similar**: All show command_type, status, timestamp, icons
- **Different**: History shows full logs, Overview is compact, Updates has retry feature
**Architectural Decision: PARTIAL CONSOLIDATION** (not full)
**Recommended**:
1. **Extract shared display logic** (1 hour)
   - Same command → same name, icon, color everywhere
2. **Keep specialized components** (don't over-engineer)
   - History = full timeline with all features
   - Overview = compact window (3-4 entries max)
   - Updates = full list with retry
**What NOT to do**: Don't create abstract "CommandComponent" that tries to be all three (over-engineering)
**What TO do**: Extract utility functions into shared lib, keep components focused on their job
### Technical Debt: Too Many TODO Files
**Current State**: 30+ MD files were created in 3 days, and most have TODO sections
**Violation**: ETHOS Section 5 - "NEVER use banned words..." and Section 1 - "Errors are History"
**Problem**: Files that won't be completed = documentation debt
**Why this happens**:
1. We create files during planning (good intention)
2. Code changes faster than docs get updated (reality)
3. Docs become out-of-sync (technical debt)
**Solution**:
- Stop creating new MD files with TODOs
- Put implementation details in JSDoc above functions
- Completed features get a brief "# Completed" section in main README
- Unfinished work stays in git branch until done
**Recommendation**: No new MD files unless feature is 100% complete and merged
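
As an illustration of the JSDoc suggestion above, implementation notes that would otherwise go into a separate MD file can sit directly above the function they describe. The function below is a hypothetical example written for this purpose, not code from the repository.

```typescript
/**
 * Reports whether the fast-polling window opened by a button click has ended.
 *
 * Fast polling (every 5 seconds) runs for a fixed window after the click so
 * the UI can pick up the agent's confirmation; afterwards polling returns to
 * normal speed.
 *
 * @param triggeredAtMs - Timestamp in milliseconds when the user clicked the button.
 * @param nowMs - Current timestamp in milliseconds.
 * @param windowMs - Length of the fast-polling window; defaults to 15000.
 * @returns `true` once at least `windowMs` has elapsed since the click.
 */
function isFastPollWindowOver(
  triggeredAtMs: number,
  nowMs: number,
  windowMs: number = 15000,
): boolean {
  return nowMs - triggeredAtMs >= windowMs;
}
```

Because the comment sits next to the code, it is reviewed in the same diff as the behavior it describes.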