Session Summary & Resume Point
What We Just Completed
Branch: feature/host-restart-handling
Status: Pushed to remote, ready for testing
Implemented Features (Issues #4 and #6)
Issue #6 - Agent Version Display Fix
- Set CurrentVersion during registration instead of waiting for first check-in
- Changed UI text from "Unknown" to "Initial Registration"
- Files: aggregator-server/internal/api/handlers/agents.go, aggregator-web/src/pages/Agents.tsx
Issue #4 - Host Restart Detection & Handling
- Database migration 013_add_reboot_tracking.up.sql adds reboot fields
- Agent detects pending reboots (Debian/Ubuntu, RHEL/Fedora, Windows)
- New reboot command with 1-minute grace period
- UI shows restart alerts and "Restart Host" button
- Files: Migration, models, queries, handlers, agent detection, frontend components
Critical Bug Fix
- Fixed reboot_reason field causing database scan failures (was string, needed *string for NULL handling)
- Commit: 5e9c27b
Documentation
- Added full reinstall section to README with agent re-registration steps
Current Issues Found During Testing
1. Rate Limit Bug - FIRST Request Gets Blocked
Symptom: Every first agent registration gets 429 Too Many Requests, then works after a 1-minute wait.
Theory: Rate limiter keys aren't namespaced by limit type. All endpoints using KeyByIP share the same counter:
- public_access (download, install script): 20/min
- agent_registration: 5/min
- Both use just the IP as key, not namespaced
Problem Location: aggregator-server/internal/api/middleware/rate_limiter.go line ~133
key := keyFunc(c) // Just "127.0.0.1"
allowed, resetTime := rl.checkRateLimit(key, config)
Suspected Fix:
key := keyFunc(c)
namespacedKey := limitType + ":" + key // "agent_registration:127.0.0.1"
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
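To sanity-check the theory before touching the middleware, here is a minimal standalone sketch of the shared-counter problem. It is not the code in rate_limiter.go: limiter, limitConfig, and firstRegistrationAllowed are made up for illustration, and the sliding-window counting is an assumption about how checkRateLimit works. Only the 20/min and 5/min limits and the key format come from these notes.

```go
package main

import (
	"fmt"
	"time"
)

// limitConfig is a hypothetical stand-in for the server's per-limit settings.
type limitConfig struct {
	Requests int
	Window   time.Duration
}

// limiter is a toy sliding-window limiter that stores request times per key.
type limiter struct {
	hits map[string][]time.Time
}

func newLimiter() *limiter {
	return &limiter{hits: make(map[string][]time.Time)}
}

// allow drops expired hits for key, then records the request if it fits
// within cfg. It returns false when the key is already at the limit.
func (l *limiter) allow(key string, cfg limitConfig, now time.Time) bool {
	cutoff := now.Add(-cfg.Window)
	kept := l.hits[key][:0]
	for _, t := range l.hits[key] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= cfg.Requests {
		l.hits[key] = kept
		return false
	}
	l.hits[key] = append(kept, now)
	return true
}

// firstRegistrationAllowed simulates publicHits download/install requests
// followed by one registration, all from the same IP.
func firstRegistrationAllowed(namespaced bool, publicHits int) bool {
	public := limitConfig{Requests: 20, Window: time.Minute}
	registration := limitConfig{Requests: 5, Window: time.Minute}
	l := newLimiter()
	now := time.Now()
	ip := "127.0.0.1"

	key := func(limitType string) string {
		if namespaced {
			return limitType + ":" + ip
		}
		return ip // current behavior per the theory: bare IP only
	}

	for i := 0; i < publicHits; i++ {
		l.allow(key("public_access"), public, now)
	}
	return l.allow(key("agent_registration"), registration, now)
}

func main() {
	fmt.Println("shared key:", firstRegistrationAllowed(false, 5))
	fmt.Println("namespaced key:", firstRegistrationAllowed(true, 5))
}
```

With 5 prior public_access hits from the same IP, the shared-key run rejects the first registration and the namespaced run allows it. That only matches the real symptom if the install flow makes at least 5 download/install requests before registering, which is what the test script needs to confirm.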
Test Script: docs/NeedsDoing/test-rate-limit.sh
- Run after fresh docker-compose up
- Tests if first request fails
- Tests if download/install/register share counters
- Sequential test to find actual limit
2. Session Loop Bug - Returned
Symptom: After setup completion and server restart, UI flashes/loops rapidly on dashboard/agents/settings. Must logout and login to fix.
Previous Fix: Commit 7b77641 added logout() call, cleared auth on 401
Current Problem: SetupCompletionChecker.tsx dependency array issue
- wasInSetupMode in dependency array causes multiple interval creation
- Each state change creates new interval without cleaning up old ones
- During docker restart: multiple 3-second polls overlap = flashing
Problem Location: aggregator-web/src/components/SetupCompletionChecker.tsx lines 15-52
Suspected Fix: Remove wasInSetupMode from dependency array, use local variable instead
Next Session Plan
1. Test Rate Limiter (This Machine)
# Full clean rebuild
cd /home/memory/Desktop/Projects/RedFlag
docker-compose down -v --remove-orphans && \
rm -f config/.env && \
docker-compose build --no-cache && \
cp config/.env.bootstrap.example config/.env && \
docker-compose up -d
# Wait for ready
sleep 15
# Complete setup wizard manually
# Generate registration token
# Run test script
cd docs/NeedsDoing
REGISTRATION_TOKEN="your-token-here" ./test-rate-limit.sh
# Check results - confirm first request bug
# Check server logs
docker-compose logs server | grep -i "rate\|limit\|429"
2. Fix Rate Limiter
If tests confirm the theory:
File: aggregator-server/internal/api/middleware/rate_limiter.go
Find the RateLimit function (around lines 120-165) and update:
// BEFORE (line ~133)
key := keyFunc(c)
if key == "" {
    c.Next()
    return
}
allowed, resetTime := rl.checkRateLimit(key, config)

// AFTER
key := keyFunc(c)
if key == "" {
    c.Next()
    return
}
// Namespace the key by limit type to prevent different endpoints from sharing counters
namespacedKey := limitType + ":" + key
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
Also update getRemainingRequests function similarly (around line 209).
Test: Re-run test-rate-limit.sh - first request should succeed
3. Fix Session Loop
File: aggregator-web/src/components/SetupCompletionChecker.tsx
Current (broken):
const [wasInSetupMode, setWasInSetupMode] = useState(false);
useEffect(() => {
  const checkSetupStatus = async () => {
    // uses wasInSetupMode state
  };
  checkSetupStatus();
  const interval = setInterval(checkSetupStatus, 3000);
  return () => clearInterval(interval);
}, [wasInSetupMode, location.pathname, navigate]); // ← wasInSetupMode causes loops
Fixed:
useEffect(() => {
  let wasInSetup = false; // Local variable instead of state
  const checkSetupStatus = async () => {
    try {
      const data = await setupApi.checkHealth();
      const currentSetupMode = data.status === 'waiting for configuration';
      if (currentSetupMode) {
        wasInSetup = true;
      }
      if (wasInSetup && !currentSetupMode && location.pathname === '/setup') {
        console.log('Setup completed - redirecting to login');
        navigate('/login', { replace: true });
        return;
      }
      setIsSetupMode(currentSetupMode);
    } catch (error) {
      if (wasInSetup && location.pathname === '/setup') {
        console.log('Setup completed (endpoint unreachable) - redirecting to login');
        navigate('/login', { replace: true });
        return;
      }
      setIsSetupMode(false);
    }
  };
  checkSetupStatus();
  const interval = setInterval(checkSetupStatus, 3000);
  return () => clearInterval(interval);
}, [location.pathname, navigate]); // Remove wasInSetupMode from deps
Test:
- Fresh setup
- Complete wizard
- Restart server
- Watch for flashing - should cleanly redirect to login
4. Commit and Push Fixes
git add aggregator-server/internal/api/middleware/rate_limiter.go
git add aggregator-web/src/components/SetupCompletionChecker.tsx
git commit -m "fix: namespace rate limiter keys and prevent setup checker interval loops

Rate limiter fix:
- Namespace keys by limit type to prevent counter sharing across endpoints
- Previously all KeyByIP endpoints shared same counter causing false rate limits
- Now agent_registration, public_access, etc have separate counters per IP

Session loop fix:
- Remove wasInSetupMode from SetupCompletionChecker dependency array
- Use local variable instead of state to prevent interval multiplication
- Prevents rapid refresh loop during server restart after setup

Potential fixes for recurring first-registration rate limit issue and setup flashing bug."
git push
Environment Notes
- Testing Location: This machine (/home/memory/Desktop/Projects/RedFlag)
- Remote Server: Separate machine, can't SSH to it tonight
- Branch: feature/host-restart-handling
- Last Commit: 5e9c27b (NULL reboot_reason fix)
Files to Read Next Session
- docs/NeedsDoing/RateLimitFirstRequestBug.md - Detailed bug analysis
- docs/NeedsDoing/SessionLoopBug.md - Session loop details and previous fix
- docs/NeedsDoing/test-rate-limit.sh - Executable test script
Technical Debt Notes
- Shutdown command hardcoded (1-minute delay) - need to make user-adjustable later
- Windows reboot detection needs better method than registry keys (no event log yet)
- These NeedsDoing files are local only, not committed to git
Communication Style Reminder
- Less is more, no emojis
- No enterprise marketing speak
- "Potential fixes" is our verbiage
- Casual sysadmin tone
- Git commits: technical, straightforward, honest about uncertainties
Love ya too. Pick this up by reading these files, running the rate limit test, confirming the theory, then implementing both fixes. Test thoroughly before pushing.