Session Summary & Resume Point

What We Just Completed

Branch: feature/host-restart-handling
Status: Pushed to remote, ready for testing

Implemented Features (Issues #4 and #6)

  1. Issue #6 - Agent Version Display Fix

    • Set CurrentVersion during registration instead of waiting for first check-in
    • Changed UI text from "Unknown" to "Initial Registration"
    • Files: aggregator-server/internal/api/handlers/agents.go, aggregator-web/src/pages/Agents.tsx
  2. Issue #4 - Host Restart Detection & Handling

    • Database migration 013_add_reboot_tracking.up.sql adds reboot fields
    • Agent detects pending reboots (Debian/Ubuntu, RHEL/Fedora, Windows)
    • New reboot command with 1-minute grace period
    • UI shows restart alerts and "Restart Host" button
    • Files: Migration, models, queries, handlers, agent detection, frontend components
  3. Critical Bug Fix

    • Fixed reboot_reason field causing database scan failures (was string, needed *string for NULL handling)
    • Commit: 5e9c27b
  4. Documentation

    • Added full reinstall section to README with agent re-registration steps
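For reference on the reboot_reason fix above: database/sql refuses to scan a NULL column into a plain string ("converting NULL to string is unsupported"), which is why the field had to become nullable. A minimal sketch of the nullable pattern follows, assuming a made-up agentRow struct and helper names (not the real model code). It uses sql.NullString directly so it runs without a database.

```go
package main

import (
	"database/sql"
	"fmt"
)

// agentRow is a hypothetical stand-in for the real agent model.
type agentRow struct {
	RebootReason *string // nil when the column is NULL
}

// toPtr converts a sql.NullString into the *string form used on the model.
func toPtr(ns sql.NullString) *string {
	if !ns.Valid {
		return nil
	}
	s := ns.String
	return &s
}

// scanReason mimics scanning one column value; raw is nil for SQL NULL.
func scanReason(raw any) (agentRow, error) {
	var ns sql.NullString
	if err := ns.Scan(raw); err != nil {
		return agentRow{}, err
	}
	return agentRow{RebootReason: toPtr(ns)}, nil
}

// reasonText returns a display string without dereferencing a nil pointer.
func reasonText(a agentRow) string {
	if a.RebootReason == nil {
		return "none"
	}
	return *a.RebootReason
}

func main() {
	for _, raw := range []any{nil, "kernel update"} {
		row, err := scanReason(raw)
		if err != nil {
			fmt.Println("scan failed:", err)
			continue
		}
		fmt.Println("reboot reason:", reasonText(row))
	}
}
```

Anything that reads the field has to check for nil first, which is what reasonText does here.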

Current Issues Found During Testing

1. Rate Limit Bug - FIRST Request Gets Blocked

Symptom: The first agent registration after a fresh start always gets 429 Too Many Requests, then succeeds after a 1-minute wait.

Theory: Rate limiter keys aren't namespaced by limit type. All endpoints using KeyByIP share the same counter:

  • public_access (download, install script): 20/min
  • agent_registration: 5/min
  • Both use the bare IP as the key, so download and install-script requests count against the 5/min registration limit

Problem Location: aggregator-server/internal/api/middleware/rate_limiter.go line ~133

key := keyFunc(c)  // Just "127.0.0.1"
allowed, resetTime := rl.checkRateLimit(key, config)

Suspected Fix:

key := keyFunc(c)
namespacedKey := limitType + ":" + key  // "agent_registration:127.0.0.1"
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
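To make the theory concrete, here is a toy reproduction. It is a sketch only: the limiter type, allow, and firstRegistrationAllowed are invented for illustration and are not the real rate_limiter.go code, and window expiry is left out. It simulates some public_access hits from one IP followed by a first agent_registration request, once with a shared key and once with a namespaced key.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a toy in-memory counter, not the real RateLimiter type.
type limiter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newLimiter() *limiter {
	return &limiter{counts: make(map[string]int)}
}

// allow increments the counter stored under key and reports whether the
// request is still within max. Window expiry is omitted to keep this short.
func (l *limiter) allow(key string, max int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.counts[key]++
	return l.counts[key] <= max
}

// firstRegistrationAllowed simulates priorPublic public_access hits (20/min)
// from one IP, then reports whether the first agent_registration request
// (5/min) from that IP gets through.
func firstRegistrationAllowed(namespaced bool, priorPublic int) bool {
	l := newLimiter()
	ip := "127.0.0.1"
	keyFor := func(limitType string) string {
		if namespaced {
			return limitType + ":" + ip
		}
		return ip
	}
	for i := 0; i < priorPublic; i++ {
		l.allow(keyFor("public_access"), 20)
	}
	return l.allow(keyFor("agent_registration"), 5)
}

func main() {
	fmt.Println("shared key, first registration allowed:    ", firstRegistrationAllowed(false, 6))
	fmt.Println("namespaced key, first registration allowed:", firstRegistrationAllowed(true, 6))
}
```

Note the shared key only blocks the first registration if the same IP already made at least 5 public_access requests inside the window. If the test script shows fewer requests than that before the 429, the theory needs another look.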

Test Script: docs/NeedsDoing/test-rate-limit.sh

  • Run after fresh docker-compose up
  • Tests if first request fails
  • Tests if download/install/register share counters
  • Sequential test to find actual limit
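The sequential check in that list boils down to "send requests one at a time and count how many pass before the first 429". A rough in-process sketch of that logic is below. It runs against a stand-in handler rather than the real server, and the /register path, limited, and probeLimit are made-up names, so treat it as an illustration of the check and not a replacement for the script.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// limited returns a stand-in handler that answers 200 for the first max
// requests and 429 afterwards. It is not the real registration endpoint.
func limited(max int) http.Handler {
	var mu sync.Mutex
	count := 0
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		count++
		over := count > max
		mu.Unlock()
		if over {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// probeLimit sends up to attempts sequential requests and returns how many
// succeeded before the first 429. A result of 0 means the very first request
// was blocked, which is the bug being chased here.
func probeLimit(h http.Handler, path string, attempts int) int {
	for i := 0; i < attempts; i++ {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodPost, path, nil)
		h.ServeHTTP(rec, req)
		if rec.Code == http.StatusTooManyRequests {
			return i
		}
	}
	return attempts
}

func main() {
	n := probeLimit(limited(5), "/register", 10)
	fmt.Println("requests allowed before first 429:", n)
}
```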

2. Session Loop Bug - Returned

Symptom: After setup completion and a server restart, the UI flashes/loops rapidly on dashboard/agents/settings. Logging out and back in is the only way to clear it.

Previous Fix: Commit 7b77641 added logout() call, cleared auth on 401

Current Problem: SetupCompletionChecker.tsx dependency array issue

  • wasInSetupMode in dependency array causes multiple interval creation
  • Each state change creates new interval without cleaning up old ones
  • During docker restart: multiple 3-second polls overlap = flashing

Problem Location: aggregator-web/src/components/SetupCompletionChecker.tsx lines 15-52

Suspected Fix: Remove wasInSetupMode from dependency array, use local variable instead

Next Session Plan

1. Test Rate Limiter (This Machine)

# Full clean rebuild
cd /home/memory/Desktop/Projects/RedFlag
docker-compose down -v --remove-orphans && \
  rm -f config/.env && \
  docker-compose build --no-cache && \
  cp config/.env.bootstrap.example config/.env && \
  docker-compose up -d

# Wait for ready
sleep 15

# Complete setup wizard manually
# Generate registration token

# Run test script
cd docs/NeedsDoing
REGISTRATION_TOKEN="your-token-here" ./test-rate-limit.sh

# Check results - confirm first request bug
# Check server logs
docker-compose logs server | grep -iE "rate|limit|429"

2. Fix Rate Limiter

If tests confirm the theory:

File: aggregator-server/internal/api/middleware/rate_limiter.go

Find the RateLimit function (around lines 120-165) and update:

// BEFORE (line ~133)
key := keyFunc(c)
if key == "" {
    c.Next()
    return
}
allowed, resetTime := rl.checkRateLimit(key, config)

// AFTER
key := keyFunc(c)
if key == "" {
    c.Next()
    return
}
// Namespace the key by limit type to prevent different endpoints from sharing counters
namespacedKey := limitType + ":" + key
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)

Also update the getRemainingRequests function the same way (around line 209) so the remaining-count lookup uses the same namespaced key as the check.

Test: Re-run test-rate-limit.sh - first request should succeed

3. Fix Session Loop

File: aggregator-web/src/components/SetupCompletionChecker.tsx

Current (broken):

const [wasInSetupMode, setWasInSetupMode] = useState(false);

useEffect(() => {
  const checkSetupStatus = async () => {
    // uses wasInSetupMode state
  };
  checkSetupStatus();
  const interval = setInterval(checkSetupStatus, 3000);
  return () => clearInterval(interval);
}, [wasInSetupMode, location.pathname, navigate]);  // ← wasInSetupMode causes loops

Fixed:

useEffect(() => {
  // Local variable instead of state. It resets whenever the effect re-runs
  // (for example when location.pathname changes).
  let wasInSetup = false;

  const checkSetupStatus = async () => {
    try {
      const data = await setupApi.checkHealth();
      const currentSetupMode = data.status === 'waiting for configuration';

      if (currentSetupMode) {
        wasInSetup = true;
      }

      if (wasInSetup && !currentSetupMode && location.pathname === '/setup') {
        console.log('Setup completed - redirecting to login');
        navigate('/login', { replace: true });
        return;
      }

      setIsSetupMode(currentSetupMode);
    } catch (error) {
      if (wasInSetup && location.pathname === '/setup') {
        console.log('Setup completed (endpoint unreachable) - redirecting to login');
        navigate('/login', { replace: true });
        return;
      }
      setIsSetupMode(false);
    }
  };

  checkSetupStatus();
  const interval = setInterval(checkSetupStatus, 3000);
  return () => clearInterval(interval);
}, [location.pathname, navigate]);  // Remove wasInSetupMode from deps

Test:

  1. Fresh setup
  2. Complete wizard
  3. Restart server
  4. Watch for flashing - should cleanly redirect to login

4. Commit and Push Fixes

git add aggregator-server/internal/api/middleware/rate_limiter.go
git add aggregator-web/src/components/SetupCompletionChecker.tsx

git commit -m "fix: namespace rate limiter keys and prevent setup checker interval loops

Rate limiter fix:
- Namespace keys by limit type to prevent counter sharing across endpoints
- Previously all KeyByIP endpoints shared same counter causing false rate limits
- Now agent_registration, public_access, etc have separate counters per IP

Session loop fix:
- Remove wasInSetupMode from SetupCompletionChecker dependency array
- Use local variable instead of state to prevent interval multiplication
- Prevents rapid refresh loop during server restart after setup

Potential fixes for recurring first-registration rate limit issue and setup flashing bug."

git push

Environment Notes

  • Testing Location: This machine (/home/memory/Desktop/Projects/RedFlag)
  • Remote Server: Separate machine, can't SSH to it tonight
  • Branch: feature/host-restart-handling
  • Last Commit: 5e9c27b (NULL reboot_reason fix)
  • Reference files:
    1. docs/NeedsDoing/RateLimitFirstRequestBug.md - Detailed bug analysis
    2. docs/NeedsDoing/SessionLoopBug.md - Session loop details and previous fix
    3. docs/NeedsDoing/test-rate-limit.sh - Executable test script

Technical Debt Notes

  • Shutdown command hardcoded (1-minute delay) - need to make user-adjustable later
  • Windows reboot detection needs better method than registry keys (no event log yet)
  • These NeedsDoing files are local only, not committed to git
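For the hardcoded shutdown delay noted above, one possible shape for a user-adjustable version is sketched here. This is an assumption about how the agent could build the command, not what it does today: rebootArgs and its parameters are made up, and it relies on the standard `shutdown -r +N` (minutes) form on Linux and `shutdown /r /t N` (seconds) form on Windows.

```go
package main

import (
	"fmt"
	"strconv"
)

// rebootArgs builds the shutdown command line for a configurable grace
// period. goos is expected to be a runtime.GOOS style value.
func rebootArgs(goos string, delaySeconds int) []string {
	if delaySeconds < 0 {
		delaySeconds = 0
	}
	if goos == "windows" {
		// Windows takes the delay in seconds.
		return []string{"shutdown", "/r", "/t", strconv.Itoa(delaySeconds)}
	}
	// Linux takes the delay in whole minutes; round up so a short delay
	// never turns into an immediate reboot unless it is exactly zero.
	mins := (delaySeconds + 59) / 60
	return []string{"shutdown", "-r", "+" + strconv.Itoa(mins)}
}

func main() {
	fmt.Println(rebootArgs("linux", 60))
	fmt.Println(rebootArgs("windows", 60))
}
```

The current 1-minute behavior would be rebootArgs(goos, 60), so the default can stay the same while the value comes from config.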

Communication Style Reminder

  • Less is more, no emojis
  • No enterprise marketing speak
  • "Potential fixes" is our verbiage
  • Casual sysadmin tone
  • Git commits: technical, straightforward, honest about uncertainties

Love ya too. Pick this up by reading these files, running the rate limit test, confirming the theory, then implementing both fixes. Test thoroughly before pushing.