# Rate Limit First Request Bug

## Issue Description

The first agent registration is always rate limited, even though no earlier registration request has been made. This happens consistently when running the one-liner installer and forces a 1-minute wait before registration succeeds.

**Expected:** First registration should succeed immediately (0/5 requests used)

**Actual:** First registration gets 429 Too Many Requests

## Test Setup

```bash
# Full rebuild to ensure clean state
# (rm -f so the chain does not abort when config/.env is already missing)
docker-compose down -v --remove-orphans && \
  rm -f config/.env && \
  docker-compose build --no-cache && \
  cp config/.env.bootstrap.example config/.env && \
  docker-compose up -d

# Wait for server to be ready
sleep 10

# Complete setup wizard (manual or automated)
# Generate a registration token
```

## Test 1: Direct Registration API Call

This tests the raw registration endpoint without any agent code:

```bash
# Get a registration token from the UI first
TOKEN="your-registration-token-here"

# Make the registration request with verbose output
curl -v -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "hostname": "test-host",
    "os_type": "linux",
    "os_version": "Fedora 39",
    "os_architecture": "x86_64",
    "agent_version": "0.1.17"
  }' 2>&1 | tee test1-output.txt

# Look for these in output:
echo ""
echo "=== Rate Limit Headers ==="
grep -i "X-RateLimit" test1-output.txt
grep "429\|Retry-After" test1-output.txt
```

**What to check:**
- Does it return 429 on the FIRST call?
- What are the X-RateLimit-Limit and X-RateLimit-Remaining values?
- What does the error response body say (which bucket: agent_registration, public_access)?

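
The same checks can be scripted instead of read off the `curl -v` output. The following is a minimal sketch in Go, assuming the server sets the `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `Retry-After` headers named above. `rateLimitInfo`, `inspect`, `probe`, and `stubHandler` are hypothetical names, and the stub's header values are invented so the example runs without a server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// rateLimitInfo holds the values Test 1 asks you to inspect.
type rateLimitInfo struct {
	Status     int
	Limit      string
	Remaining  string
	RetryAfter string
}

// limited reports whether the server rejected the request with 429.
func (i rateLimitInfo) limited() bool {
	return i.Status == http.StatusTooManyRequests
}

// inspect copies the status code and rate limit headers out of a response.
func inspect(resp *http.Response) rateLimitInfo {
	return rateLimitInfo{
		Status:     resp.StatusCode,
		Limit:      resp.Header.Get("X-RateLimit-Limit"),
		Remaining:  resp.Header.Get("X-RateLimit-Remaining"),
		RetryAfter: resp.Header.Get("Retry-After"),
	}
}

// stubHandler imitates a rate-limited server. The header values are invented.
func stubHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-RateLimit-Limit", "5")
		w.Header().Set("X-RateLimit-Remaining", "0")
		w.Header().Set("Retry-After", "60")
		w.WriteHeader(http.StatusTooManyRequests)
	})
}

// probe sends one POST to h in memory (no socket) and inspects the result.
func probe(h http.Handler, path string) rateLimitInfo {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, nil))
	resp := rec.Result()
	defer resp.Body.Close()
	return inspect(resp)
}

func main() {
	info := probe(stubHandler(), "/api/v1/agents/register")
	fmt.Printf("status=%d limit=%q remaining=%q retry-after=%q limited=%t\n",
		info.Status, info.Limit, info.Remaining, info.RetryAfter, info.limited())
}
```

Against the real server, `inspect` could be applied to the response of an `http.Post` to `http://localhost:8080/api/v1/agents/register`; a first call that reports `limited=true` reproduces the bug.
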
## Test 2: Multiple Sequential Requests

Test if the rate limiter is properly tracking requests:

```bash
TOKEN="your-registration-token-here"

for i in {1..6}; do
  echo "=== Attempt $i ==="
  curl -s -w "\nHTTP Status: %{http_code}\n" \
    -X POST http://localhost:8080/api/v1/agents/register \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d "{\"hostname\":\"test-$i\",\"os_type\":\"linux\",\"os_version\":\"test\",\"os_architecture\":\"x86_64\",\"agent_version\":\"0.1.17\"}" \
    | grep -E "(error|HTTP Status|remaining)"
  sleep 1
done
```

**Expected:**
- Requests 1-5: HTTP 200 (or 201)
- Request 6: HTTP 429

**If Request 1 fails:**
- Rate limiter is broken
- OR there's key collision with other endpoints
- OR agent code is making multiple calls internally

## Test 3: Check for Preflight/OPTIONS Requests

```bash
# Follow the server logs and show only requests to the registration endpoint
# (assumes the server logs every request, e.g. with Gin debug mode enabled)
docker-compose logs -f server 2>&1 | grep -E "(POST|OPTIONS|GET).*agents/register"
```

Run Test 1 in another terminal and watch for:
- Any OPTIONS requests before POST
- Multiple POST requests for a single registration
- Unexpected GET requests

## Test 4: Check Rate Limiter Key Collision

This tests if different endpoints share the same rate limit counter:

```bash
TOKEN="your-token"
IP=$(hostname -I | awk '{print $1}')
echo "Testing from IP: $IP"

# Test download endpoint (public_access)
curl -s -w "\nDownload Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  http://localhost:8080/api/v1/downloads/linux/amd64

sleep 1

# Test install script endpoint (public_access)
curl -s -w "\nInstall Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  http://localhost:8080/api/v1/install/linux

sleep 1

# Now test registration (agent_registration)
curl -s -w "\nRegistration Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}' \
  | grep -E "(Status|error|remaining)"
```

**Theory:** If rate limiter keys are derived from the IP alone (not namespaced by limit type), then the download, install script, and registration requests all count against one shared 5-request limit. Those 3 requests leave only 2 before the limit is hit.

## Test 5: Agent Binary Registration

Test what the actual agent does:

```bash
# Download agent
wget http://localhost:8080/api/v1/downloads/linux/amd64 -O redflag-agent
chmod +x redflag-agent

# Remove any existing config
sudo rm -f /etc/aggregator/config.json

# Enable debug output and register
export DEBUG=1
./redflag-agent --server http://localhost:8080 --token "your-token" --register 2>&1 | tee agent-registration.log

# Check for multiple registration attempts
grep -c "POST.*agents/register" agent-registration.log
```

## Test 6: Server Logs Analysis

Check what the server sees:

```bash
# Capture only log lines written from now on (--tail=0 skips existing output)
docker-compose logs --tail=0 -f server > server-logs.txt &
LOG_PID=$!

# Wait a moment
sleep 2

# Make a registration request
curl -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}'

# Wait for logs
sleep 2
kill $LOG_PID

# Analyze
echo "=== All Registration Requests ==="
grep "register" server-logs.txt

echo "=== Rate Limit Events ==="
grep -i "rate\|limit\|429" server-logs.txt
```

## Debugging Checklist

- [ ] Does the FIRST request fail with 429?
- [ ] What's the X-RateLimit-Remaining value on first request?
- [ ] Are there multiple requests happening for a single registration?
- [ ] Do download/install endpoints count against registration limit?
- [ ] Does the agent binary retry internally on failure?
- [ ] Are there preflight OPTIONS requests?
- [ ] What's the rate limit key being used (check logs)?

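
Two checklist items (multiple requests per registration, preflight OPTIONS requests) come down to counting what actually reaches the server. Where grepping logs is inconclusive, a counting middleware gives an exact tally. The following is a minimal sketch using only the Go standard library; `requestTally`, `okHandler`, and `send` are hypothetical names, and in the real server the equivalent would presumably be written as a Gin middleware instead.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// requestTally counts requests per "METHOD path" pair.
type requestTally struct {
	mu     sync.Mutex
	counts map[string]int
}

func newRequestTally() *requestTally {
	return &requestTally{counts: map[string]int{}}
}

// wrap records each request before passing it on to next.
func (t *requestTally) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		t.mu.Lock()
		t.counts[r.Method+" "+r.URL.Path]++
		t.mu.Unlock()
		next.ServeHTTP(w, r)
	})
}

// count returns how many requests were seen for method and path.
func (t *requestTally) count(method, path string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.counts[method+" "+path]
}

// okHandler stands in for the real registration handler.
func okHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
}

// send replays one request against h in memory and returns the status code.
func send(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	const path = "/api/v1/agents/register"
	tally := newRequestTally()
	h := tally.wrap(okHandler())

	// Simulate a client that sends a preflight before the real request.
	send(h, http.MethodOptions, path)
	send(h, http.MethodPost, path)

	fmt.Println("OPTIONS:", tally.count(http.MethodOptions, path))
	fmt.Println("POST:", tally.count(http.MethodPost, path))
}
```

If a single registration produces a POST count above 1, or any OPTIONS count at all, those extra requests are candidates for what consumes the rate limit budget.
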
## Potential Root Causes

1. **Key Namespace Bug**: Rate limiter keys aren't namespaced by limit type
   - Fix: Prepend limitType to key (e.g., "agent_registration:127.0.0.1")

2. **Agent Retry Logic**: Agent retries registration on first failure
   - Fix: Check agent registration code for retry loops

3. **Shared Counter**: Download + Install + Register share same counter
   - Fix: Namespace keys or use different key functions

4. **Off-by-One**: Rate limiter logic checks `>=` instead of `>`
   - Fix: Change condition in checkRateLimit()

5. **Preflight Requests**: Browser/client making OPTIONS requests
   - Fix: Exclude OPTIONS from rate limiting

## Expected Fix

Most likely: Rate limiter keys need namespacing.

Current (broken):
```go
// The key is just the client IP, e.g. "127.0.0.1"
key := keyFunc(c)
allowed, resetTime := rl.checkRateLimit(key, config)
```

Fixed:
```go
key := keyFunc(c)
// Prefix the key with the limit type, e.g. "agent_registration:127.0.0.1"
namespacedKey := limitType + ":" + key
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
```

This ensures agent_registration, public_access, and agent_reports each get their own counters per IP.
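
To see why IP-only keys could produce a 429 on the very first registration, the effect can be reproduced in isolation. The following is a minimal sketch, assuming a fixed-window counter with a limit of 5 per minute for every limit type. `limiter`, `perMinute`, `allow`, and `namespacedKey` are hypothetical names and do not reflect the project's actual `checkRateLimit` implementation.

```go
package main

import (
	"fmt"
	"time"
)

// window tracks how many requests one key has made in the current period.
type window struct {
	count int
	reset time.Time
}

// limiter is a hypothetical fixed-window rate limiter, not the project's code.
type limiter struct {
	limit   int
	period  time.Duration
	now     func() time.Time
	windows map[string]*window
}

// perMinute builds a limiter that allows limit requests per key per minute.
func perMinute(limit int) *limiter {
	return &limiter{
		limit:   limit,
		period:  time.Minute,
		now:     time.Now,
		windows: map[string]*window{},
	}
}

// allow records one request for key and reports whether it is within the limit.
func (l *limiter) allow(key string) bool {
	now := l.now()
	w, ok := l.windows[key]
	if !ok || !now.Before(w.reset) {
		// First request for this key, or the previous window has expired.
		w = &window{reset: now.Add(l.period)}
		l.windows[key] = w
	}
	// The count is compared before it is incremented, so >= admits exactly
	// l.limit requests. Incrementing first would require > instead.
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}

// namespacedKey gives every limit type its own counter per IP.
func namespacedKey(limitType, ip string) string {
	return limitType + ":" + ip
}

func main() {
	const ip = "127.0.0.1"

	// IP-only keys: five public requests use up the registration budget.
	shared := perMinute(5)
	for i := 0; i < 5; i++ {
		shared.allow(ip)
	}
	fmt.Println("shared key, first registration allowed:", shared.allow(ip))

	// Namespaced keys: the same public requests leave registration untouched.
	separate := perMinute(5)
	for i := 0; i < 5; i++ {
		separate.allow(namespacedKey("public_access", ip))
	}
	fmt.Println("namespaced key, first registration allowed:",
		separate.allow(namespacedKey("agent_registration", ip)))
}
```

With the shared key the first registration is rejected, while with namespaced keys it is allowed. The comment inside `allow` also illustrates the off-by-one cause: whether `>=` or `>` is correct depends on whether the counter is incremented before or after the comparison.
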