Rate Limit First Request Bug

Issue Description

The FIRST agent registration against a fresh server always gets rate limited, even though no earlier registration request has been made. This happens consistently when running the one-liner installer, forcing a 1-minute wait before the registration succeeds.

Expected: First registration should succeed immediately (0/5 requests used)

Actual: First registration gets 429 Too Many Requests

Test Setup

# Full rebuild to ensure clean state
docker-compose down -v --remove-orphans && \
  rm -f config/.env && \
  docker-compose build --no-cache && \
  cp config/.env.bootstrap.example config/.env && \
  docker-compose up -d

# Wait for server to be ready
sleep 10

# Complete setup wizard (manual or automated)
# Generate a registration token

Test 1: Direct Registration API Call

This tests the raw registration endpoint without any agent code:

# Get a registration token from the UI first
TOKEN="your-registration-token-here"

# Make the registration request with verbose output
curl -v -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "hostname": "test-host",
    "os_type": "linux",
    "os_version": "Fedora 39",
    "os_architecture": "x86_64",
    "agent_version": "0.1.17"
  }' 2>&1 | tee test1-output.txt

# Look for these in output:
echo ""
echo "=== Rate Limit Headers ==="
grep "X-RateLimit" test1-output.txt
grep "429\|Retry-After" test1-output.txt

What to check:

  • Does it return 429 on the FIRST call?
  • What are the X-RateLimit-Limit and X-RateLimit-Remaining values?
  • What does the error response body say (which bucket: agent_registration, public_access)?
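The header checks above can be turned into a small decision helper. This is a minimal sketch, not server code: diagnoseFirstResponse is a hypothetical name, and it assumes X-RateLimit-Remaining is reported after the current request has been counted, so a clean first request under a limit of 5 would show 4.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// diagnoseFirstResponse interprets the status code and rate limit headers of
// the very first registration attempt.
// Assumption: X-RateLimit-Remaining already includes the current request.
func diagnoseFirstResponse(status int, limitHdr, remainingHdr string) string {
	limit, errL := strconv.Atoi(limitHdr)
	remaining, errR := strconv.Atoi(remainingHdr)
	if errL != nil || errR != nil {
		return "rate limit headers missing or not numeric"
	}
	switch {
	case status == http.StatusTooManyRequests:
		return fmt.Sprintf("bug reproduced: 429 on first request (limit=%d, remaining=%d)", limit, remaining)
	case remaining < limit-1:
		// Something else was charged to this counter before the first registration.
		return fmt.Sprintf("suspicious: %d of %d requests already used before this one", limit-1-remaining, limit)
	default:
		return "ok: counter started fresh"
	}
}

func main() {
	// Values as they would be copied from test1-output.txt.
	fmt.Println(diagnoseFirstResponse(429, "5", "0"))
	fmt.Println(diagnoseFirstResponse(200, "5", "2"))
	fmt.Println(diagnoseFirstResponse(200, "5", "4"))
}
```

A "suspicious" result on a 200 response points toward a shared counter rather than a broken limiter, which is what Test 4 probes.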

Test 2: Multiple Sequential Requests

Test if the rate limiter is properly tracking requests:

TOKEN="your-registration-token-here"

for i in {1..6}; do
  echo "=== Attempt $i ==="
  curl -s -w "\nHTTP Status: %{http_code}\n" \
    -X POST http://localhost:8080/api/v1/agents/register \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d "{\"hostname\":\"test-$i\",\"os_type\":\"linux\",\"os_version\":\"test\",\"os_architecture\":\"x86_64\",\"agent_version\":\"0.1.17\"}" \
    | grep -E "(error|HTTP Status|remaining)"
  sleep 1
done

Expected:

  • Requests 1-5: HTTP 200 (or 201)
  • Request 6: HTTP 429

If Request 1 fails:

  • Rate limiter is broken
  • OR there's key collision with other endpoints
  • OR agent code is making multiple calls internally

Test 3: Check for Preflight/OPTIONS Requests

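# Follow the server logs, filtered to requests for the registration endpoint
# (assumes request logging, e.g. Gin debug mode, is enabled on the server)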
docker-compose logs -f server 2>&1 | grep -E "(POST|OPTIONS|GET).*agents/register"

Run Test 1 in another terminal and watch for:

  • Any OPTIONS requests before POST
  • Multiple POST requests for a single registration
  • Unexpected GET requests
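If OPTIONS requests do show up ahead of the POST, one option is to let them bypass the limiter. The sketch below uses plain net/http rather than the server's Gin middleware, and skipPreflight, countedRequests, and the counting stand-in limiter are all hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// skipPreflight wraps a rate limiting middleware so that OPTIONS requests
// bypass it and go straight to the final handler.
func skipPreflight(limit func(http.Handler) http.Handler, next http.Handler) http.Handler {
	limited := limit(next)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodOptions {
			next.ServeHTTP(w, r) // preflight: not charged to any counter
			return
		}
		limited.ServeHTTP(w, r)
	})
}

// countedRequests sends one request per method through the wrapped handler
// and reports how many of them reached the stand-in rate limiter.
func countedRequests(methods ...string) int {
	counted := 0
	countingLimiter := func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			counted++
			next.ServeHTTP(w, r)
		})
	}
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := skipPreflight(countingLimiter, ok)
	for _, m := range methods {
		req := httptest.NewRequest(m, "/api/v1/agents/register", nil)
		h.ServeHTTP(httptest.NewRecorder(), req)
	}
	return counted
}

func main() {
	// A preflight followed by the real registration: only the POST is counted.
	fmt.Println(countedRequests("OPTIONS", "POST")) // prints 1
}
```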

Test 4: Check Rate Limiter Key Collision

This tests if different endpoints share the same rate limit counter:

TOKEN="your-token"
IP=$(hostname -I | awk '{print $1}')

echo "Testing from IP: $IP"

# Test download endpoint (public_access)
curl -s -w "\nDownload Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  http://localhost:8080/api/v1/downloads/linux/amd64

sleep 1

# Test install script endpoint (public_access)
curl -s -w "\nInstall Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  http://localhost:8080/api/v1/install/linux

sleep 1

# Now test registration (agent_registration)
curl -s -w "\nRegistration Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}' \
  | grep -E "(Status|error|remaining)"

Theory: If rate limiters share keys by IP only (not namespaced by limit type), then downloading + install script + registration = 3 requests against a shared 5-request limit, leaving only 2 requests before hitting the limit.
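The arithmetic behind this theory can be replayed without a server. The sketch below is an illustration only: remainingAfter and keyByIP are hypothetical helpers, and the request sequence (two public_access hits, then one agent_registration) is the one assumed in the theory above.

```go
package main

import "fmt"

// keyByIP models the suspected bug: the limit type is ignored, so every
// endpoint is charged to the same per-IP counter.
func keyByIP(_ string, ip string) string { return ip }

// remainingAfter replays a sequence of requests against per-key counters and
// returns how many requests are left on the counter used by the last request.
func remainingAfter(limit int, limitTypes []string, ip string, keyFor func(limitType, ip string) string) int {
	counts := map[string]int{}
	last := ""
	for _, lt := range limitTypes {
		last = keyFor(lt, ip)
		counts[last]++
	}
	if counts[last] > limit {
		return 0
	}
	return limit - counts[last]
}

func main() {
	// Download, install script, then registration, all from one IP.
	installer := []string{"public_access", "public_access", "agent_registration"}
	fmt.Println(remainingAfter(5, installer, "127.0.0.1", keyByIP)) // prints 2
}
```

Note that three requests alone still leave room under a limit of 5. For the registration itself to be rejected, the shared counter would need five earlier hits, so it is worth counting every request the installer makes before it registers.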

Test 5: Agent Binary Registration

Test what the actual agent does:

# Download agent
wget http://localhost:8080/api/v1/downloads/linux/amd64 -O redflag-agent
chmod +x redflag-agent

# Remove any existing config
sudo rm -f /etc/aggregator/config.json

# Enable debug output and register
export DEBUG=1
./redflag-agent --server http://localhost:8080 --token "your-token" --register 2>&1 | tee agent-registration.log

# Check for multiple registration attempts
grep -c "POST.*agents/register" agent-registration.log

Test 6: Server Logs Analysis

Check what the server sees:

# Clear logs
docker-compose logs --tail=0 -f server > server-logs.txt &
LOG_PID=$!

# Wait a moment
sleep 2

# Make a registration request
curl -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}'

# Wait for logs
sleep 2
kill $LOG_PID

# Analyze
echo "=== All Registration Requests ==="
grep "register" server-logs.txt

echo "=== Rate Limit Events ==="
grep -i "rate\|limit\|429" server-logs.txt

Debugging Checklist

  • Does the FIRST request fail with 429?
  • What's the X-RateLimit-Remaining value on first request?
  • Are there multiple requests happening for a single registration?
  • Do download/install endpoints count against registration limit?
  • Does the agent binary retry internally on failure?
  • Are there preflight OPTIONS requests?
  • What's the rate limit key being used (check logs)?

Potential Root Causes

  1. Key Namespace Bug: Rate limiter keys aren't namespaced by limit type
    • Fix: Prepend limitType to key (e.g., "agent_registration:127.0.0.1")

  2. Agent Retry Logic: Agent retries registration on first failure
    • Fix: Check agent registration code for retry loops

  3. Shared Counter: Download + Install + Register share same counter
    • Fix: Namespace keys or use different key functions

  4. Off-by-One: Rate limiter logic checks >= instead of >
    • Fix: Change condition in checkRateLimit()

  5. Preflight Requests: Browser/client making OPTIONS requests
    • Fix: Exclude OPTIONS from rate limiting
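For root cause 4, which comparison is correct depends on whether the counter is incremented before or after the check. The body of checkRateLimit() is not shown here, so the sketch below only models the two orderings with hypothetical helper functions and counts how many of a burst of attempts each lets through.

```go
package main

import "fmt"

// allowedCheckThenCount tests the counter before incrementing it.
// With this ordering ">=" is the right comparison: exactly limit requests pass.
func allowedCheckThenCount(limit, attempts int) int {
	count, allowed := 0, 0
	for i := 0; i < attempts; i++ {
		if count >= limit {
			continue // this attempt would get a 429
		}
		count++
		allowed++
	}
	return allowed
}

// allowedCountThenCheck increments first and then tests.
// With ">=" the request that reaches the limit is rejected (limit-1 pass);
// with ">" (useGreaterThan) exactly limit requests pass.
func allowedCountThenCheck(limit, attempts int, useGreaterThan bool) int {
	count, allowed := 0, 0
	for i := 0; i < attempts; i++ {
		count++
		rejected := count >= limit
		if useGreaterThan {
			rejected = count > limit
		}
		if !rejected {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println(allowedCheckThenCount(5, 6))        // prints 5
	fmt.Println(allowedCountThenCheck(5, 6, false)) // prints 4
	fmt.Println(allowedCountThenCheck(5, 6, true))  // prints 5
}
```

With a limit of 5, an off-by-one costs at most one request and does not reject the first one. It would only explain a 429 on the first attempt if the effective limit were 1, so the configured limit for agent_registration is worth checking alongside the comparison.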

Expected Fix

Most likely: Rate limiter keys need namespacing.

Current (broken):

key := keyFunc(c)  // Just "127.0.0.1"
allowed, resetTime := rl.checkRateLimit(key, config)

Fixed:

key := keyFunc(c)
namespacedKey := limitType + ":" + key  // "agent_registration:127.0.0.1"
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)

This ensures agent_registration, public_access, and agent_reports each get their own counters per IP.
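To see the effect of namespacing in isolation, here is a minimal fixed-window limiter sketch. It is not the server's implementation: the limiter type, allow, and demo are hypothetical, and the window length and limit of 5 are assumptions taken from the numbers used in this document.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// window holds the request count for one key within the current period.
type window struct {
	count   int
	resetAt time.Time
}

// limiter is a fixed-window counter keyed by "limitType:clientKey".
type limiter struct {
	mu      sync.Mutex
	windows map[string]*window
}

func newLimiter() *limiter { return &limiter{windows: map[string]*window{}} }

// allow charges one request to the counter for (limitType, clientKey) and
// reports whether it fits within max requests per period.
func (l *limiter) allow(limitType, clientKey string, max int, period time.Duration, now time.Time) bool {
	key := limitType + ":" + clientKey // the namespacing fix
	l.mu.Lock()
	defer l.mu.Unlock()
	w, ok := l.windows[key]
	if !ok || !now.Before(w.resetAt) {
		w = &window{resetAt: now.Add(period)} // start a fresh window
		l.windows[key] = w
	}
	if w.count >= max {
		return false
	}
	w.count++
	return true
}

// demo exhausts public_access for one IP, then reports whether a sixth
// public_access request and a first agent_registration request are allowed.
func demo() (sixthPublic, firstRegistration bool) {
	l := newLimiter()
	now := time.Now()
	for i := 0; i < 5; i++ {
		l.allow("public_access", "127.0.0.1", 5, time.Minute, now)
	}
	sixthPublic = l.allow("public_access", "127.0.0.1", 5, time.Minute, now)
	firstRegistration = l.allow("agent_registration", "127.0.0.1", 5, time.Minute, now)
	return
}

func main() {
	fmt.Println(demo()) // prints: false true
}
```

Even with public_access exhausted for 127.0.0.1, the first agent_registration request from the same IP is still allowed, which is the behavior the fix is meant to restore.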