Files
Redflag/docs/3_BACKLOG/P0-001_Rate-Limit-First-Request-Bug.md

5.1 KiB

P0-001: Rate Limit First Request Bug

Priority: P0 (Critical) Source Reference: From RateLimitFirstRequestBug.md line 4 Date Identified: 2025-11-12

Problem Description

Every FIRST agent registration gets rate limited with HTTP 429 Too Many Requests, even though it's the very first request from a clean system. This happens consistently when running the one-liner installer, forcing a 1-minute wait before the registration succeeds.

Expected Behavior: First registration should succeed immediately (0/5 requests used) Actual Behavior: First registration gets 429 Too Many Requests

Reproduction Steps

  1. Full rebuild to ensure clean state:

    docker-compose down -v --remove-orphans && \
      rm config/.env && \
      docker-compose build --no-cache && \
      cp config/.env.bootstrap.example config/.env && \
      docker-compose up -d
    
  2. Wait for server to be ready (sleep 10)

  3. Complete setup wizard and generate a registration token

  4. Make first registration API call:

    curl -v -X POST http://localhost:8080/api/v1/agents/register \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "hostname": "test-host",
        "os_type": "linux",
        "os_version": "Fedora 39",
        "os_architecture": "x86_64",
        "agent_version": "0.1.17"
      }'
    
  5. Observe 429 response on first request

Root Cause Analysis

Most likely cause is Rate Limiter Key Namespace Bug - rate limiter keys aren't namespaced by limit type, causing different endpoints to share the same counter.

Current (broken) implementation:

key := keyFunc(c)  // Just "127.0.0.1"
allowed, resetTime := rl.checkRateLimit(key, config)

The issue: Download + Install + Register endpoints all use the same IP-based key, so 3 requests count against a shared 5-request limit.

Proposed Solution

Implement namespacing for rate limiter keys by limit type:

key := keyFunc(c)
namespacedKey := limitType + ":" + key  // "agent_registration:127.0.0.1"
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)

This ensures:

  • agent_registration endpoints get their own counter per IP
  • public_access endpoints (downloads, install scripts) get their own counter
  • agent_reports endpoints get their own counter

Definition of Done

  • First agent registration request succeeds with HTTP 200/201
  • Rate limit headers show X-RateLimit-Remaining: 4 on first request
  • Multiple endpoints don't interfere with each other's counters
  • Rate limiting still works correctly after 5 requests to same endpoint type
  • Agent one-liner installer works without forced 1-minute wait

Test Plan

  1. Direct API Test:

    # Test 1: Verify first request succeeds
    curl -s -w "\nStatus: %{http_code}, Remaining: %{x-ratelimit-remaining}\n" \
      -X POST http://localhost:8080/api/v1/agents/register \
      -H "Authorization: Bearer $TOKEN" \
      -d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}'
    
    # Expected: Status: 200/201, Remaining: 4
    
  2. Cross-Endpoint Isolation Test:

    # Make requests to different endpoint types
    curl http://localhost:8080/api/v1/downloads/linux/amd64  # public_access
    curl http://localhost:8080/api/v1/install/linux          # public_access
    curl -X POST http://localhost:8080/api/v1/agents/register -H "Authorization: Bearer $TOKEN" -d '{"hostname":"test"}'  # agent_registration
    
    # Registration should still have full limit available
    
  3. Rate Limit Still Works Test:

    # Make 6 registration requests
    for i in {1..6}; do
      curl -s -w "Request $i: %{http_code}\n" \
        -X POST http://localhost:8080/api/v1/agents/register \
        -H "Authorization: Bearer $TOKEN" \
        -d "{\"hostname\":\"test-$i\",\"os_type\":\"linux\"}"
    done
    
    # Expected: Requests 1-5 = 200/201, Request 6 = 429
    
  4. Agent Binary Integration Test:

    # Download and test actual agent registration
    wget http://localhost:8080/api/v1/downloads/linux/amd64 -O redflag-agent
    chmod +x redflag-agent
    ./redflag-agent --server http://localhost:8080 --token "$TOKEN" --register
    
    # Should succeed immediately without rate limit errors
    

Files to Modify

  • aggregator-server/internal/api/middleware/rate_limiter.go (likely location)
  • Any rate limiting configuration files
  • Tests for rate limiting functionality

Impact

  • Critical: Blocks new agent installations
  • User Experience: Forces unnecessary 1-minute delays during setup
  • Reliability: Makes system appear broken during normal operations
  • Production: Prevents smooth agent deployment workflows

Verification Commands

After fix implementation:

# Check rate limit headers on first request
curl -I -X POST http://localhost:8080/api/v1/agents/register \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"hostname":"test"}'

# Should show:
# X-RateLimit-Limit: 5
# X-RateLimit-Remaining: 4
# X-RateLimit-Reset: [timestamp]