Files
Redflag/docs/3_BACKLOG/P0-001_Rate-Limit-First-Request-Bug.md

153 lines
5.1 KiB
Markdown

# P0-001: Rate Limit First Request Bug
**Priority:** P0 (Critical)
**Source Reference:** From RateLimitFirstRequestBug.md line 4
**Date Identified:** 2025-11-12
## Problem Description
Every FIRST agent registration gets rate limited with HTTP 429 Too Many Requests, even though it's the very first request from a clean system. This happens consistently when running the one-liner installer, forcing a 1-minute wait before the registration succeeds.
**Expected Behavior:** First registration should succeed immediately (0/5 requests used)
**Actual Behavior:** First registration gets 429 Too Many Requests
## Reproduction Steps
1. Full rebuild to ensure clean state:
```bash
docker-compose down -v --remove-orphans && \
rm config/.env && \
docker-compose build --no-cache && \
cp config/.env.bootstrap.example config/.env && \
docker-compose up -d
```
2. Wait for server to be ready (sleep 10)
3. Complete setup wizard and generate a registration token
4. Make first registration API call:
```bash
curl -v -X POST http://localhost:8080/api/v1/agents/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"hostname": "test-host",
"os_type": "linux",
"os_version": "Fedora 39",
"os_architecture": "x86_64",
"agent_version": "0.1.17"
}'
```
5. Observe 429 response on first request
## Root Cause Analysis
Most likely cause is **Rate Limiter Key Namespace Bug** - rate limiter keys aren't namespaced by limit type, causing different endpoints to share the same counter.
**Current (broken) implementation:**
```go
key := keyFunc(c) // Just "127.0.0.1"
allowed, resetTime := rl.checkRateLimit(key, config)
```
**The issue:** Download + Install + Register endpoints all use the same IP-based key, so 3 requests count against a shared 5-request limit.
## Proposed Solution
Implement namespacing for rate limiter keys by limit type:
```go
key := keyFunc(c)
namespacedKey := limitType + ":" + key // "agent_registration:127.0.0.1"
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
```
This ensures:
- `agent_registration` endpoints get their own counter per IP
- `public_access` endpoints (downloads, install scripts) get their own counter
- `agent_reports` endpoints get their own counter
## Definition of Done
- [ ] First agent registration request succeeds with HTTP 200/201
- [ ] Rate limit headers show `X-RateLimit-Remaining: 4` on first request
- [ ] Multiple endpoints don't interfere with each other's counters
- [ ] Rate limiting still works correctly after 5 requests to same endpoint type
- [ ] Agent one-liner installer works without forced 1-minute wait
## Test Plan
1. **Direct API Test:**
```bash
# Test 1: Verify first request succeeds
curl -s -w "\nStatus: %{http_code}, Remaining: %{x-ratelimit-remaining}\n" \
-X POST http://localhost:8080/api/v1/agents/register \
-H "Authorization: Bearer $TOKEN" \
-d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}'
# Expected: Status: 200/201, Remaining: 4
```
2. **Cross-Endpoint Isolation Test:**
```bash
# Make requests to different endpoint types
curl http://localhost:8080/api/v1/downloads/linux/amd64 # public_access
curl http://localhost:8080/api/v1/install/linux # public_access
curl -X POST http://localhost:8080/api/v1/agents/register -H "Authorization: Bearer $TOKEN" -d '{"hostname":"test"}' # agent_registration
# Registration should still have full limit available
```
3. **Rate Limit Still Works Test:**
```bash
# Make 6 registration requests
for i in {1..6}; do
curl -s -w "Request $i: %{http_code}\n" \
-X POST http://localhost:8080/api/v1/agents/register \
-H "Authorization: Bearer $TOKEN" \
-d "{\"hostname\":\"test-$i\",\"os_type\":\"linux\"}"
done
# Expected: Requests 1-5 = 200/201, Request 6 = 429
```
4. **Agent Binary Integration Test:**
```bash
# Download and test actual agent registration
wget http://localhost:8080/api/v1/downloads/linux/amd64 -O redflag-agent
chmod +x redflag-agent
./redflag-agent --server http://localhost:8080 --token "$TOKEN" --register
# Should succeed immediately without rate limit errors
```
## Files to Modify
- `aggregator-server/internal/api/middleware/rate_limiter.go` (likely location)
- Any rate limiting configuration files
- Tests for rate limiting functionality
## Impact
- **Critical:** Blocks new agent installations
- **User Experience:** Forces unnecessary 1-minute delays during setup
- **Reliability:** Makes system appear broken during normal operations
- **Production:** Prevents smooth agent deployment workflows
## Verification Commands
After fix implementation:
```bash
# Check rate limit headers on first request
curl -I -X POST http://localhost:8080/api/v1/agents/register \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"hostname":"test"}'
# Should show:
# X-RateLimit-Limit: 5
# X-RateLimit-Remaining: 4
# X-RateLimit-Reset: [timestamp]
```