153 lines
5.1 KiB
Markdown
153 lines
5.1 KiB
Markdown
# P0-001: Rate Limit First Request Bug
|
|
|
|
**Priority:** P0 (Critical)
|
|
**Source Reference:** From RateLimitFirstRequestBug.md line 4
|
|
**Date Identified:** 2025-11-12
|
|
|
|
## Problem Description
|
|
|
|
Every FIRST agent registration gets rate limited with HTTP 429 Too Many Requests, even though it's the very first request from a clean system. This happens consistently when running the one-liner installer, forcing a 1-minute wait before the registration succeeds.
|
|
|
|
**Expected Behavior:** First registration should succeed immediately (0/5 requests used)
|
|
**Actual Behavior:** First registration gets 429 Too Many Requests
|
|
|
|
## Reproduction Steps
|
|
|
|
1. Full rebuild to ensure clean state:
|
|
```bash
|
|
docker-compose down -v --remove-orphans && \
|
|
rm config/.env && \
|
|
docker-compose build --no-cache && \
|
|
cp config/.env.bootstrap.example config/.env && \
|
|
docker-compose up -d
|
|
```
|
|
|
|
2. Wait for server to be ready (sleep 10)
|
|
|
|
3. Complete setup wizard and generate a registration token
|
|
|
|
4. Make first registration API call:
|
|
```bash
|
|
curl -v -X POST http://localhost:8080/api/v1/agents/register \
|
|
-H "Content-Type: application/json" \
|
|
-H "Authorization: Bearer $TOKEN" \
|
|
-d '{
|
|
"hostname": "test-host",
|
|
"os_type": "linux",
|
|
"os_version": "Fedora 39",
|
|
"os_architecture": "x86_64",
|
|
"agent_version": "0.1.17"
|
|
}'
|
|
```
|
|
|
|
5. Observe 429 response on first request
|
|
|
|
## Root Cause Analysis
|
|
|
|
Most likely cause is **Rate Limiter Key Namespace Bug** - rate limiter keys aren't namespaced by limit type, causing different endpoints to share the same counter.
|
|
|
|
**Current (broken) implementation:**
|
|
```go
|
|
key := keyFunc(c) // Just "127.0.0.1"
|
|
allowed, resetTime := rl.checkRateLimit(key, config)
|
|
```
|
|
|
|
**The issue:** Download + Install + Register endpoints all use the same IP-based key, so 3 requests count against a shared 5-request limit.
|
|
|
|
## Proposed Solution
|
|
|
|
Implement namespacing for rate limiter keys by limit type:
|
|
|
|
```go
|
|
key := keyFunc(c)
|
|
namespacedKey := limitType + ":" + key // "agent_registration:127.0.0.1"
|
|
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
|
|
```
|
|
|
|
This ensures:
|
|
- `agent_registration` endpoints get their own counter per IP
|
|
- `public_access` endpoints (downloads, install scripts) get their own counter
|
|
- `agent_reports` endpoints get their own counter
|
|
|
|
## Definition of Done
|
|
|
|
- [ ] First agent registration request succeeds with HTTP 200/201
|
|
- [ ] Rate limit headers show `X-RateLimit-Remaining: 4` on first request
|
|
- [ ] Multiple endpoints don't interfere with each other's counters
|
|
- [ ] Rate limiting still works correctly after 5 requests to same endpoint type
|
|
- [ ] Agent one-liner installer works without forced 1-minute wait
|
|
|
|
## Test Plan
|
|
|
|
1. **Direct API Test:**
|
|
```bash
|
|
# Test 1: Verify first request succeeds
|
|
curl -s -w "\nStatus: %{http_code}, Remaining: %{x-ratelimit-remaining}\n" \
|
|
-X POST http://localhost:8080/api/v1/agents/register \
|
|
-H "Authorization: Bearer $TOKEN" \
|
|
-d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}'
|
|
|
|
# Expected: Status: 200/201, Remaining: 4
|
|
```
|
|
|
|
2. **Cross-Endpoint Isolation Test:**
|
|
```bash
|
|
# Make requests to different endpoint types
|
|
curl http://localhost:8080/api/v1/downloads/linux/amd64 # public_access
|
|
curl http://localhost:8080/api/v1/install/linux # public_access
|
|
curl -X POST http://localhost:8080/api/v1/agents/register -H "Authorization: Bearer $TOKEN" -d '{"hostname":"test"}' # agent_registration
|
|
|
|
# Registration should still have full limit available
|
|
```
|
|
|
|
3. **Rate Limit Still Works Test:**
|
|
```bash
|
|
# Make 6 registration requests
|
|
for i in {1..6}; do
|
|
curl -s -w "Request $i: %{http_code}\n" \
|
|
-X POST http://localhost:8080/api/v1/agents/register \
|
|
-H "Authorization: Bearer $TOKEN" \
|
|
-d "{\"hostname\":\"test-$i\",\"os_type\":\"linux\"}"
|
|
done
|
|
|
|
# Expected: Requests 1-5 = 200/201, Request 6 = 429
|
|
```
|
|
|
|
4. **Agent Binary Integration Test:**
|
|
```bash
|
|
# Download and test actual agent registration
|
|
wget http://localhost:8080/api/v1/downloads/linux/amd64 -O redflag-agent
|
|
chmod +x redflag-agent
|
|
./redflag-agent --server http://localhost:8080 --token "$TOKEN" --register
|
|
|
|
# Should succeed immediately without rate limit errors
|
|
```
|
|
|
|
## Files to Modify
|
|
|
|
- `aggregator-server/internal/api/middleware/rate_limiter.go` (likely location)
|
|
- Any rate limiting configuration files
|
|
- Tests for rate limiting functionality
|
|
|
|
## Impact
|
|
|
|
- **Critical:** Blocks new agent installations
|
|
- **User Experience:** Forces unnecessary 1-minute delays during setup
|
|
- **Reliability:** Makes system appear broken during normal operations
|
|
- **Production:** Prevents smooth agent deployment workflows
|
|
|
|
## Verification Commands
|
|
|
|
After fix implementation:
|
|
```bash
|
|
# Check rate limit headers on first request
|
|
curl -I -X POST http://localhost:8080/api/v1/agents/register \
|
|
-H "Authorization: Bearer $TOKEN" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"hostname":"test"}'
|
|
|
|
# Should show:
|
|
# X-RateLimit-Limit: 5
|
|
# X-RateLimit-Remaining: 4
|
|
# X-RateLimit-Reset: [timestamp]
|
|
``` |