Add docs and project files - force for Culurien
228
docs/4_LOG/_originals_archive.backup/RateLimitFirstRequestBug.md
Normal file
@@ -0,0 +1,228 @@

# Rate Limit First Request Bug

## Issue Description

The first agent registration is always rate limited, even though no earlier request has been made. This happens consistently when running the one-liner installer and forces a 1-minute wait before registration succeeds.

- **Expected:** The first registration succeeds immediately (0/5 requests used).
- **Actual:** The first registration returns 429 Too Many Requests.

## Test Setup

```bash
# Full rebuild to ensure clean state
# (rm -f so the chain does not stop when config/.env is already absent)
docker-compose down -v --remove-orphans && \
  rm -f config/.env && \
  docker-compose build --no-cache && \
  cp config/.env.bootstrap.example config/.env && \
  docker-compose up -d

# Wait for server to be ready
sleep 10

# Complete setup wizard (manual or automated)
# Generate a registration token
```

## Test 1: Direct Registration API Call

This tests the raw registration endpoint without any agent code:

```bash
# Get a registration token from the UI first
TOKEN="your-registration-token-here"

# Make the registration request with verbose output
curl -v -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "hostname": "test-host",
    "os_type": "linux",
    "os_version": "Fedora 39",
    "os_architecture": "x86_64",
    "agent_version": "0.1.17"
  }' 2>&1 | tee test1-output.txt

# Look for these in the output (case-insensitive, since header casing can vary):
echo ""
echo "=== Rate Limit Headers ==="
grep -i "X-RateLimit" test1-output.txt
grep -iE "429|Retry-After" test1-output.txt
```

**What to check:**
- Does it return 429 on the FIRST call?
- What are the X-RateLimit-Limit and X-RateLimit-Remaining values?
- Which bucket does the error response body name (agent_registration or public_access)?

## Test 2: Multiple Sequential Requests

Test whether the rate limiter tracks requests correctly:

```bash
TOKEN="your-registration-token-here"

for i in {1..6}; do
  echo "=== Attempt $i ==="
  # -i includes response headers so the X-RateLimit-Remaining line can be matched
  curl -s -i -w "\nHTTP Status: %{http_code}\n" \
    -X POST http://localhost:8080/api/v1/agents/register \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d "{\"hostname\":\"test-$i\",\"os_type\":\"linux\",\"os_version\":\"test\",\"os_architecture\":\"x86_64\",\"agent_version\":\"0.1.17\"}" \
    | grep -iE "(error|HTTP Status|remaining)"
  sleep 1
done
```

**Expected:**
- Requests 1-5: HTTP 200 (or 201)
- Request 6: HTTP 429

**If Request 1 fails:**
- The rate limiter is broken
- OR there is a key collision with other endpoints
- OR the agent code is making multiple calls internally

## Test 3: Check for Preflight/OPTIONS Requests

```bash
# Follow the server logs and show only requests to the registration endpoint
# (the server must be logging requests, e.g. with Gin debug mode enabled)
docker-compose logs -f server 2>&1 | grep -E "(POST|OPTIONS|GET).*agents/register"
```

Run Test 1 in another terminal and watch for:
- Any OPTIONS requests before the POST
- Multiple POST requests for a single registration
- Unexpected GET requests

## Test 4: Check Rate Limiter Key Collision

This tests whether different endpoints share the same rate limit counter:

```bash
TOKEN="your-token"
IP=$(hostname -I | awk '{print $1}')

echo "Testing from IP: $IP"

# Test download endpoint (public_access)
curl -s -w "\nDownload Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  http://localhost:8080/api/v1/downloads/linux/amd64

sleep 1

# Test install script endpoint (public_access)
curl -s -w "\nInstall Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  http://localhost:8080/api/v1/install/linux

sleep 1

# Now test registration (agent_registration)
# -i includes response headers so the X-RateLimit-Remaining line can be matched
curl -s -i -w "\nRegistration Status: %{http_code}\n" \
  -H "X-Forwarded-For: $IP" \
  -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}' \
  | grep -iE "(Status|error|remaining)"
```

**Theory:** If the rate limiters key on IP alone (not namespaced by limit type), then the download, install script, and registration requests all count against one shared 5-request limit. That is 3 requests used, leaving only 2 before the limit is hit.
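
The arithmetic in the theory above can be sketched in Go. This is a minimal illustration, not the server's actual limiter: the `counter` type, `remainingAfterInstall`, and the single shared limit of 5 are assumptions made for the example, and window expiry is ignored.

```go
package main

import "fmt"

// counter is a hypothetical stand-in for the server's rate limiter:
// it counts requests per key and ignores window expiry.
type counter struct {
	limit int
	hits  map[string]int
}

func newCounter(limit int) *counter {
	return &counter{limit: limit, hits: make(map[string]int)}
}

// allow records one request for key and reports whether it was within the limit.
func (c *counter) allow(key string) bool {
	if c.hits[key] >= c.limit {
		return false
	}
	c.hits[key]++
	return true
}

// remainingAfterInstall simulates download + install script + registration
// from one IP, then returns how many further registrations are still allowed.
func remainingAfterInstall(namespaced bool) int {
	c := newCounter(5)
	ip := "127.0.0.1"
	key := func(limitType string) string {
		if namespaced {
			return limitType + ":" + ip
		}
		return ip // IP-only key: every limit type shares one counter
	}

	c.allow(key("public_access"))      // download
	c.allow(key("public_access"))      // install script
	c.allow(key("agent_registration")) // registration

	left := 0
	for c.allow(key("agent_registration")) {
		left++
	}
	return left
}

func main() {
	fmt.Println("shared key:    ", remainingAfterInstall(false))
	fmt.Println("namespaced key:", remainingAfterInstall(true))
}
```

With the shared key, only 2 registrations remain; with namespaced keys, the registration counter has seen a single request and 4 remain. If the installer makes more public requests than the two modeled here, the shared counter could be exhausted before the first registration arrives.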

## Test 5: Agent Binary Registration

Test what the actual agent does:

```bash
# Download agent
wget http://localhost:8080/api/v1/downloads/linux/amd64 -O redflag-agent
chmod +x redflag-agent

# Remove any existing config
sudo rm -f /etc/aggregator/config.json

# Enable debug output and register
export DEBUG=1
./redflag-agent --server http://localhost:8080 --token "your-token" --register 2>&1 | tee agent-registration.log

# Check for multiple registration attempts
grep -c "POST.*agents/register" agent-registration.log
```

## Test 6: Server Logs Analysis

Check what the server sees:

```bash
# Capture only log lines written from this point on
docker-compose logs --tail=0 -f server > server-logs.txt &
LOG_PID=$!

# Wait a moment
sleep 2

# Make a registration request
curl -X POST http://localhost:8080/api/v1/agents/register \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}'

# Wait for logs
sleep 2
kill "$LOG_PID"

# Analyze
echo "=== All Registration Requests ==="
grep "register" server-logs.txt

echo "=== Rate Limit Events ==="
grep -i "rate\|limit\|429" server-logs.txt
```

## Debugging Checklist

- [ ] Does the FIRST request fail with 429?
- [ ] What's the X-RateLimit-Remaining value on first request?
- [ ] Are there multiple requests happening for a single registration?
- [ ] Do download/install endpoints count against registration limit?
- [ ] Does the agent binary retry internally on failure?
- [ ] Are there preflight OPTIONS requests?
- [ ] What's the rate limit key being used (check logs)?

## Potential Root Causes

1. **Key Namespace Bug**: Rate limiter keys aren't namespaced by limit type
   - Fix: Prepend limitType to key (e.g., "agent_registration:127.0.0.1")

2. **Agent Retry Logic**: Agent retries registration on first failure
   - Fix: Check agent registration code for retry loops

3. **Shared Counter**: Download + Install + Register share same counter
   - Fix: Namespace keys or use different key functions

4. **Off-by-One**: Rate limiter logic checks `>=` instead of `>`
   - Fix: Change condition in checkRateLimit()

5. **Preflight Requests**: Browser/client making OPTIONS requests
   - Fix: Exclude OPTIONS from rate limiting
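
To make the off-by-one case (item 4) concrete, here is a small Go sketch. It assumes a limiter that increments its counter first and then compares against the limit; whether `checkRateLimit()` actually works this way is not confirmed by this document, and `allowedRequests`, `rejectAtLimit`, and `rejectOverLimit` are hypothetical names.

```go
package main

import "fmt"

// allowedRequests counts how many of n sequential requests pass a limiter
// that increments its counter first and then applies the reject comparison.
func allowedRequests(n, limit int, reject func(count, limit int) bool) int {
	count, passed := 0, 0
	for i := 0; i < n; i++ {
		count++
		if reject(count, limit) {
			continue
		}
		passed++
	}
	return passed
}

// rejectAtLimit rejects the request that reaches the limit (off by one).
func rejectAtLimit(count, limit int) bool { return count >= limit }

// rejectOverLimit rejects only requests beyond the limit (intended behavior).
func rejectOverLimit(count, limit int) bool { return count > limit }

func main() {
	fmt.Println(allowedRequests(6, 5, rejectAtLimit))   // 4
	fmt.Println(allowedRequests(6, 5, rejectOverLimit)) // 5
}
```

Under these assumptions an off-by-one costs exactly one request: 4 of 6 pass instead of 5. On its own it would not reject the very first request against a limit of 5, so it would have to combine with another cause to explain the observed behavior.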

## Expected Fix

Most likely: Rate limiter keys need namespacing.

Current (broken):
```go
key := keyFunc(c) // Just "127.0.0.1"
allowed, resetTime := rl.checkRateLimit(key, config)
```

Fixed:
```go
key := keyFunc(c)
namespacedKey := limitType + ":" + key // "agent_registration:127.0.0.1"
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
```

This ensures agent_registration, public_access, and agent_reports each get their own counters per IP.
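
One way to check the namespacing idea in isolation is a self-contained sketch like the following. It uses the standard library's `net/http` and `httptest` rather than the server's Gin middleware, and the `limiter` type, `newMux`, and `status` helpers are hypothetical. It keys on `RemoteAddr` instead of a configurable key function, omits window expiry, and also skips OPTIONS requests as suggested under Potential Root Causes.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
)

// limiter is a hypothetical in-memory counter; window expiry is omitted.
type limiter struct {
	mu   sync.Mutex
	hits map[string]int
}

func newLimiter() *limiter { return &limiter{hits: make(map[string]int)} }

// limit wraps next with a per-IP limit that is scoped to limitType.
func (l *limiter) limit(limitType string, max int, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodOptions { // preflight requests are not counted
			next.ServeHTTP(w, r)
			return
		}
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		key := limitType + ":" + ip // e.g. "agent_registration:192.0.2.1"

		l.mu.Lock()
		l.hits[key]++
		over := l.hits[key] > max
		l.mu.Unlock()

		if over {
			http.Error(w, "rate limit exceeded: "+limitType, http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newMux wires two routes from this document to separate limit types.
func newMux() *http.ServeMux {
	l := newLimiter()
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux := http.NewServeMux()
	mux.Handle("/api/v1/downloads/", l.limit("public_access", 5, ok))
	mux.Handle("/api/v1/agents/register", l.limit("agent_registration", 5, ok))
	return mux
}

// status sends one in-process request and returns the response code.
func status(h http.Handler, method, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	// Exhaust the public_access budget for this client.
	for i := 0; i < 5; i++ {
		status(mux, http.MethodGet, "/api/v1/downloads/linux/amd64")
	}
	// The registration counter is separate, so this still succeeds.
	fmt.Println(status(mux, http.MethodPost, "/api/v1/agents/register")) // 200
}
```

Because each limit type has its own key prefix, exhausting the download budget does not affect the first registration. The same pattern could serve as a regression test once the real middleware is namespaced.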