Add docs and project files - force for Culurien

Fimeg
2026-03-28 20:46:24 -04:00
parent dc61797423
commit 484a7f77ce
343 changed files with 119530 additions and 0 deletions

@@ -0,0 +1,228 @@
# Rate Limit First Request Bug
## Issue Description
The FIRST agent registration is always rate limited, even though it is the very first registration request. This happens consistently when running the one-liner installer and forces a 1-minute wait before the registration succeeds.
**Expected:** First registration should succeed immediately (0/5 requests used)
**Actual:** First registration gets 429 Too Many Requests
## Test Setup
```bash
# Full rebuild to ensure clean state
docker-compose down -v --remove-orphans && \
rm -f config/.env && \
docker-compose build --no-cache && \
cp config/.env.bootstrap.example config/.env && \
docker-compose up -d
# Wait for server to be ready
sleep 10
# Complete setup wizard (manual or automated)
# Generate a registration token
```
## Test 1: Direct Registration API Call
This tests the raw registration endpoint without any agent code:
```bash
# Get a registration token from the UI first
TOKEN="your-registration-token-here"
# Make the registration request with verbose output
curl -v -X POST http://localhost:8080/api/v1/agents/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"hostname": "test-host",
"os_type": "linux",
"os_version": "Fedora 39",
"os_architecture": "x86_64",
"agent_version": "0.1.17"
}' 2>&1 | tee test1-output.txt
# Look for these in the output (-i because header case can vary):
echo ""
echo "=== Rate Limit Headers ==="
grep -i "X-RateLimit" test1-output.txt
grep -i "429\|Retry-After" test1-output.txt
```
**What to check:**
- Does it return 429 on the FIRST call?
- What are the X-RateLimit-Limit and X-RateLimit-Remaining values?
- What does the error response body say (which bucket: agent_registration, public_access)?
## Test 2: Multiple Sequential Requests
Test if the rate limiter is properly tracking requests:
```bash
TOKEN="your-registration-token-here"
for i in {1..6}; do
echo "=== Attempt $i ==="
curl -s -w "\nHTTP Status: %{http_code}\n" \
-X POST http://localhost:8080/api/v1/agents/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d "{\"hostname\":\"test-$i\",\"os_type\":\"linux\",\"os_version\":\"test\",\"os_architecture\":\"x86_64\",\"agent_version\":\"0.1.17\"}" \
| grep -E "(error|HTTP Status|remaining)"
sleep 1
done
```
**Expected:**
- Requests 1-5: HTTP 200 (or 201)
- Request 6: HTTP 429
**If Request 1 fails:**
- Rate limiter is broken
- OR there's key collision with other endpoints
- OR agent code is making multiple calls internally
## Test 3: Check for Preflight/OPTIONS Requests
```bash
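# Follow the server logs and show only requests to the registration endpoint
# (assumes the server logs every request, e.g. with Gin running in debug mode)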
docker-compose logs -f server 2>&1 | grep -E "(POST|OPTIONS|GET).*agents/register"
```
Run Test 1 in another terminal and watch for:
- Any OPTIONS requests before POST
- Multiple POST requests for a single registration
- Unexpected GET requests
## Test 4: Check Rate Limiter Key Collision
This tests if different endpoints share the same rate limit counter:
```bash
TOKEN="your-token"
IP=$(hostname -I | awk '{print $1}')
echo "Testing from IP: $IP"
# Test download endpoint (public_access)
curl -s -w "\nDownload Status: %{http_code}\n" \
-H "X-Forwarded-For: $IP" \
http://localhost:8080/api/v1/downloads/linux/amd64
sleep 1
# Test install script endpoint (public_access)
curl -s -w "\nInstall Status: %{http_code}\n" \
-H "X-Forwarded-For: $IP" \
http://localhost:8080/api/v1/install/linux
sleep 1
# Now test registration (agent_registration)
curl -s -w "\nRegistration Status: %{http_code}\n" \
-H "X-Forwarded-For: $IP" \
-X POST http://localhost:8080/api/v1/agents/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}' \
| grep -E "(Status|error|remaining)"
```
**Theory:** If rate limiters key their counters by IP only (not namespaced by limit type), then the download, install script, and registration requests all draw from one shared 5-request budget. Those 3 requests leave only 2 before the limit is hit, and any extra requests made during installation would exhaust it even sooner.
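The arithmetic behind this theory can be checked with a small simulation. This is a minimal sketch, assuming a fixed-window limiter keyed by client IP only; `sharedLimiter` and its methods are hypothetical and are not taken from the server code.

```go
package main

import "fmt"

// sharedLimiter is a hypothetical limiter that keys its counters by
// client IP only, which is the behaviour the theory suspects.
type sharedLimiter struct {
	limit  int
	counts map[string]int
}

func newSharedLimiter(limit int) *sharedLimiter {
	return &sharedLimiter{limit: limit, counts: make(map[string]int)}
}

// allow records one request for ip and reports whether it is within the
// limit, along with how many requests remain afterwards.
func (l *sharedLimiter) allow(ip string) (ok bool, remaining int) {
	if l.counts[ip] >= l.limit {
		return false, 0
	}
	l.counts[ip]++
	return true, l.limit - l.counts[ip]
}

func main() {
	rl := newSharedLimiter(5)
	// All three endpoints hit the same counter because the key is the IP alone.
	for _, endpoint := range []string{"download", "install", "register"} {
		ok, remaining := rl.allow("127.0.0.1")
		fmt.Printf("%-8s allowed=%v remaining=%d\n", endpoint, ok, remaining)
	}
}
```

In this model the registration is the third request on the shared counter and leaves 2 remaining. If Test 4 shows `X-RateLimit-Remaining` dropping across the download and install calls, that would be consistent with this behaviour.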
## Test 5: Agent Binary Registration
Test what the actual agent does:
```bash
# Download agent
wget http://localhost:8080/api/v1/downloads/linux/amd64 -O redflag-agent
chmod +x redflag-agent
# Remove any existing config
sudo rm -f /etc/aggregator/config.json
# Enable debug output and register
export DEBUG=1
./redflag-agent --server http://localhost:8080 --token "your-token" --register 2>&1 | tee agent-registration.log
# Check for multiple registration attempts
grep -c "POST.*agents/register" agent-registration.log
```
## Test 6: Server Logs Analysis
Check what the server sees:
```bash
# Capture only new log lines (--tail=0 skips existing output)
docker-compose logs --tail=0 -f server > server-logs.txt &
LOG_PID=$!
# Wait a moment
sleep 2
# Make a registration request
curl -X POST http://localhost:8080/api/v1/agents/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-token" \
-d '{"hostname":"test","os_type":"linux","os_version":"test","os_architecture":"x86_64","agent_version":"0.1.17"}'
# Wait for logs
sleep 2
kill $LOG_PID
# Analyze
echo "=== All Registration Requests ==="
grep "register" server-logs.txt
echo "=== Rate Limit Events ==="
grep -i "rate\|limit\|429" server-logs.txt
```
## Debugging Checklist
- [ ] Does the FIRST request fail with 429?
- [ ] What's the X-RateLimit-Remaining value on first request?
- [ ] Are there multiple requests happening for a single registration?
- [ ] Do download/install endpoints count against registration limit?
- [ ] Does the agent binary retry internally on failure?
- [ ] Are there preflight OPTIONS requests?
- [ ] What's the rate limit key being used (check logs)?
## Potential Root Causes
1. **Key Namespace Bug**: Rate limiter keys aren't namespaced by limit type
- Fix: Prepend limitType to key (e.g., "agent_registration:127.0.0.1")
2. **Agent Retry Logic**: Agent retries registration on first failure
- Fix: Check agent registration code for retry loops
3. **Shared Counter**: Download + Install + Register share same counter
- Fix: Namespace keys or use different key functions
4. **Off-by-One**: Rate limiter logic checks `>=` instead of `>` (a bug only if the counter is incremented before the check)
- Fix: Change condition in checkRateLimit()
5. **Preflight Requests**: Browser/client making OPTIONS requests
- Fix: Exclude OPTIONS from rate limiting
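To make the off-by-one cause concrete, the sketch below compares the two orderings of "check" and "count". It is an illustration under assumptions, not the server's `checkRateLimit()`; every function name here is hypothetical.

```go
package main

import "fmt"

// allowCheckThenCount checks the counter BEFORE recording the request.
// With this ordering, `count >= limit` is the correct rejection test.
func allowCheckThenCount(count *int, limit int) bool {
	if *count >= limit {
		return false
	}
	*count++
	return true
}

// allowCountThenCheck records the request first and checks afterwards.
// With this ordering, the request is rejected only when count > limit.
func allowCountThenCheck(count *int, limit int) bool {
	*count++
	return *count <= limit
}

// allowCountThenCheckBuggy records the request first but still rejects
// on `count >= limit`, so the request that reaches the limit is refused.
func allowCountThenCheckBuggy(count *int, limit int) bool {
	*count++
	return !(*count >= limit)
}

// allowedOutOf returns how many of n back-to-back requests pass.
func allowedOutOf(n, limit int, allow func(*int, int) bool) int {
	count, passed := 0, 0
	for i := 0; i < n; i++ {
		if allow(&count, limit) {
			passed++
		}
	}
	return passed
}

func main() {
	fmt.Println("check-then-count:", allowedOutOf(6, 5, allowCheckThenCount))
	fmt.Println("count-then-check:", allowedOutOf(6, 5, allowCountThenCheck))
	fmt.Println("buggy variant:   ", allowedOutOf(6, 5, allowCountThenCheckBuggy))
}
```

Note that even the buggy variant allows the first request when the counter starts at zero and the limit is 5. An off-by-one on its own would therefore shorten the budget to 4 rather than reject request 1, which suggests checking whether the counter is already non-zero when the first registration arrives.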
## Expected Fix
Most likely: Rate limiter keys need namespacing.
Current (broken):
```go
key := keyFunc(c) // Just "127.0.0.1"
allowed, resetTime := rl.checkRateLimit(key, config)
```
Fixed:
```go
key := keyFunc(c)
namespacedKey := limitType + ":" + key // "agent_registration:127.0.0.1"
allowed, resetTime := rl.checkRateLimit(namespacedKey, config)
```
This ensures agent_registration, public_access, and agent_reports each get their own counters per IP.