8.3 KiB
8.3 KiB
P0-004: Database Constraint Violation in Timeout Log Creation
Priority: P0 (Critical) Source Reference: From needsfixingbeforepush.md line 313 Date Identified: 2025-11-12
Problem Description
Timeout service successfully marks commands as timed_out but fails to create audit log entries in the update_logs table due to a database constraint violation. The error "pq: new row for relation "update_logs" violates check constraint "update_logs_result_check"" prevents proper audit trail creation for timeout events.
Current Behavior
- Timeout service runs every 5 minutes correctly
- Successfully identifies timed out commands (both pending >30min and sent >2h)
- Successfully updates command status to 'timed_out' in
agent_commandstable - FAILS to create audit log entry in
update_logstable - Constraint violation suggests 'timed_out' is not a valid value for the
resultfield
Error Message
Warning: failed to create timeout log entry: pq: new row for relation "update_logs" violates check constraint "update_logs_result_check"
Root Cause Analysis
The update_logs table has a CHECK constraint on the result field that doesn't include 'timed_out' as a valid value. The timeout service is trying to insert 'timed_out' as the result, but the database schema only accepts other values like 'success', 'failed', 'error', etc.
Likely Database Schema Issue
-- Current constraint (hypothetical)
ALTER TABLE update_logs ADD CONSTRAINT update_logs_result_check
CHECK (result IN ('success', 'failed', 'error', 'pending'));
-- Missing: 'timed_out' in the allowed values list
Proposed Solution
Option 1: Add 'timed_out' to Database Constraint (Recommended)
-- Update the check constraint to include 'timed_out'
ALTER TABLE update_logs DROP CONSTRAINT update_logs_result_check;
ALTER TABLE update_logs ADD CONSTRAINT update_logs_result_check
CHECK (result IN ('success', 'failed', 'error', 'pending', 'timed_out'));
Option 2: Use 'failed' with Timeout Metadata
// In timeout service, use 'failed' instead of 'timed_out'
logEntry := &UpdateLog{
CommandID: command.ID,
AgentID: command.AgentID,
Result: "failed", // Instead of "timed_out"
Message: "Command timed out after 2 hours",
Metadata: map[string]interface{}{
"timeout_duration": "2h",
"timeout_reason": "no_response",
"sent_at": command.SentAt,
},
}
Option 3: Separate Timeout Status Field
-- Add dedicated timeout tracking
ALTER TABLE update_logs ADD COLUMN is_timed_out BOOLEAN DEFAULT FALSE;
ALTER TABLE update_logs ADD COLUMN timeout_duration INTERVAL;
-- Keep result as 'failed' but mark as timeout
UPDATE update_logs SET
result = 'failed',
is_timed_out = TRUE,
timeout_duration = '2 hours'
WHERE command_id = '...';
Definition of Done
- Timeout service can create audit log entries without constraint violations
- Audit trail properly records timeout events with timestamps and details
- Timeout events are visible in command history and audit reports
- Database constraint allows all valid command result states
- Error logs no longer show constraint violation warnings
- Compliance requirements for audit trail are met
Test Plan
1. Manual Timeout Creation Test
# Create a command and mark it as sent
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
INSERT INTO agent_commands (id, agent_id, command_type, status, created_at, sent_at)
VALUES ('test-timeout-123', 'agent-uuid', 'scan_updates', 'sent', NOW(), NOW() - INTERVAL '3 hours');
"
# Run timeout service manually or wait for next run (5 minutes)
# Check that no constraint violation occurs
docker logs redflag-server | grep -i "constraint\|timeout"
# Verify audit log was created
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
SELECT * FROM update_logs WHERE command_id = 'test-timeout-123';
"
2. Database Constraint Test
# Test all valid result values
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
INSERT INTO update_logs (command_id, agent_id, result, message)
VALUES
('test-success', 'agent-uuid', 'success', 'Test success'),
('test-failed', 'agent-uuid', 'failed', 'Test failed'),
('test-error', 'agent-uuid', 'error', 'Test error'),
('test-pending', 'agent-uuid', 'pending', 'Test pending'),
('test-timeout', 'agent-uuid', 'timed_out', 'Test timeout');
"
# All should succeed without constraint violations
3. Full Timeout Service Test
# Set up old commands that should timeout
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
UPDATE agent_commands
SET status = 'sent', sent_at = NOW() - INTERVAL '3 hours'
WHERE created_at < NOW() - INTERVAL '1 hour';
"
# Trigger timeout service
curl -X POST http://localhost:8080/api/v1/admin/timeout-service/run \
-H "Authorization: Bearer $ADMIN_TOKEN"
# Verify no constraint violations in logs
# Verify audit logs are created for timed out commands
4. Audit Trail Verification
# Check that timeout events appear in command history
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:8080/api/v1/commands/history?include_timeout=true"
# Should show timeout events with proper metadata
Files to Modify
- Database Migration:
aggregator-server/internal/database/migrations/XXX_add_timed_out_constraint.up.sql - Timeout Service:
aggregator-server/internal/services/timeout.go - Database Schema: Update
update_logstable constraints - API Handlers: Ensure timeout events are returned in history queries
Database Migration Example
-- File: 020_add_timed_out_to_result_constraint.up.sql
-- Add 'timed_out' as valid result value for update_logs
-- First, drop existing constraint
ALTER TABLE update_logs DROP CONSTRAINT IF EXISTS update_logs_result_check;
-- Add updated constraint with 'timed_out' included
ALTER TABLE update_logs ADD CONSTRAINT update_logs_result_check
CHECK (result IN ('success', 'failed', 'error', 'pending', 'timed_out'));
-- Add comment explaining the change
COMMENT ON CONSTRAINT update_logs_result_check ON update_logs IS
'Valid result values for command execution, including timeout status';
Impact
- Audit Compliance: Enables complete audit trail for timeout events
- Troubleshooting: Timeout events visible in command history and logs
- Compliance: Meets regulatory requirements for complete audit trail
- Debugging: Clear visibility into timeout patterns and system health
- Monitoring: Enables metrics on timeout rates and patterns
Security and Compliance Considerations
Audit Trail Requirements
- Complete Records: All command state changes must be logged
- Immutable History: Timeout events should not be deletable
- Timestamp Accuracy: Precise timing of timeout detection
- User Attribution: Which system/service detected the timeout
Data Privacy
- Command Details: What command timed out (but not sensitive data)
- Agent Information: Which agent had the timeout
- Timing Data: How long the command was stuck
- System Metadata: Service version, detection method
Monitoring and Alerting
Metrics to Track
- Timeout rate by command type
- Average timeout duration
- Timeout service execution success rate
- Audit log creation success rate
- Database constraint violations (should be 0)
Alert Examples
# Alert if timeout service fails
if timeout_service_failures > 3 in 5m:
alert("Timeout service experiencing failures")
# Alert if constraint violations occur
if database_constraint_violations > 0:
critical("Database constraint violation detected!")
Verification Commands
After fix implementation:
# Test timeout service execution
curl -X POST http://localhost:8080/api/v1/admin/timeout-service/run \
-H "Authorization: Bearer $ADMIN_TOKEN"
# Check for constraint violations
docker logs redflag-server | grep -i "constraint" # Should be empty
# Verify audit log creation
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
SELECT COUNT(*) FROM update_logs
WHERE result = 'timed_out'
AND created_at > NOW() - INTERVAL '1 hour';
"
# Should be >0 after timeout service runs
# Verify no constraint errors
docker logs redflag-server 2>&1 | grep -c "violates check constraint"
# Should return 0