# P0-004: Database Constraint Violation in Timeout Log Creation

**Priority:** P0 (Critical)
**Source Reference:** From needsfixingbeforepush.md line 313
**Date Identified:** 2025-11-12

## Problem Description

Timeout service successfully marks commands as timed_out but fails to create audit log entries in the `update_logs` table due to a database constraint violation. The error `pq: new row for relation "update_logs" violates check constraint "update_logs_result_check"` prevents proper audit trail creation for timeout events.

## Current Behavior

- Timeout service runs every 5 minutes correctly
- Successfully identifies timed out commands (both pending >30min and sent >2h)
- Successfully updates command status to 'timed_out' in `agent_commands` table
- **FAILS** to create audit log entry in `update_logs` table
- Constraint violation suggests 'timed_out' is not a valid value for the `result` field

### Error Message

```
Warning: failed to create timeout log entry: pq: new row for relation "update_logs" violates check constraint "update_logs_result_check"
```

## Root Cause Analysis

The `update_logs` table has a CHECK constraint on the `result` field that doesn't include 'timed_out' as a valid value. The timeout service is trying to insert 'timed_out' as the result, but the database schema only accepts other values like 'success', 'failed', 'error', etc.
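Because the allowed values are inferred from the error message rather than read from the schema, it may be worth confirming them before choosing a fix. The sketch below is one way to do that in Go: it holds a catalog query (`pg_get_constraintdef` over `pg_constraint`, both standard PostgreSQL) and extracts the quoted literals from whatever definition text that query returns. The helper names `allowedValues` and `allows` are hypothetical, and the sample definition in `main` is illustrative only, not the project's real constraint.

```go
package main

import (
	"fmt"
	"regexp"
)

// constraintDefQuery returns the textual definition of the CHECK constraint
// named in the error message. Run it in psql and paste the result below.
const constraintDefQuery = `
SELECT pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conname = 'update_logs_result_check'`

// quotedLiteral matches a single-quoted SQL literal with no embedded quotes.
var quotedLiteral = regexp.MustCompile(`'([^']*)'`)

// allowedValues extracts every single-quoted literal from a constraint
// definition, regardless of how PostgreSQL formats the surrounding expression.
func allowedValues(def string) []string {
	var values []string
	for _, m := range quotedLiteral.FindAllStringSubmatch(def, -1) {
		values = append(values, m[1])
	}
	return values
}

// allows reports whether value appears among the literals in def.
func allows(def, value string) bool {
	for _, v := range allowedValues(def) {
		if v == value {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("Query to run:", constraintDefQuery)

	// ASSUMPTION: illustrative definition only; replace with the real query output.
	sample := `CHECK (result IN ('success', 'failed', 'error', 'pending'))`
	fmt.Println("allowed values:", allowedValues(sample))
	fmt.Println("timed_out allowed:", allows(sample, "timed_out"))
}
```

If `allows` reports `false` for 'timed_out' against the real definition, the root cause above is confirmed and the list of existing values tells you exactly what the new constraint must keep.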
### Likely Database Schema Issue

```sql
-- Current constraint (hypothetical)
ALTER TABLE update_logs ADD CONSTRAINT update_logs_result_check
CHECK (result IN ('success', 'failed', 'error', 'pending'));

-- Missing: 'timed_out' in the allowed values list
```

## Proposed Solution

### Option 1: Add 'timed_out' to Database Constraint (Recommended)

```sql
-- Update the check constraint to include 'timed_out'
ALTER TABLE update_logs DROP CONSTRAINT update_logs_result_check;
ALTER TABLE update_logs ADD CONSTRAINT update_logs_result_check
CHECK (result IN ('success', 'failed', 'error', 'pending', 'timed_out'));
```

### Option 2: Use 'failed' with Timeout Metadata

```go
// In timeout service, use 'failed' instead of 'timed_out'
logEntry := &UpdateLog{
    CommandID: command.ID,
    AgentID:   command.AgentID,
    Result:    "failed", // Instead of "timed_out"
    Message:   "Command timed out after 2 hours",
    Metadata: map[string]interface{}{
        "timeout_duration": "2h",
        "timeout_reason":   "no_response",
        "sent_at":          command.SentAt,
    },
}
```

### Option 3: Separate Timeout Status Field

```sql
-- Add dedicated timeout tracking
ALTER TABLE update_logs ADD COLUMN is_timed_out BOOLEAN DEFAULT FALSE;
ALTER TABLE update_logs ADD COLUMN timeout_duration INTERVAL;

-- Keep result as 'failed' but mark as timeout
UPDATE update_logs
SET result = 'failed', is_timed_out = TRUE, timeout_duration = '2 hours'
WHERE command_id = '...';
```

## Definition of Done

- [ ] Timeout service can create audit log entries without constraint violations
- [ ] Audit trail properly records timeout events with timestamps and details
- [ ] Timeout events are visible in command history and audit reports
- [ ] Database constraint allows all valid command result states
- [ ] Error logs no longer show constraint violation warnings
- [ ] Compliance requirements for audit trail are met

## Test Plan

### 1. Manual Timeout Creation Test

```bash
# Create a command and mark it as sent
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
INSERT INTO agent_commands (id, agent_id, command_type, status, created_at, sent_at)
VALUES ('test-timeout-123', 'agent-uuid', 'scan_updates', 'sent', NOW(), NOW() - INTERVAL '3 hours');
"

# Run timeout service manually or wait for next run (5 minutes)

# Check that no constraint violation occurs
# (2>&1 so that messages written to stderr are also searched)
docker logs redflag-server 2>&1 | grep -i "constraint\|timeout"

# Verify audit log was created
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
SELECT * FROM update_logs WHERE command_id = 'test-timeout-123';
"
```

### 2. Database Constraint Test

```bash
# Test all valid result values
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
INSERT INTO update_logs (command_id, agent_id, result, message)
VALUES
  ('test-success', 'agent-uuid', 'success', 'Test success'),
  ('test-failed', 'agent-uuid', 'failed', 'Test failed'),
  ('test-error', 'agent-uuid', 'error', 'Test error'),
  ('test-pending', 'agent-uuid', 'pending', 'Test pending'),
  ('test-timeout', 'agent-uuid', 'timed_out', 'Test timeout');
"

# All should succeed without constraint violations
```

### 3. Full Timeout Service Test

```bash
# Set up old commands that should timeout
# WARNING: this rewrites every command older than 1 hour, whatever its
# current status; run it only against a disposable test database
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
UPDATE agent_commands
SET status = 'sent', sent_at = NOW() - INTERVAL '3 hours'
WHERE created_at < NOW() - INTERVAL '1 hour';
"

# Trigger timeout service
curl -X POST http://localhost:8080/api/v1/admin/timeout-service/run \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Verify no constraint violations in logs
# Verify audit logs are created for timed out commands
```

### 4. Audit Trail Verification

```bash
# Check that timeout events appear in command history
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8080/api/v1/commands/history?include_timeout=true"

# Should show timeout events with proper metadata
```

## Files to Modify

- **Database Migration:** `aggregator-server/internal/database/migrations/XXX_add_timed_out_constraint.up.sql`
- **Timeout Service:** `aggregator-server/internal/services/timeout.go`
- **Database Schema:** Update `update_logs` table constraints
- **API Handlers:** Ensure timeout events are returned in history queries

## Database Migration Example

```sql
-- File: 020_add_timed_out_to_result_constraint.up.sql
-- Add 'timed_out' as valid result value for update_logs

-- First, drop existing constraint
ALTER TABLE update_logs DROP CONSTRAINT IF EXISTS update_logs_result_check;

-- Add updated constraint with 'timed_out' included
ALTER TABLE update_logs ADD CONSTRAINT update_logs_result_check
CHECK (result IN ('success', 'failed', 'error', 'pending', 'timed_out'));

-- Add comment explaining the change
COMMENT ON CONSTRAINT update_logs_result_check ON update_logs IS
'Valid result values for command execution, including timeout status';
```

## Impact

- **Audit Compliance:** Enables complete audit trail for timeout events
- **Troubleshooting:** Timeout events visible in command history and logs
- **Compliance:** Meets regulatory requirements for complete audit trail
- **Debugging:** Clear visibility into timeout patterns and system health
- **Monitoring:** Enables metrics on timeout rates and patterns

## Security and Compliance Considerations

### Audit Trail Requirements

- **Complete Records:** All command state changes must be logged
- **Immutable History:** Timeout events should not be deletable
- **Timestamp Accuracy:** Precise timing of timeout detection
- **User Attribution:** Which system/service detected the timeout

### Data Privacy

- **Command Details:** What command timed out (but not sensitive data)
- **Agent Information:** Which agent had the timeout
- **Timing Data:** How long the command was stuck
- **System Metadata:** Service version, detection method

## Monitoring and Alerting

### Metrics to Track

- Timeout rate by command type
- Average timeout duration
- Timeout service execution success rate
- Audit log creation success rate
- Database constraint violations (should be 0)

### Alert Examples

```text
# Pseudocode alert rules (not executable shell)

# Alert if timeout service fails
if timeout_service_failures > 3 in 5m:
    alert("Timeout service experiencing failures")

# Alert if constraint violations occur
if database_constraint_violations > 0:
    critical("Database constraint violation detected!")
```

## Verification Commands

After fix implementation:

```bash
# Test timeout service execution
curl -X POST http://localhost:8080/api/v1/admin/timeout-service/run \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Check for constraint violations (2>&1 so stderr is searched too)
docker logs redflag-server 2>&1 | grep -i "constraint"
# Should be empty

# Verify audit log creation
docker exec -it redflag-postgres psql -U aggregator -d aggregator -c "
SELECT COUNT(*) FROM update_logs
WHERE result = 'timed_out' AND created_at > NOW() - INTERVAL '1 hour';
"
# Should be >0 after timeout service runs

# Verify no constraint errors
docker logs redflag-server 2>&1 | grep -c "violates check constraint"
# Should return 0
```
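If the final log check needs to gate a CI job rather than be read by eye, it could be wrapped in a small Go program. This is a minimal sketch, assuming the server logs are piped in on stdin; the function name `countViolations` and the file name used in the usage line are hypothetical, and the marker string is taken from the error message quoted in this document.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// violationMarker is the substring of the pq error quoted in this document.
const violationMarker = "violates check constraint"

// countViolations returns how many of the given log lines contain the marker.
func countViolations(lines []string) int {
	count := 0
	for _, line := range lines {
		if strings.Contains(line, violationMarker) {
			count++
		}
	}
	return count
}

func main() {
	// Read all of stdin line by line. bufio.Scanner rejects lines longer
	// than its default 64 KiB token limit, which is reported as a read error.
	var lines []string
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(2)
	}

	n := countViolations(lines)
	fmt.Printf("constraint violations: %d\n", n)
	if n > 0 {
		os.Exit(1) // non-zero exit so a CI step fails when violations remain
	}
}
```

A possible invocation would be `docker logs redflag-server 2>&1 | go run checklogs.go`. Unlike `grep -c`, which exits non-zero when the count is 0, this exits 0 on a clean log, which is usually the polarity a pipeline step expects.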