Complete RedFlag codebase with two major security audit implementations.
== A-1: Ed25519 Key Rotation Support ==
Server:
- SignCommand sets SignedAt timestamp and KeyID on every signature
- signing_keys database table (migration 020) for multi-key rotation
- InitializePrimaryKey registers active key at startup
- /api/v1/public-keys endpoint for rotation-aware agents
- SigningKeyQueries for key lifecycle management
Agent:
- Key-ID-aware verification via CheckKeyRotation
- FetchAndCacheAllActiveKeys for rotation pre-caching
- Cache metadata with TTL and staleness fallback
- SecurityLogger events for key rotation and command signing
== A-2: Replay Attack Fixes (F-1 through F-7) ==
F-5 CRITICAL - RetryCommand now signs via signAndCreateCommand
F-1 HIGH - v3 format: "{agent_id}:{cmd_id}:{type}:{hash}:{ts}"
F-7 HIGH - Migration 026: expires_at column with partial index
F-6 HIGH - GetPendingCommands/GetStuckCommands filter by expires_at
F-2 HIGH - Agent-side executedIDs dedup map with cleanup
F-4 HIGH - commandMaxAge reduced from 24h to 4h
F-3 CRITICAL - Old-format commands rejected after 48h via CreatedAt
Verification fixes: migration idempotency (ETHOS #4), log format
compliance (ETHOS #1), stale comments updated.
All 24 tests passing. Docker --no-cache build verified.
See docs/ for full audit reports and deviation log (DEV-001 to DEV-019).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8.0 KiB
RedFlag Development Ethos
Philosophy: We are building honest, autonomous software for a community that values digital sovereignty. This isn't enterprise-fluff; it's a "less is more" set of non-negotiable principles forged from experience. We ship bugs, but we are honest about them, and we log the failures.
The Core Ethos (Non-Negotiable Principles)
These are the rules we've learned not to compromise on. They are the foundation of our development contract.
1. Errors are History, Not /dev/null
Principle: NEVER silence errors.
Rationale: A "laid back" admin is one who can sleep at night, knowing any failure will be in the logs. We don't use 2>/dev/null. We fix the root cause, not the symptom.
Implementation Contract:
- All errors, from a script exit 1 to an API 500, MUST be captured and logged with context (what failed, why, what was attempted)
- All logs MUST follow the
[TAG] [system] [component]format (e.g.,[ERROR] [agent] [installer] Download failed...) - The final destination for all auditable events (errors and state changes) is the history table
2. Security is Non-Negotiable
Principle: NEVER add unauthenticated endpoints.
Rationale: "Temporary" is permanent. Every single route MUST be protected by the established, multi-subsystem security architecture.
Security Stack:
- User Auth (WebUI): All admin dashboard routes MUST be protected by WebAuthMiddleware()
- Agent Registration: Agents can only be created using valid registration_token via
/api/v1/agents/register - Agent Check-in: All agent-to-server communication MUST be protected by AuthMiddleware() validating JWT access tokens
- Agent Token Renewal: Agents MUST only renew tokens using their long-lived refresh_token via
/api/v1/agents/renew - Hardware Verification: All authenticated agent routes MUST be protected by MachineBindingMiddleware to validate X-Machine-ID header
- Update Security: Sensitive commands MUST be protected by signed Ed25519 Nonce to prevent replay attacks
- Binary Security: Agents MUST verify Ed25519 signatures of downloaded binaries against cached server public key (TOFU model)
3. Assume Failure; Build for Resilience
Principle: NEVER assume an operation will succeed.
Rationale: Networks fail. Servers restart. Agents crash. The system must recover without manual intervention.
Resilience Contract:
- Agent Network: Agent check-ins MUST use retry logic with exponential backoff to survive server 502s and transient failures
- Scanner Reliability: Long-running or fragile scanners (Windows Update, DNF) MUST be wrapped in Circuit Breaker to prevent subsystem blocking
- Data Delivery: Command results MUST use Command Acknowledgment System (
pending_acks.json) for at-least-once delivery guarantees
4. Idempotency is a Requirement
Principle: NEVER forget idempotency.
Rationale: We (and our agents) will inevitably run the same command twice. The system must not break or create duplicate state.
Idempotency Contract:
- Install Scripts: Must be idempotent, checking if agent/service is already installed before attempting installation
- Command Design: All commands should be designed for idempotency to prevent duplicate state issues
- Database Migrations: All schema changes MUST be idempotent (CREATE TABLE IF NOT EXISTS, ADD COLUMN IF NOT EXISTS, etc.)
5. No Marketing Fluff (The "No BS" Rule)
Principle: NEVER use banned words or emojis in logs or code.
Rationale: We are building an "honest" tool for technical users, not pitching a product. Fluff hides meaning and creates enterprise BS.
Clarity Contract:
- Banned Words: enhanced, enterprise-ready, seamless, robust, production-ready, revolutionary, etc.
- Banned Emojis: Emojis like ⚠️, ✅, ❌ are for UI/communications, not for logs
- Logging Format: All logs MUST use the
[TAG] [system] [component]format for clarity and consistency
Critical Build Practices (Non-Negotiable)
Docker Cache Invalidation During Testing
Principle: ALWAYS use --no-cache when testing fixes.
Rationale: Docker layer caching will use the broken state unless explicitly invalidated. A fix that appears to fail may simply be using cached layers.
Build Contract:
- Testing Fixes:
docker-compose build --no-cacheordocker build --no-cache - Never Assume: Cache will not pick up source code changes automatically
- Verification: If a fix doesn't work, rebuild without cache before debugging further
Development Workflow Principles
Session-Based Development
Development sessions follow a structured pattern to maintain quality and documentation:
Before Starting:
- Review current project status and priorities
- Read previous session documentation for context
- Set clear, specific goals for the session
- Create todo list to track progress
During Development:
- Implement code following established patterns
- Document progress as you work (don't wait until end)
- Update todo list continuously
- Test functionality as you build
After Session Completion:
- Create session documentation with complete technical details
- Update status files with new capabilities and technical debt
- Clean up todo list and plan next session priorities
- Verify all quality checkpoints are met
Quality Standards
Code Quality:
- Follow language best practices (Go, TypeScript, React)
- Include proper error handling for all failure scenarios
- Add meaningful comments for complex logic
- Maintain consistent formatting and style
Documentation Quality:
- Be accurate and specific with technical details
- Include file paths, line numbers, and code snippets
- Document the "why" behind technical decisions
- Focus on outcomes and user impact
Testing Quality:
- Test core functionality and error scenarios
- Verify integration points work correctly
- Validate user workflows end-to-end
- Document test results and known issues
The Pre-Integration Checklist
Do not merge or consider work complete until you can check these boxes:
- All errors are logged (not silenced with
/dev/null) - No new unauthenticated endpoints exist (all use proper middleware)
- Backup/restore/fallback paths exist for critical operations
- Idempotency verified (can run 3x safely)
- History table logging added for all state changes
- Security review completed (respects the established stack)
- Testing includes error scenarios (not just happy path)
- Documentation is updated with current implementation details
- Technical debt is identified and tracked
Sustainable Development Practices
Technical Debt Management
Every session must identify and document:
- New Technical Debt: What shortcuts were taken and why
- Deferred Features: What was postponed and the justification
- Known Issues: Problems discovered but not fixed
- Architecture Decisions: Technical choices needing future review
Self-Enforcement Mechanisms
Pattern Discipline:
- Use TodoWrite tool for session progress tracking
- Create session documentation for ALL development work
- Update status files to reflect current reality
- Maintain context across development sessions
Anti-Patterns to Avoid: ❌ "I'll document it later" - Details will be lost ❌ "This session was too small to document" - All sessions matter ❌ "The technical debt isn't important enough to track" - It will become critical ❌ "I'll remember this decision" - You won't, document it
Positive Patterns to Follow: ✅ Document as you go - Take notes during implementation ✅ End each session with documentation - Make it part of completion criteria ✅ Track all decisions - Even small choices have future impact ✅ Maintain technical debt visibility - Hidden debt becomes project risk
This ethos ensures consistent, high-quality development while building a maintainable system that serves both current users and future development needs. The principles only work when consistently followed.