# RedFlag Development Ethos

**Philosophy**: We are building honest, autonomous software for a community that values digital sovereignty. This isn't enterprise fluff; it's a "less is more" set of non-negotiable principles forged from experience. We ship bugs, but we are honest about them, and we log the failures.

---

## The Core Ethos (Non-Negotiable Principles)

These are the rules we've learned not to compromise on. They are the foundation of our development contract.

### 1. Errors are History, Not /dev/null

**Principle**: NEVER silence errors.

**Rationale**: A "laid back" admin is one who can sleep at night, knowing any failure will be in the logs. We don't use `2>/dev/null`. We fix the root cause, not the symptom.

**Implementation Contract**:
- All errors, from a script's `exit 1` to an API `500`, MUST be captured and logged with context (what failed, why, what was attempted)
- All logs MUST follow the `[TAG] [system] [component]` format (e.g., `[ERROR] [agent] [installer] Download failed...`)
- The final destination for all auditable events (errors and state changes) is the history table
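
The contract above can be sketched in Go. This is a minimal illustration, not RedFlag's actual logger: `FormatLog`, `LogFailure`, the `record` sink, and `errRefused` are hypothetical names, and the sink here prints to stdout where the real destination would be the history table.

```go
package main

import (
	"errors"
	"fmt"
)

// FormatLog renders one line in the [TAG] [system] [component] format.
func FormatLog(tag, system, component, message string) string {
	return fmt.Sprintf("[%s] [%s] [%s] %s", tag, system, component, message)
}

// LogFailure records what was attempted and why it failed instead of
// discarding the error. record is a hypothetical sink; in RedFlag it would
// presumably write to the history table.
func LogFailure(record func(line string), system, component, attempted string, err error) {
	message := fmt.Sprintf("%s failed: %v", attempted, err)
	record(FormatLog("ERROR", system, component, message))
}

// errRefused is a sample error used only for this demonstration.
var errRefused = errors.New("connection refused")

func main() {
	printLine := func(line string) { fmt.Println(line) }
	LogFailure(printLine, "agent", "installer", "binary download", errRefused)
}
```

Passing the sink as a function keeps the formatting testable and makes it hard to "log" without saying what was attempted.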

### 2. Security is Non-Negotiable

**Principle**: NEVER add unauthenticated endpoints.

**Rationale**: "Temporary" is permanent. Every single route MUST be protected by the established, multi-subsystem security architecture.

**Security Stack**:
- **User Auth (WebUI)**: All admin dashboard routes MUST be protected by `WebAuthMiddleware()`
- **Agent Registration**: Agents can only be created with a valid `registration_token` via `/api/v1/agents/register`
- **Agent Check-in**: All agent-to-server communication MUST be protected by `AuthMiddleware()`, which validates JWT access tokens
- **Agent Token Renewal**: Agents MUST renew tokens only with their long-lived `refresh_token` via `/api/v1/agents/renew`
- **Hardware Verification**: All authenticated agent routes MUST be protected by `MachineBindingMiddleware`, which validates the `X-Machine-ID` header
- **Update Security**: Sensitive commands MUST carry an Ed25519-signed nonce to prevent replay attacks
- **Binary Security**: Agents MUST verify the Ed25519 signatures of downloaded binaries against the cached server public key (TOFU model)
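
As one illustration of the stack, the binary verification step might look roughly like the following, using Go's standard `crypto/ed25519` package. The function names are hypothetical, and `signForDemo` only stands in for the server so the sketch runs on its own. How the agent actually caches the key is not shown.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// VerifyBinary reports whether sig is a valid Ed25519 signature of binary
// under the server public key the agent cached on first contact (TOFU).
func VerifyBinary(cachedKey ed25519.PublicKey, binary, sig []byte) bool {
	// ed25519.Verify panics on a wrong-sized key, so reject it up front.
	if len(cachedKey) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(cachedKey, binary, sig)
}

// signForDemo stands in for the server: it creates a throwaway key pair and
// signs the payload. Hypothetical helper, for illustration only.
func signForDemo(binary []byte) (ed25519.PublicKey, []byte) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, ed25519.Sign(priv, binary)
}

func main() {
	binary := []byte("agent binary contents")
	pub, sig := signForDemo(binary)

	fmt.Println("untouched binary accepted:", VerifyBinary(pub, binary, sig))
	tampered := append([]byte("x"), binary...)
	fmt.Println("tampered binary accepted:", VerifyBinary(pub, tampered, sig))
}
```

A missing or malformed cached key is treated as a verification failure rather than a reason to skip the check.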

### 3. Assume Failure; Build for Resilience

**Principle**: NEVER assume an operation will succeed.

**Rationale**: Networks fail. Servers restart. Agents crash. The system must recover without manual intervention.

**Resilience Contract**:
- **Agent Network**: Agent check-ins MUST use retry logic with exponential backoff to survive server 502s and other transient failures
- **Scanner Reliability**: Long-running or fragile scanners (Windows Update, DNF) MUST be wrapped in a circuit breaker so that one failing scanner cannot block the rest of the subsystem
- **Data Delivery**: Command results MUST use the Command Acknowledgment System (`pending_acks.json`) for at-least-once delivery guarantees
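
The retry requirement for check-ins can be sketched as follows. This is a simplified example under stated assumptions: `Retry` and `errBadGateway` are hypothetical names, the attempt count and base delay are arbitrary, and a real agent would likely also cap the delay and add jitter.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Retry calls op up to attempts times, doubling the delay after each
// failure. Every failed attempt is logged rather than swallowed.
func Retry(attempts int, base time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		fmt.Printf("[WARN] [agent] [checkin] attempt %d/%d failed: %v\n", i, attempts, err)
		if i < attempts {
			time.Sleep(delay)
			delay *= 2 // exponential backoff
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// errBadGateway simulates a transient server failure for the demo.
var errBadGateway = errors.New("server returned 502")

func main() {
	calls := 0
	err := Retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errBadGateway // first two check-ins fail
		}
		return nil
	})
	fmt.Println("calls:", calls, "error:", err)
}
```

The final error wraps the last failure with `%w`, so callers can still inspect the root cause with `errors.Is`.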

### 4. Idempotency is a Requirement

**Principle**: NEVER forget idempotency.

**Rationale**: We (and our agents) will inevitably run the same command twice. The system must not break or create duplicate state.

**Idempotency Contract**:
- **Install Scripts**: MUST be idempotent, checking whether the agent/service is already installed before attempting installation
- **Command Design**: All commands SHOULD be idempotent, so that running one twice does not create duplicate state
- **Database Migrations**: All schema changes MUST be idempotent (`CREATE TABLE IF NOT EXISTS`, `ADD COLUMN IF NOT EXISTS`, etc.)
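
One common way to make command handling idempotent is to key each command by an ID and replay the stored result on a repeat. The sketch below assumes commands carry such an ID. `Executor`, `NewExecutor`, and `Apply` are hypothetical names, and the record is kept in memory only, where a real agent would have to persist it.

```go
package main

import (
	"fmt"
	"sync"
)

// Executor remembers the result of every command ID it has completed, so a
// repeated delivery replays the stored result instead of running again.
type Executor struct {
	mu   sync.Mutex
	done map[string]string // command ID -> recorded result
}

// NewExecutor returns an Executor with an empty record.
func NewExecutor() *Executor {
	return &Executor{done: make(map[string]string)}
}

// Apply runs the command once per ID. The second return value is true when
// the result came from the record rather than a fresh run.
func (e *Executor) Apply(id string, run func() string) (string, bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if result, ok := e.done[id]; ok {
		return result, true
	}
	result := run()
	e.done[id] = result
	return result, false
}

func main() {
	exec := NewExecutor()
	installs := 0
	install := func() string {
		installs++
		return "installed"
	}
	for i := 0; i < 3; i++ { // run the same command three times
		result, replayed := exec.Apply("cmd-42", install)
		fmt.Println(result, "replayed:", replayed)
	}
	fmt.Println("actual installs:", installs)
}
```

Holding the lock while the command runs serializes execution, which keeps the sketch short. It also guarantees that two concurrent deliveries of the same ID cannot both run.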

### 5. No Marketing Fluff (The "No BS" Rule)

**Principle**: NEVER use banned words or emojis in logs or code.

**Rationale**: We are building an "honest" tool for technical users, not pitching a product. Fluff hides meaning and creates enterprise BS.

**Clarity Contract**:
- **Banned Words**: enhanced, enterprise-ready, seamless, robust, production-ready, revolutionary, etc.
- **Banned Emojis**: Emojis like ⚠️, ✅, ❌ are for UI/communications, not for logs
- **Logging Format**: All logs MUST use the `[TAG] [system] [component]` format for clarity and consistency
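
A check like the one below could enforce this contract on log lines in tests or CI. It is a rough sketch with several assumptions: the regular expression guesses that tags are uppercase and that system/component names are lowercase, the banned-word list covers only the words named above, and any rune in Unicode category So (which includes ⚠, ✅, and ❌) is treated as an emoji. `CheckLogLine` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// logFormat is an assumed shape for "[TAG] [system] [component] message".
var logFormat = regexp.MustCompile(`^\[[A-Z]+\] \[[a-z0-9_-]+\] \[[a-z0-9_-]+\] .+`)

// bannedWords lists only the words the ethos names explicitly.
var bannedWords = []string{
	"enhanced", "enterprise-ready", "seamless",
	"robust", "production-ready", "revolutionary",
}

// CheckLogLine returns one message per rule the line breaks; an empty
// result means the line passed every check.
func CheckLogLine(line string) []string {
	var problems []string
	if !logFormat.MatchString(line) {
		problems = append(problems, "does not match [TAG] [system] [component] format")
	}
	lower := strings.ToLower(line)
	for _, word := range bannedWords {
		if strings.Contains(lower, word) { // substring match, so "robustness" also trips
			problems = append(problems, "banned word: "+word)
		}
	}
	for _, r := range line {
		if unicode.Is(unicode.So, r) {
			problems = append(problems, "emoji or pictographic symbol: "+string(r))
			break // one report per line is enough
		}
	}
	return problems
}

func main() {
	lines := []string{
		"[ERROR] [agent] [installer] Download failed: connection refused",
		"[INFO] [server] [updates] seamless upgrade complete ✅",
	}
	for _, line := range lines {
		fmt.Printf("%q -> %v\n", line, CheckLogLine(line))
	}
}
```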

---

## Critical Build Practices (Non-Negotiable)

### Docker Cache Invalidation During Testing

**Principle**: ALWAYS use `--no-cache` when testing fixes.

**Rationale**: Docker layer caching can reuse layers built from the broken state unless the cache is explicitly invalidated. A fix that appears to fail may simply be running from cached layers.

**Build Contract**:
- **Testing Fixes**: `docker-compose build --no-cache` or `docker build --no-cache .`
- **Never Assume**: Do not assume the cache picked up your change; a reused layer can still contain the old state
- **Verification**: If a fix doesn't work, rebuild without cache before debugging further

---

## Development Workflow Principles

### Session-Based Development

Development sessions follow a structured pattern to maintain quality and documentation:

**Before Starting**:
1. Review current project status and priorities
2. Read previous session documentation for context
3. Set clear, specific goals for the session
4. Create a todo list to track progress

**During Development**:
1. Implement code following established patterns
2. Document progress as you work (don't wait until the end)
3. Update todo list continuously
4. Test functionality as you build

**After Session Completion**:
1. Create session documentation with complete technical details
2. Update status files with new capabilities and technical debt
3. Clean up todo list and plan next session priorities
4. Verify all quality checkpoints are met

### Quality Standards

**Code Quality**:
- Follow language best practices (Go, TypeScript, React)
- Include proper error handling for all failure scenarios
- Add meaningful comments for complex logic
- Maintain consistent formatting and style

**Documentation Quality**:
- Be accurate and specific with technical details
- Include file paths, line numbers, and code snippets
- Document the "why" behind technical decisions
- Focus on outcomes and user impact

**Testing Quality**:
- Test core functionality and error scenarios
- Verify integration points work correctly
- Validate user workflows end-to-end
- Document test results and known issues

---

## The Pre-Integration Checklist

**Do not merge or consider work complete until you can check these boxes**:

- [ ] All errors are logged (not silenced with `/dev/null`)
- [ ] No new unauthenticated endpoints exist (all use proper middleware)
- [ ] Backup/restore/fallback paths exist for critical operations
- [ ] Idempotency verified (can run 3x safely)
- [ ] History table logging added for all state changes
- [ ] Security review completed (respects the established stack)
- [ ] Testing includes error scenarios (not just happy path)
- [ ] Documentation is updated with current implementation details
- [ ] Technical debt is identified and tracked

---

## Sustainable Development Practices

### Technical Debt Management

**Every session must identify and document**:
1. **New Technical Debt**: What shortcuts were taken and why
2. **Deferred Features**: What was postponed and the justification
3. **Known Issues**: Problems discovered but not fixed
4. **Architecture Decisions**: Technical choices needing future review

### Self-Enforcement Mechanisms

**Pattern Discipline**:
- Use the TodoWrite tool for session progress tracking
- Create session documentation for ALL development work
- Update status files to reflect current reality
- Maintain context across development sessions

**Anti-Patterns to Avoid**:
- ❌ "I'll document it later" - Details will be lost
- ❌ "This session was too small to document" - All sessions matter
- ❌ "The technical debt isn't important enough to track" - It will become critical
- ❌ "I'll remember this decision" - You won't; document it

**Positive Patterns to Follow**:
- ✅ Document as you go - Take notes during implementation
- ✅ End each session with documentation - Make it part of completion criteria
- ✅ Track all decisions - Even small choices have future impact
- ✅ Maintain technical debt visibility - Hidden debt becomes project risk

This ethos ensures consistent, high-quality development while building a maintainable system that serves both current users and future development needs. **The principles only work when consistently followed.**