Problem: Version check middleware blocked old agents from checking in to receive
update commands, creating a deadlock where agents couldn't upgrade because they
were blocked from checking in.
Solution: Modified MachineBindingMiddleware to allow old agents checking in for
commands to proceed IF they have a pending update_agent command. This allows
agents to receive the update command even when below minimum version.
Changes:
- Added grace period logic in middleware for command endpoints
- Check if agent has pending update command before blocking
- If update pending, allow check-in and log it
- Added HasPendingUpdateCommand() to AgentQueries for checking pending updates
- Also added same method to CommandQueries for completeness
This prevents the version tracking deadlock while maintaining security for
agents without pending updates.
NOTE: Need to test that old agents can actually receive and execute update
commands when allowed through this path.
Critical regression fix - subsystems were hardcoded instead of reading user settings.
Added CreateDefaultSubsystems to queries/subsystems.go.
ConfigService now queries agent_subsystems table for actual user configuration.
AgentLifecycleService creates default subsystems when creating new agents.
Respects user-configured enabled/auto_run settings from UI.
Created version package for semantic version comparison.
Fixed GetLatestVersionByTypeAndArch to use combined platform format.
Replaced inline version comparison with reusable version.Compare().
- Add GET /api/v1/agents/:id/config endpoint for server configuration
- Agent fetches config during check-in and applies updates
- Add version tracking to prevent unnecessary config applications
- Clean separation: config sync independent of commands
- Fix agent UI subsystem settings to actually control agent behavior
- Update Security Health UI with frosted glass styling and tooltips
- Create separate scanner interfaces for storage, system, and docker data
- Add dedicated endpoints for metrics and docker images instead of misclassifying as updates
- Implement proper database tables for storage metrics and docker images
- Fix storage/system metrics appearing incorrectly as package updates
- Add scanner types with proper data structures for each subsystem
- Update agent handlers to use correct endpoints for each data type
- Fix recursive call in reportLogWithAck that caused infinite loop
- Add machine binding and security API endpoints
- Enhance AgentScanners component with security status display
- Update scheduler and timeout service reliability
- Remove deprecated install.sh script
- Add subsystem configuration and logging improvements
Split monolithic scan_updates into individual subsystems (updates/storage/system/docker).
Scanners now run in parallel via goroutines - cuts scan time roughly in half, maybe more.
Agent changes:
- Orchestrator pattern for scanner management
- New scanners: storage (disk metrics), system (cpu/mem/processes)
- New commands: scan_storage, scan_system, scan_docker
- Wrapped existing scanners (APT/DNF/Docker/Windows/Winget) with common interface
- Version bump to 0.1.20
Server changes:
- Migration 015: agent_subsystems table with trigger for auto-init
- Subsystem CRUD: enable/disable, interval (5min-24hr), auto-run toggle
- API routes: /api/v1/agents/:id/subsystems/* (9 endpoints)
- Stats tracking per subsystem
Web UI changes:
- ChatTimeline shows subsystem-specific labels and icons
- AgentScanners got interactive toggles, interval dropdowns, manual trigger buttons
- TypeScript types added for subsystems
Backward compatible with legacy scan_updates - for now. Bugs probably exist somewhere.
Added circuit breakers with configurable timeouts for all subsystems (APT, DNF, Docker, Windows, Winget, Storage). Replaces cron-based scheduler with priority queue that should scale beyond 1000+ agents if your homelab is that big.
Command acknowledgment system ensures results aren't lost on network failures or restarts. Agent tracks pending acknowledgments with persistent state and automatic retry.
- Circuit breakers: 3 failures in 1min opens circuit, 30s cooldown
- Per-subsystem timeouts: 30s-10min depending on scanner
- Priority queue scheduler: O(log n), worker pool, jitter, backpressure
- Acknowledgments: at-least-once delivery, max 10 retries over 24h
- All tests passing (26/26)
- Redesign AgentUpdatesEnhanced with tab-based workflow (pending/approved/installing/installed)
- Add AgentStorage component with disk partition table
- Add AgentScanners component for agent health monitoring
- Fix agent removal not refreshing list (cache invalidation)
- Bump agent version to 0.1.18 (enhanced disk detection)
- Update server default version to 0.1.18
- Add command source tracking (system/manual) migration
- Improve Linux disk detection for all physical mount points
The reboot_reason field was defined as string instead of *string, causing
database scan failures when the column contains NULL values. This broke
agent list loading on existing installations after migration.
- Changed reboot_reason to *string in both Agent and AgentWithLastScan structs
- Added DEFAULT empty string to migration for new installations
- Added README section for full server reinstall procedure
Potential fixes for issues #4 and #6.
Agent version display:
- Set CurrentVersion during registration instead of waiting for first check-in
- Update UI to show "Initial Registration" instead of "Unknown"
Host restart detection:
- Added reboot_required, last_reboot_at, reboot_reason fields to agents table
- Agent now detects pending reboots (Debian/Ubuntu via /var/run/reboot-required, RHEL/Fedora via needs-restarting)
- New reboot command type with 1-minute grace period
- UI shows restart alerts and adds restart button in quick actions
- Restart indicator badge in agent list
The reboot detection runs during system info collection and gets reported back to the server automatically.
Using shutdown command for now until we make the restart mechanism user-adjustable later - need to think on that.
Also need to come up with a Windows derivative outside of reading event log for detecting reboots.
- Update go.mod files to use github.com/Fimeg/RedFlag module path
- Fix all import statements across server and agent code
- Resolves build errors when cloning from GitHub
- Utils package (version comparison) is actually needed and working
Breaking changes for clean alpha releases:
- JWT authentication with user-provided secrets (no more development defaults)
- Registration token system for secure agent enrollment
- Rate limiting with user-adjustable settings
- Enhanced agent configuration with proxy support
- Interactive server setup wizard (--setup flag)
- Heartbeat architecture separation for better UX
- Package status synchronization fixes
- Accurate timestamp tracking for RMM features
Setup process for new installations:
1. docker-compose up -d postgres
2. ./redflag-server --setup
3. ./redflag-server --migrate
4. ./redflag-server
5. Generate tokens via admin UI
6. Deploy agents with registration tokens
- Cross-platform support (Windows/Linux) with Windows Updates and Winget
- Added dependency confirmation workflow and refresh token authentication
- New screenshots: History, Live Operations, Windows Agent Details
- Local CLI features with terminal output and cache system
- Updated known limitations - Proxmox integration is broken
- Organized docs to docs/ folder and updated .gitignore
- Probably introduced a dozen bugs with Windows agents - stay tuned
- Fixed gitignore to allow Screenshots/*.png files
- Added all screenshots for README documentation
- Fixed gitignore to be less restrictive with image files
- Includes dashboard, agent, updates, and docker screenshots
Major milestone: Update installation system now works
- Implemented unified installer interface with factory pattern
- Created APT, DNF, and Docker installers
- Integrated installer into agent command processing loop
- Update approval button now actually installs packages
Documentation updates:
- Updated claude.md with Session 7 implementation log
- Created clean, professional README.md for GitHub
- Added screenshots section with 4 dashboard views
- Preserved detailed development history in backup files
Repository ready for GitHub alpha release with working installer functionality.
🚩 Private development - version retention only
✅ Complete web dashboard (React + TypeScript + TailwindCSS)
✅ Production-ready server backend (Go + Gin + PostgreSQL)
✅ Linux agent with APT + Docker scanning + local CLI tools
✅ JWT authentication and REST API
✅ Update discovery and approval workflow
🚧 Status: Alpha software - active development
📦 Purpose: Version retention during development
⚠️ Not for public use or deployment