Commit Graph

81 Commits

Author SHA1 Message Date
Fimeg
f7c8d23c5d WIP: Save current state - security subsystems, migrations, logging 2025-12-16 14:19:59 -05:00
Fimeg
f792ab23c7 Fix version tracking deadlock - allow old agents to check in for updates
Problem: Version check middleware blocked old agents from checking in to receive
update commands, creating a deadlock where agents couldn't upgrade because they
were blocked from checking in.

Solution: Modified MachineBindingMiddleware to allow old agents checking in for
commands to proceed IF they have a pending update_agent command. This allows
agents to receive the update command even when below minimum version.

Changes:
- Added grace period logic in middleware for command endpoints
- Check if agent has pending update command before blocking
- If update pending, allow check-in and log it
- Added HasPendingUpdateCommand() to AgentQueries for checking pending updates
- Also added same method to CommandQueries for completeness

This prevents the version tracking deadlock while maintaining security for
agents without pending updates.

NOTE: Need to test that old agents can actually receive and execute update
commands when allowed through this path.
2025-12-13 10:55:11 -05:00
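The grace-period rule described in f792ab23c7 can be sketched as a small decision function. This is an illustration only: `decision`, `gateCheckIn`, and the two boolean inputs are hypothetical names, and the real MachineBindingMiddleware and HasPendingUpdateCommand() are not shown in this log.

```go
package main

// decision is what this hypothetical gate tells the middleware to do.
type decision int

const (
	allow          decision = iota // agent meets the minimum version
	allowForUpdate                 // agent is too old, but may fetch its pending update
	reject                         // agent is too old and has nothing to fetch
)

// gateCheckIn applies the rule from the commit message: an agent below the
// minimum version is let through only when an update command is waiting for it.
// belowMinVersion and hasPendingUpdate are assumed to be computed by the caller.
func gateCheckIn(belowMinVersion, hasPendingUpdate bool) decision {
	if !belowMinVersion {
		return allow
	}
	if hasPendingUpdate {
		return allowForUpdate // the caller would also log this check-in
	}
	return reject
}
```

Keeping the rule in a pure function like this would make the untested path mentioned in the NOTE easy to cover with a unit test.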
Fimeg
40598c2203 Update install scripts to use registration token instead of API calls
Simplified install script approach based on architecture analysis:
- Fresh installs: Create minimal config with registration_token only
- Agent handles registration on first start (leverages existing agent logic)
- Upgrades: Preserve existing config, agent handles migration
- Removed complex credential preservation logic from Windows script

This is more reliable and aligns with the agent's built-in migration system.

Changes:
- Linux: Populate registration_token in config template, keep backup logic
- Windows: Simplified - removed 100+ lines of credential extraction/restoration
- Both: Fresh installs get minimal template, upgrades preserve existing config

NOTE: This commit modified the 'sacred scripts' (install templates) significantly.
Casey found this highly suspect and it may need investigation, but proceeding for now
to test the approach. The changes should be reviewed carefully before v0.1.x release.
2025-12-13 10:53:16 -05:00
Fimeg
9c69246116 Add registration token parameter to downloads handler and template service
- Pass registration token from URL query parameter to install script generation
- Update RenderInstallScriptFromBuild to accept registration token
- Add RegistrationToken field to template data structure

This lays groundwork for fixing agent registration - install scripts will be able
to call the registration API with the provided token.
2025-12-13 10:44:05 -05:00
Fimeg
8b9a314200 ui: improve Agent Health layout and fix misaligned controls
- Move Update Agent button to Subsystem Configuration header
- Remove duplicate Compact Summary box with misaligned refresh
- Reduce visual separation between sections (same card styling)
- Make security status details visible instead of hidden in tooltips
- Fix enforced status colors (blue instead of red)
- Consolidate enabled/auto-run counts in header
- Reduce spacing between sections for cohesive interface

The enabled/auto-run toggles now properly align with their
subsystems in the table, and critical security information
is immediately visible without hover interactions.
2025-11-10 23:08:17 -05:00
Fimeg
3f0838affc refactor: replace 899 lines of script generation with templates
Created InstallTemplateService with clean template-based script generation.
Added linux.sh.tmpl and windows.ps1.tmpl for install scripts.
Removed massive generateLinuxScript and generateWindowsScript functions.
Downloads handler now uses template service (1073 lines → 174 lines).
Templates easily maintainable without modifying Go code.
2025-11-10 22:41:47 -05:00
Fimeg
455bc75044 fix: ConfigService now reads subsystems from database
Critical regression fix - subsystems were hardcoded instead of reading user settings.
Added CreateDefaultSubsystems to queries/subsystems.go.
ConfigService now queries agent_subsystems table for actual user configuration.
AgentLifecycleService creates default subsystems when creating new agents.
Respects user-configured enabled/auto_run settings from UI.
2025-11-10 22:32:22 -05:00
Fimeg
e1173c9f3b refactor: consolidate config logic into ConfigService
Created centralized ConfigService for configuration management.
Added deprecation comments to ConfigBuilder and AgentBuilder.
Platform-specific defaults centralized in one place.
Removed placeholder ConfigService from agent_lifecycle.go.
2025-11-10 22:23:56 -05:00
Fimeg
52c9c1a45b refactor: add AgentLifecycleService for unified agent operations
Created centralized lifecycle service to handle new, upgrade, and rebuild operations.
Added deprecation notices to old handlers (agent_setup, build_orchestrator, agent_build).
Foundation for consolidating duplicate agent lifecycle logic.
2025-11-10 22:15:03 -05:00
Fimeg
4531ca34c5 refactor: consolidate AgentFile struct into common package
Created aggregator/pkg/common module with shared AgentFile type.
Removed duplicate definitions from migration and services packages.
Both agent and server now use common.AgentFile.
2025-11-10 22:03:43 -05:00
Fimeg
ddaa9ac637 fix: correct platform format in version detection
Created version package for semantic version comparison.
Fixed GetLatestVersionByTypeAndArch to use combined platform format.
Replaced inline version comparison with reusable version.Compare().
2025-11-10 21:50:46 -05:00
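For reference, a version comparison of the kind ddaa9ac637 describes might look like the sketch below. The signature of the project's version.Compare() is not shown in this log, so `compareVersions` is a hypothetical stand-in that handles only dotted numeric versions such as "0.1.22".

```go
package main

import (
	"strconv"
	"strings"
)

// compareVersions compares dotted numeric versions and returns -1, 0, or 1.
// Simplifications (assumptions of this sketch): missing parts count as 0,
// and any part that is not a plain integer also counts as 0.
func compareVersions(a, b string) int {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}
```

Comparing part by part as integers avoids the string-comparison trap where "0.1.9" sorts after "0.1.10".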
Fimeg
c95cc7d91f cleanup: remove 2,369 lines of dead code
Removed backup files and unused legacy scanner function.
All code verified as unreferenced.
2025-11-10 21:20:42 -05:00
Fimeg
1f2b1b7179 fix: repair version detection platform query format
- Fix GetLatestVersionByTypeAndArch to separate platform/architecture
- Query now correctly uses platform='linux' and architecture='amd64'
- Resolves UI showing 'no packages available' despite updates existing
2025-11-10 20:11:32 -05:00
Fimeg
e6ac0b1ec4 feat: implement agent migration system
- Fix config version inflation bug in main.go
- Add dynamic subsystem checking to prevent false change detection
- Implement migration detection and execution system
- Add directory migration from /etc/aggregator to /etc/redflag
- Update all path references across codebase to use new directories
- Add configuration schema versioning and automatic migration
- Implement backup and rollback capabilities
- Add security feature detection and hardening
- Update installation scripts and sudoers for new paths
- Complete Phase 1 migration system
2025-11-04 14:25:53 -05:00
Fimeg
253022cacd security: prevent discord bot code from being tracked
- Add discord/ to .gitignore to protect private bot configuration
- Discord bot contains API tokens and private implementation details
- Prevents accidental exposure of Discord credentials in repository history
2025-11-04 10:07:05 -05:00
Fimeg
95f70bd9bb feat: bump to v0.1.23 with security metrics and UI improvements
- Bump agent and server versions to 0.1.23
- Implement security metrics collection (bound agents, command processing, version compliance)
- Add dismiss button for timed out commands in agent status
- Add config sync endpoint for server->agent configuration updates
- Add ignored updates workflow in AgentUpdatesEnhanced (approve/reject workflow)
- Swap AgentScanners layout (subsystems top, security bottom)
- Replace placeholder security data with database metrics
- Add backpressure detection based on pending command ratios
2025-11-04 09:41:27 -05:00
Fimeg
38894f64d3 feat: add config sync endpoint and security UI updates
- Add GET /api/v1/agents/:id/config endpoint for server configuration
- Agent fetches config during check-in and applies updates
- Add version tracking to prevent unnecessary config applications
- Clean separation: config sync independent of commands
- Fix agent UI subsystem settings to actually control agent behavior
- Update Security Health UI with frosted glass styling and tooltips
2025-11-03 22:36:26 -05:00
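The version tracking mentioned in 38894f64d3 could work roughly as follows. This is a sketch under assumptions: the `agentConfig` and `agent` types, the integer version field, and `syncConfig` are all hypothetical, and the actual payload of GET /api/v1/agents/:id/config is not shown in this log.

```go
package main

// agentConfig is a hypothetical shape for the config the server returns.
type agentConfig struct {
	Version    int             // assumed to increase on every server-side change
	Subsystems map[string]bool // subsystem name -> enabled
}

// agent holds the config it last applied.
type agent struct{ current agentConfig }

// syncConfig applies remote only when it is newer than what the agent
// already runs, and reports whether anything was applied.
func (a *agent) syncConfig(remote agentConfig) bool {
	if remote.Version <= a.current.Version {
		return false // same or older config: skip the unnecessary apply
	}
	a.current = remote
	return true
}
```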
Fimeg
eccc38d7c9 feat: separate data classification architecture
- Create separate scanner interfaces for storage, system, and docker data
- Add dedicated endpoints for metrics and docker images instead of misclassifying as updates
- Implement proper database tables for storage metrics and docker images
- Fix storage/system metrics appearing incorrectly as package updates
- Add scanner types with proper data structures for each subsystem
- Update agent handlers to use correct endpoints for each data type
2025-11-03 21:44:48 -05:00
Fimeg
57be3754c6 fix: agent acknowledgment recursion and subsystem UI improvements
- Fix recursive call in reportLogWithAck that caused infinite loop
- Add machine binding and security API endpoints
- Enhance AgentScanners component with security status display
- Update scheduler and timeout service reliability
- Remove deprecated install.sh script
- Add subsystem configuration and logging improvements
2025-11-03 21:02:57 -05:00
Fimeg
d0f13e5da7 bump: agent version 0.1.20 -> 0.1.22
v0.1.22 minimum version for security features
machine binding and version enforcement

agent now sends version during registration
no bootstrap gap
2025-11-02 14:00:55 -05:00
Fimeg
c1ad4283b0 fix: clean old config on agent reinstall
install script now removes old .env and config.json
prevents conflicts during reinstall/upgrade

addresses .env conflict errors reported in testing
2025-11-02 13:32:53 -05:00
Fimeg
0062e2acab feat: setup wizard key generation
added ed25519 keypair generation to setup endpoint
wired route for POST /api/setup/generate-keys

existing registration token system handles deployment
2025-11-02 10:04:31 -05:00
Fimeg
822f57bbdc feat: setup wizard and token management
added ed25519 key generation to setup endpoint
deployment handler for token CRUD with install commands
wired routes for /api/setup/generate-keys and /admin/deployment

setup generates keypair on demand
deployment endpoints provide one-liner install commands
ready for v0.1.22 testing
2025-11-02 09:32:37 -05:00
Fimeg
ec3ba88459 feat: machine binding and version enforcement
migration 017 adds machine_id to agents table
middleware validates X-Machine-ID header on authed routes
agent client sends machine ID with requests
MIN_AGENT_VERSION config defaults to 0.1.22
version utils added for comparison

blocks config copying attacks via hardware fingerprint
old agents get 426 upgrade required
breaking: <0.1.22 agents rejected
2025-11-02 09:30:04 -05:00
Fimeg
99480f3fe3 fix: resolve frontend approval error and add invalid command handling
- Added missing approveMultiple function to updateApi
- Fixed API endpoint from /updates/bulk-approve to /updates/approve
- Enhanced invalid command handling in both Linux and Windows agents
- Agents now report unknown command types as failed commands back to server
2025-11-01 21:56:31 -04:00
Fimeg
3690472396 feat: granular subsystem commands with parallel scanner execution
Split monolithic scan_updates into individual subsystems (updates/storage/system/docker).
Scanners now run in parallel via goroutines - cuts scan time roughly in half, maybe more.

Agent changes:
- Orchestrator pattern for scanner management
- New scanners: storage (disk metrics), system (cpu/mem/processes)
- New commands: scan_storage, scan_system, scan_docker
- Wrapped existing scanners (APT/DNF/Docker/Windows/Winget) with common interface
- Version bump to 0.1.20

Server changes:
- Migration 015: agent_subsystems table with trigger for auto-init
- Subsystem CRUD: enable/disable, interval (5min-24hr), auto-run toggle
- API routes: /api/v1/agents/:id/subsystems/* (9 endpoints)
- Stats tracking per subsystem

Web UI changes:
- ChatTimeline shows subsystem-specific labels and icons
- AgentScanners got interactive toggles, interval dropdowns, manual trigger buttons
- TypeScript types added for subsystems

Backward compatible with legacy scan_updates - for now. Bugs probably exist somewhere.
2025-11-01 21:34:26 -04:00
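The parallel execution described in 3690472396 follows a common Go pattern, sketched here with hypothetical names (`scanner`, `result`, `runAll`, `fakeScanner`); the agent's real orchestrator and scanner interface are not shown in this log.

```go
package main

import "sync"

// scanner is an assumed common interface wrapped around each subsystem.
type scanner interface {
	Name() string
	Scan() (string, error)
}

// result pairs a scanner's name with what it returned.
type result struct {
	Name   string
	Output string
	Err    error
}

// runAll starts every scanner in its own goroutine and waits for all of them.
// Each goroutine writes only to its own slot, so no extra locking is needed.
func runAll(scanners []scanner) []result {
	results := make([]result, len(scanners))
	var wg sync.WaitGroup
	for i, s := range scanners {
		wg.Add(1)
		go func(i int, s scanner) {
			defer wg.Done()
			out, err := s.Scan()
			results[i] = result{Name: s.Name(), Output: out, Err: err}
		}(i, s)
	}
	wg.Wait()
	return results
}

// fakeScanner is a stand-in used only to exercise runAll.
type fakeScanner struct{ name, out string }

func (f fakeScanner) Name() string          { return f.name }
func (f fakeScanner) Scan() (string, error) { return f.out, nil }
```

With this shape, total scan time approaches that of the slowest scanner rather than the sum of all of them.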
Fimeg
bf4d46529f feat: add resilience and reliability features for agent subsystems
Added circuit breakers with configurable timeouts for all subsystems (APT, DNF, Docker, Windows, Winget, Storage). Replaced the cron-based scheduler with a priority queue that should scale past 1000 agents if your homelab is that big.

Command acknowledgment system ensures results aren't lost on network failures or restarts. Agent tracks pending acknowledgments with persistent state and automatic retry.

- Circuit breakers: 3 failures in 1min opens circuit, 30s cooldown
- Per-subsystem timeouts: 30s-10min depending on scanner
- Priority queue scheduler: O(log n), worker pool, jitter, backpressure
- Acknowledgments: at-least-once delivery, max 10 retries over 24h
- All tests passing (26/26)
2025-11-01 18:42:41 -04:00
Fimeg
528848f476 docs: add v0.1.18 release notes and testing status 2025-11-01 09:48:19 -04:00
Fimeg
8b880b2d5a docs: update README with v0.1.18 and improve update/uninstall sections
- Bump version to 0.1.18
- Add simple update one-liner
- Organize nuclear option and uninstall into dropdowns
- Document agent config file locations
- Reference uninstall.sh script
2025-11-01 09:36:48 -04:00
Fimeg
01c09cefab feat: agent UI redesign and version bump to 0.1.18
- Redesign AgentUpdatesEnhanced with tab-based workflow (pending/approved/installing/installed)
- Add AgentStorage component with disk partition table
- Add AgentScanners component for agent health monitoring
- Fix agent removal not refreshing list (cache invalidation)
- Bump agent version to 0.1.18 (enhanced disk detection)
- Update server default version to 0.1.18
- Add command source tracking (system/manual) migration
- Improve Linux disk detection for all physical mount points
2025-11-01 09:27:58 -04:00
Fimeg
5fd82e5697 fix: namespace rate limiter keys and prevent setup checker interval loops
Rate limiter fix:
- Namespace keys by limit type to prevent counter sharing across endpoints
- Previously all KeyByIP endpoints shared same counter causing false rate limits
- Now agent_registration, public_access, etc have separate counters per IP
- Example: "agent_registration:127.0.0.1" vs "public_access:127.0.0.1"

Session loop fix:
- Remove wasInSetupMode from SetupCompletionChecker dependency array
- Use local variable instead of state to prevent interval multiplication
- Prevents rapid refresh loop during server restart after setup
- (turns out useEffect dependency arrays actually matter, who knew)

Tested:
- First agent registration now succeeds without rate limit (was 429)
- Public access requests don't affect agent registration quota
- No UI flashing during server restart
- Rate limit API endpoints functional (Settings UI needs work)
2025-10-31 19:31:52 -04:00
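The key namespacing from 5fd82e5697 is easy to show in isolation. In this sketch the key format matches the example in the commit message, while the `counter` type and its `allow` method are hypothetical and deliberately omit window expiry.

```go
package main

import "sync"

// counter is a toy in-memory hit counter keyed by string.
type counter struct {
	mu   sync.Mutex
	hits map[string]int
}

func newCounter() *counter { return &counter{hits: make(map[string]int)} }

// namespacedKey prefixes the client IP with the limit type, so each
// limit type keeps its own counter per IP.
func namespacedKey(limitType, ip string) string {
	return limitType + ":" + ip
}

// allow records one hit and reports whether the caller is still within limit.
func (c *counter) allow(limitType, ip string, limit int) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := namespacedKey(limitType, ip)
	c.hits[k]++
	return c.hits[k] <= limit
}
```

Keyed by bare IP, the two limit types would share one counter, which is the false rate limiting the commit fixes.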
Fimeg
a90bb31836 merge: bring in session-loop and dashboard fixes from main 2025-10-31 18:36:28 -04:00
Fimeg
5e9c27b7ef fix: handle NULL reboot_reason values from database
The reboot_reason field was defined as string instead of *string, causing
database scan failures when the column contains NULL values. This broke
agent list loading on existing installations after migration.

- Changed reboot_reason to *string in both Agent and AgentWithLastScan structs
- Added DEFAULT empty string to migration for new installations
- Added README section for full server reinstall procedure
2025-10-31 17:34:05 -04:00
Fimeg
e72e9fc16f feat: add host restart detection and fix agent version display
Potential fixes for issues #4 and #6.

Agent version display:
- Set CurrentVersion during registration instead of waiting for first check-in
- Update UI to show "Initial Registration" instead of "Unknown"

Host restart detection:
- Added reboot_required, last_reboot_at, reboot_reason fields to agents table
- Agent now detects pending reboots (Debian/Ubuntu via /var/run/reboot-required, RHEL/Fedora via needs-restarting)
- New reboot command type with 1-minute grace period
- UI shows restart alerts and adds restart button in quick actions
- Restart indicator badge in agent list

The reboot detection runs during system info collection and gets reported back to the server automatically.

Using shutdown command for now until we make the restart mechanism user-adjustable later - need to think on that.
Also need to come up with a Windows equivalent that does not rely on reading the event log to detect reboots.
2025-10-31 15:03:59 -04:00
Casey Tunturi
85323884f4 Merge pull request #8 from Fimeg/3-dashboard-is-inconsistent
fix: dashboard severity field name mismatch
2025-10-31 13:30:25 -04:00
Fimeg
08f63ccc7a fix: dashboard severity field name mismatch 2025-10-31 13:27:37 -04:00
Casey Tunturi
93592f2410 Merge pull request #5 from Fimeg/fix/session-loop-bug
Conducted 3 different scenario tests on my bench - considering resolved until reopened: 401 session refresh loop
2025-10-31 12:49:42 -04:00
Fimeg
7b7764115c fix: resolve 401 session refresh loop
Clear all auth state on 401, disable retries, and force logout on setup completion.

Resolves #2
2025-10-31 12:29:15 -04:00
Fimeg
3f9164c7ca fix: complete security vulnerability remediation
Critical Security Fixes:
- Fix JWT secret derivation vulnerability - replace deriveJWTSecret with cryptographically secure GenerateSecureToken
- Secure setup interface - remove JWT secret display and API response exposure
- Fix database migration 012 parameter naming conflict in mark_registration_token_used function
- Restore working Docker Compose environment variable configuration

Security Impact:
- Eliminates system-wide compromise risk from admin credential exposure
- Removes sensitive JWT secret exposure during setup process
- Ensures cryptographically secure JWT token generation
- Fixes agent registration and token creation functionality

Testing:
- Agent registration working properly
- Token consumption tracking functional
- Registration tokens created without 500 errors
- Secure JWT secret generation verified
2025-10-31 10:41:04 -04:00
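For context on the JWT secret fix in 3f9164c7ca, a cryptographically secure token generator in Go typically reads from crypto/rand. The sketch below is an assumption about the general approach, not the project's GenerateSecureToken, whose signature and encoding are not shown in this log; `generateSecureToken` and its byte-count parameter are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// generateSecureToken returns n random bytes from the operating system's
// CSPRNG, hex-encoded (so the string is 2*n characters long).
func generateSecureToken(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}
```

Unlike a secret derived from other values such as admin credentials, a token generated this way cannot be recomputed by anyone who learns those inputs.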
Fimeg
63cc7f6645 fix: critical security vulnerabilities
- Fix JWT secret derivation vulnerability - replace deriveJWTSecret with cryptographically secure GenerateSecureToken
- Secure setup interface - remove JWT secret display and API response exposure
- Addresses system-wide compromise risk from admin credential exposure
2025-10-31 09:32:34 -04:00
Fimeg
e64131079e add automatic redirect from setup to login after completion
- Add SetupCompletionChecker component that monitors health status
- Automatically redirect to /login when server becomes healthy after setup
- Improves user experience by eliminating manual navigation step
2025-10-31 08:39:16 -04:00
Fimeg
fd4974de21 fix screenshot header text - change Windows Update History to Live Operations 2025-10-31 08:24:37 -04:00
Fimeg
23f8ea539e fix README screenshot layout - swap Live Operations with Linux Update History only 2025-10-31 08:23:20 -04:00
Fimeg
982750e694 fix welcome mode redirect - add missing /api/health endpoint
- swap Live Operations to main screenshots section
- remove WebSocket reference from future features
- better screenshot layout with Live Operations prominent
2025-10-31 08:21:49 -04:00
Fimeg
b9dcdcf71b fix bootstrap authentication - use matching redflag_bootstrap password 2025-10-31 08:10:53 -04:00
Fimeg
6af159f1bb remove test-agent from version control 2025-10-31 07:41:53 -04:00
Fimeg
e5d59eac02 fix deployment workflow - manual bootstrap copy and restart 2025-10-31 07:41:00 -04:00
Fimeg
44bb05ca5d added .env bootstrap and fallback 2025-10-31 06:54:34 -04:00
Fimeg
a3e3ac33a7 docs: improve screenshot layout with collapsible section 2025-10-30 22:23:26 -04:00
Fimeg
a92ac0ed78 v0.1.17: UI fixes, Linux improvements, documentation overhaul
UI/UX:
- Fix heartbeat auto-refresh and rate-limiting page
- Add navigation breadcrumbs to settings pages
- New screenshots added

Linux Agent v0.1.17:
- Fix disk detection for multiple mount points
- Improve installer idempotency
- Prevent duplicate registrations

Documentation:
- README rewrite: 538→229 lines, homelab-focused
- Split docs: API.md, CONFIGURATION.md, DEVELOPMENT.md
- Add NOTICE for Apache 2.0 attribution
2025-10-30 22:17:48 -04:00