feat: add resilience and reliability features for agent subsystems
Added circuit breakers with configurable timeouts for all subsystems (APT, DNF, Docker, Windows, Winget, Storage). Replaces cron-based scheduler with priority queue that should scale beyond 1000+ agents if your homelab is that big. Command acknowledgment system ensures results aren't lost on network failures or restarts. Agent tracks pending acknowledgments with persistent state and automatic retry. - Circuit breakers: 3 failures in 1min opens circuit, 30s cooldown - Per-subsystem timeouts: 30s-10min depending on scanner - Priority queue scheduler: O(log n), worker pool, jitter, backpressure - Acknowledgments: at-least-once delivery, max 10 retries over 24h - All tests passing (26/26)
This commit is contained in:
Binary file not shown.
|
Before Width: | Height: | Size: 76 KiB |
BIN
Screenshots/AgentMgmt.png
Normal file
BIN
Screenshots/AgentMgmt.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 114 KiB |
Binary file not shown.
|
Before Width: | Height: | Size: 107 KiB After Width: | Height: | Size: 104 KiB |
BIN
Screenshots/RedFlag Linux Agent Health Details.png
Normal file
BIN
Screenshots/RedFlag Linux Agent Health Details.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 109 KiB |
BIN
Screenshots/RedFlag Linux Agent Update Details.png
Normal file
BIN
Screenshots/RedFlag Linux Agent Update Details.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 149 KiB |
Reference in New Issue
Block a user