Add docs and project files - force for Culurien
docs/4_LOG/November_2025/implementation/UIUpdate.md
# RedFlag UI and Critical Fixes - Implementation Plan
**Date:** 2025-11-10
**Version:** v0.1.23.4 → v0.1.23.5
**Status:** Investigation Complete, Implementation Ready

---

## Executive Summary

Based on the investigation of the three critical issues identified, this document breaks down what is happening in each case and what needs to be fixed.

---

## Issue #1: Scan Updates Quirk - INVESTIGATION COMPLETE ✅

### Symptoms
- Disk/boot metrics (44% used) appearing as "approve/reject" updates in UI
- Old monolithic logic intercepting new subsystem scanners

### Investigation Results

**Agent-Side**: ✅ CORRECT
- Orchestrator scanners correctly call the right endpoints:
  - **Storage Scanner** → `ReportMetrics()` (✅ Correct)
  - **System Scanner** → `ReportMetrics()` (✅ Correct)
  - **Update Scanners** (APT, DNF, Docker, etc.) → `ReportUpdates()` (✅ Correct)

**Server-Side Handlers**: ✅ CORRECT
- `ReportUpdates` handler (updates.go:67) stores in `update_events` table
- `ReportMetrics` handler (metrics.go:31) stores in `metrics` table
- Both handlers properly separated and functioning
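
The separation described above can be sketched in Go as a small helper that maps a scan type to its destination table and flags `update_events` rows that belong in `metrics`. The type strings are taken from the investigation queries in this plan and may not match what the database actually contains; `destinationTable` and `misclassified` are hypothetical names, not functions from the RedFlag codebase.

```go
package main

import "fmt"

// metricTypes lists the scan types this plan says belong in the metrics
// table. The exact strings are an assumption based on the SQL queries in
// this document.
var metricTypes = map[string]bool{
	"storage": true,
	"system":  true,
	"disk":    true,
	"boot":    true,
}

// destinationTable returns the table a scan result of the given type
// should be stored in: "metrics" for system metrics, "update_events" for
// everything else (apt, dnf, docker, ...).
func destinationTable(packageType string) string {
	if metricTypes[packageType] {
		return "metrics"
	}
	return "update_events"
}

// misclassified takes the package types found in update_events and returns
// those that should have been stored as metrics instead.
func misclassified(updateEventTypes []string) []string {
	var out []string
	for _, t := range updateEventTypes {
		if destinationTable(t) == "metrics" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	rows := []string{"apt", "storage", "docker", "system"}
	fmt.Println(misclassified(rows)) // [storage system]
}
```

A non-empty result from `misclassified` would point to the "old data" explanation rather than a routing bug in the current scanners.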

**Root Cause Identified**:
The old monolithic `handleScanUpdates` function (main.go:985-1153) still exists in the codebase, but it is not registered in the command switch statement, which correctly uses `handleScanUpdatesV2`. That leaves two possible sources for the misclassified entries:

1. **Old data** in the database from before the subsystem refactor
2. **Windows service code** (service/windows.go), which uses an old version constant (0.1.16) and may have different logic

### Fix Required

**Option A - Database Cleanup (Quick Fix)**:
```sql
-- Check for misclassified data
SELECT package_type, COUNT(*) as count
FROM update_events
WHERE package_type IN ('storage', 'system')
GROUP BY package_type;

-- If found, move to metrics table or delete old data
```

**Option B - Code Cleanup (Recommended)**:
1. Delete the old `handleScanUpdates` function (lines 985-1153 in main.go)
2. Update Windows service version constant to match (0.1.23)
3. Verify no other references to old function

**Priority**: Medium (data issue, not functional bug)
**Risk**: Low (cleanup operation)

---

## Issue #2: UI Version Display Missing

### Current State
The WebUI shows only the three-part version (0.1.23), not the full four-part version including the final octet (0.1.23.4).

### Implementation Needed

**File**: `aggregator-web/src/pages/Dashboard.tsx`

**Agent Card View** - Add version display:
```tsx
// Add to agent card display
<AgentCard>
  {/* ...existing card content... */}
  <div className="agent-version">
    <span className="label">Version:</span>
    <span className="value">{agent.current_version || 'Unknown'}</span>
  </div>
</AgentCard>
```

**Agent Details View** - Add full version string:
```tsx
// Add to details panel
<AgentDetails>
  {/* ...existing detail rows... */}
  <DetailRow>
    <Label>Agent Version</Label>
    <Value>{agent.current_version || agent.config_version || 'Unknown'}</Value>
  </DetailRow>
</AgentDetails>
```

**API Data Available**:
- The backend already populates the `current_version` field in the API response
- May need to ensure the full version string (with octet) is stored and returned
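
To check whether the backend really returns the full version string, a small validator along these lines could be used in a server-side test. This is a sketch under the assumption that a "full" version means exactly four numeric, dot-separated components; `isFullVersion` is a hypothetical helper, not an existing RedFlag function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isFullVersion reports whether v has exactly four numeric components,
// for example "0.1.23.4". A leading "v" (as in "v0.1.23.4") is tolerated.
func isFullVersion(v string) bool {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 4 {
		return false
	}
	for _, p := range parts {
		// ParseUint rejects empty strings, signs, and non-digits.
		if _, err := strconv.ParseUint(p, 10, 32); err != nil {
			return false
		}
	}
	return true
}

func main() {
	for _, v := range []string{"0.1.23.4", "0.1.23", "v0.1.23.5"} {
		fmt.Printf("%-10s full=%v\n", v, isFullVersion(v))
	}
}
```

If `current_version` fails this check for any agent, the UI change alone will not show the final octet; the stored value would need fixing first.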

### Tasks
1. Verify backend returns full version string with octet
2. Update Agent Card to display version
3. Update Agent Details page to display version prominently
4. Consider adding version to agent list table view

**Priority**: Low (cosmetic, but important for debugging)
**Risk**: Very Low (UI only)

---

## Issue #3: Same-Version Installation Logic

### Current Logic
```go
// In update handler (pseudo-code). Versions are compared numerically via
// compareVersions; a plain string comparison would order "0.1.9" after "0.1.23".
if compareVersions(version, current) < 0 {
	return errors.New("downgrade not allowed")
}
// What about version == current? ❓
```

### Use Cases

**Scenario A: Agent Reinstall**
- Agent needs to reinstall same version (config corruption, binary issues)
- Should allow: `version == current`

**Scenario B: Accidental Update Click**
- User clicks update but agent already on that version
- Should we allow, block, or warn?

### Decision Options

**Option A: Allow Same-Version (Recommended)**
- Supports reinstall scenario
- No security risk (same version)
- Simple implementation: keep the rejection as a strict `version < current`; tightening it to `version <= current` would block reinstalls
- Prevents unnecessary support tickets

**Option B: Block Same-Version**
- Prevents no-op updates
- May frustrate users trying to reinstall
- Requires workaround documentation

**Option C: Warning + Allow**
```go
if compareVersions(version, current) == 0 {
	log.Printf("Warning: Agent %s already on version %s, proceeding with reinstall", agentID, version)
}
if compareVersions(version, current) < 0 {
	return errors.New("downgrade not allowed")
}
```

### Implementation Location

**Agent-Side**:
File: `aggregator-agent/cmd/agent/subsystem_handlers.go`
Function: `handleUpdateAgent()` (lines 346-536)

Proposed version check:
```go
// Needs to be added somewhere in the update logic
currentVersion := cfg.AgentVersion
targetVersion := params["version"]

switch cmp := compareVersions(targetVersion, currentVersion); {
case cmp < 0:
	// Downgrade: reject
case cmp == 0:
	// Same version: allow as a reinstall and log it
}
```
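
The `compareVersions` helper used in the check is not defined in this plan. A minimal sketch follows, assuming both versions are plain dotted numeric strings such as `0.1.23.4`. If `params["version"]` is not a `string` in the real handler, a type assertion would be needed first. Missing trailing components are treated as 0, and non-numeric components are also treated as 0, which is a simplification made here rather than a documented RedFlag rule.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions compares dotted numeric versions component by component.
// It returns -1 if a < b, 0 if a == b, and 1 if a > b.
// Missing trailing components count as 0, so "0.1.23" equals "0.1.23.0".
func compareVersions(a, b string) int {
	pa := strings.Split(a, ".")
	pb := strings.Split(b, ".")
	n := len(pa)
	if len(pb) > n {
		n = len(pb)
	}
	for i := 0; i < n; i++ {
		x, y := component(pa, i), component(pb, i)
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// component returns the i-th part as an integer, or 0 if it is missing
// or not numeric (an assumption made for this sketch).
func component(parts []string, i int) int {
	if i >= len(parts) {
		return 0
	}
	v, err := strconv.Atoi(strings.TrimSpace(parts[i]))
	if err != nil {
		return 0
	}
	return v
}

func main() {
	fmt.Println(compareVersions("0.1.23.5", "0.1.23.4")) // 1: upgrade
	fmt.Println(compareVersions("0.1.23.4", "0.1.23.4")) // 0: reinstall
	fmt.Println(compareVersions("0.1.16", "0.1.23.4"))   // -1: downgrade
}
```

Comparing numerically matters because a plain string comparison would treat `0.1.9` as newer than `0.1.23`.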

**Server-Side**:
File: `aggregator-server/internal/api/handlers/agent_build.go`

Check version constraints before sending the update command.

### Recommendation
**Option A - Allow same-version installations**

Reasons:
1. Reinstall is a valid use case
2. No security implications
3. Easiest to implement and document
4. User expectation: the "Update" button should work even if the agent is already on that version

### Tasks
1. Define version comparison logic
2. Add check in agent update handler (allow ==, block <)
3. Add logging for same-version reinstalls
4. Update UI to show appropriate messages

**Priority**: Low (edge case)
**Risk**: Very Low (no security impact)

---

## Phase 2: Middleware Version Upgrade Fix

### Current Status
- Phase 1 (Build Orchestrator): 90% complete
- Phase 2 (Middleware): Starting

### Known Issues
1. **Version Upgrade Catch-22**: Middleware blocks updates due to version check
2. **Update-Aware Middleware**: Need to detect upgrading agents and relax constraints
3. **Command Processing**: Need complete implementation

### Implementation Plan

**1. Update-Aware Middleware**
- Detect when agent is in update process
- Relax machine ID binding during upgrade
- Restore binding after completion

**2. Same-Version Logic**
- Implement decision from Issue #3 above
- Update agent and server validation

**3. End-to-End Testing**
- Test flow: 0.1.23.4 → 0.1.23.5
- Verify signature verification
- Validate subsystem persistence
- Confirm agent continues operations post-update

### Tasks
1. Implement middleware version upgrade detection
2. Add nonce validation for replay protection
3. Implement same-version installation logic
4. Test complete update cycle
5. Verify signature verification
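
For the nonce validation task, one common approach is to remember each nonce for a limited time and reject any that shows up twice. The sketch below is an in-memory illustration only; `NonceStore` and its methods are hypothetical names, and the plan does not say where nonces would be stored or how long they should live.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// NonceStore remembers nonces until they expire so that a replayed
// update command can be rejected. In-memory only (hypothetical type).
type NonceStore struct {
	mu   sync.Mutex
	ttl  time.Duration
	seen map[string]time.Time // nonce -> expiry time
}

// NewNonceStore creates a store that remembers each nonce for ttl.
func NewNonceStore(ttl time.Duration) *NonceStore {
	return &NonceStore{ttl: ttl, seen: make(map[string]time.Time)}
}

// Use records the nonce and returns true the first time it is seen
// within ttl. It returns false for an empty nonce or a replay.
func (s *NonceStore) Use(nonce string, now time.Time) bool {
	if nonce == "" {
		return false
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	// Drop expired entries so the map does not grow without bound.
	for n, exp := range s.seen {
		if !now.Before(exp) {
			delete(s.seen, n)
		}
	}
	if _, replay := s.seen[nonce]; replay {
		return false
	}
	s.seen[nonce] = now.Add(s.ttl)
	return true
}

// UseNow is a convenience wrapper around Use with the current time.
func (s *NonceStore) UseNow(nonce string) bool {
	return s.Use(nonce, time.Now())
}

func main() {
	store := NewNonceStore(5 * time.Minute)
	now := time.Now()
	fmt.Println(store.Use("abc", now))                     // true
	fmt.Println(store.Use("abc", now.Add(time.Minute)))    // false: replay
	fmt.Println(store.Use("abc", now.Add(10*time.Minute))) // true: entry expired
}
```

Because an expired nonce is accepted again, this only protects against replay if the command also carries a timestamp that is rejected once it is older than the TTL.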

**Priority**: High (blocks Phase 2 completion)
**Risk**: Medium (need to ensure security not compromised)

---

## Build Orchestrator Status (Phase 1 - 90% Complete)

### Completed ✅
1. Signed binary generation (build_orchestrator.go)
2. Ed25519 signing integration (SignFile())
3. Generic binary signing (Option 2 approach)
4. Download handler serves signed binaries
5. Config separation (config.json not embedded)

### Remaining ⏳
1. Agent update flow testing (0.1.23.4 → 0.1.23.5)
2. End-to-end verification
3. Signature verification on agent side (placeholder in place)
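
The agent-side placeholder could be filled with Go's standard `crypto/ed25519` package. The sketch below shows the verification step only. The helper names are hypothetical, and how RedFlag delivers the public key and signature to the agent is not specified in this plan, so the demo generates a throwaway key pair instead.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// mustGenerateKey creates a throwaway key pair for the demo. In a real
// deployment the agent would only hold the server's public key.
func mustGenerateKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signBinary stands in for the server-side signing step.
func signBinary(priv ed25519.PrivateKey, binary []byte) []byte {
	return ed25519.Sign(priv, binary)
}

// verifyBinary reports whether sig is a valid Ed25519 signature of binary.
// Length checks come first because ed25519.Verify panics on a public key
// of the wrong size.
func verifyBinary(pub ed25519.PublicKey, binary, sig []byte) bool {
	if len(pub) != ed25519.PublicKeySize || len(sig) != ed25519.SignatureSize {
		return false
	}
	return ed25519.Verify(pub, binary, sig)
}

func main() {
	pub, priv := mustGenerateKey()
	binary := []byte("agent binary contents")
	sig := signBinary(priv, binary)

	fmt.Println(verifyBinary(pub, binary, sig))                     // true
	fmt.Println(verifyBinary(pub, []byte("tampered contents"), sig)) // false
}
```

The agent should run this check on the downloaded bytes before replacing its own binary, and abort the update on any failure.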

### Ready for Cleanup
The following dead code should be removed:
- `TLSConfig` struct in config.go (lines 23-29)
- Docker artifact generation in agent_builder.go
- Old config fields: `CertFile`, `KeyFile`, `CAFile`

---

## Phase 3: Security Hardening

### Tasks
1. Remove JWT secret logging (debug mode only)
2. Implement per-server JWT secrets (not shared)
3. Clean dead code (TLSConfig, Docker fields)
4. Consider kernel keyring config protection

### Token Security Decision
**Status**: Sliding window refresh tokens are adequate
- Machine ID binding prevents cross-machine token reuse
- Token theft requires filesystem access (already compromised)
- True rotation deferred to v0.3.0
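
As an illustration of how a sliding window combines with machine ID binding, the sketch below models a refresh token whose expiry moves forward each time it is used from the bound machine. The `RefreshToken` type, its fields, and the use of Unix seconds are assumptions made for this example and do not describe RedFlag's actual token format.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrWrongMachine = errors.New("token bound to a different machine")
	ErrExpired      = errors.New("refresh token expired")
)

// RefreshToken is a hypothetical record for a machine-bound refresh token.
type RefreshToken struct {
	MachineID string
	ExpiresAt int64 // Unix seconds
}

// Refresh validates the token for the given machine at time now. On
// success it slides the expiry forward to now+window; the token value
// itself is not rotated.
func (t *RefreshToken) Refresh(machineID string, now, window int64) error {
	if machineID != t.MachineID {
		return ErrWrongMachine
	}
	if now >= t.ExpiresAt {
		return ErrExpired
	}
	t.ExpiresAt = now + window
	return nil
}

func main() {
	tok := &RefreshToken{MachineID: "machine-a", ExpiresAt: 1000}

	err := tok.Refresh("machine-a", 900, 3600)
	fmt.Println(err, tok.ExpiresAt) // <nil> 4500

	fmt.Println(tok.Refresh("machine-b", 950, 3600))  // token bound to a different machine
	fmt.Println(tok.Refresh("machine-a", 5000, 3600)) // refresh token expired
}
```

True rotation would additionally issue a new token value on each refresh and invalidate the old one, which is the part deferred to v0.3.0.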

**Priority**: Medium
**Risk**: Low (current implementation adequate)

---

## Testing Checklist

### Agent Update Flow Test
- [ ] Bump version to 0.1.23.5
- [ ] Build signed binary for 0.1.23.5
- [ ] Test update from 0.1.23.4 → 0.1.23.5
- [ ] Verify signature verification works
- [ ] Confirm agent restarts successfully
- [ ] Validate subsystems still enabled post-update
- [ ] Verify metrics still reporting correctly
- [ ] Check update_events table for corruption

### UI Display Test
- [ ] Version shows on agent card
- [ ] Version shows on agent details page
- [ ] Version updates after agent update

### Subsystem Tests
- [ ] Storage scan reports to metrics table
- [ ] System scan reports to metrics table
- [ ] APT scan reports to update_events table
- [ ] Docker scan reports to update_events table

---

## Database Queries for Investigation

### Check for Misclassified Data
```sql
-- Query 1: Check for storage/system data in update_events
SELECT package_type, COUNT(*) as count
FROM update_events
WHERE package_type IN ('storage', 'system', 'disk', 'boot')
GROUP BY package_type;

-- Query 2: Check metrics table for package update data
SELECT package_type, COUNT(*) as count
FROM metrics
WHERE package_type IN ('apt', 'dnf', 'docker', 'windows', 'winget')
GROUP BY package_type;

-- Query 3: Check agent_subsystems configuration
SELECT name, enabled, auto_run
FROM agent_subsystems
WHERE name IN ('storage', 'system', 'updates');
```

### Cleanup Queries (If Needed)
```sql
-- Move or delete misclassified data
-- BACKUP FIRST!

-- Check how many records would be deleted (same filter as the DELETE below)
SELECT COUNT(*) FROM update_events
WHERE package_type IN ('storage', 'system')
AND created_at < NOW() - INTERVAL '7 days';

-- Delete (or move to metrics table)
DELETE FROM update_events
WHERE package_type IN ('storage', 'system')
AND created_at < NOW() - INTERVAL '7 days';
```

---

## Code Locations Reference

### Agent-Side
- `aggregator-agent/cmd/agent/main.go` - Command routing (lines 864-882)
- `aggregator-agent/cmd/agent/subsystem_handlers.go` - Scan handlers
- `aggregator-agent/cmd/agent/main.go:985` - OLD `handleScanUpdates` (delete)
- `aggregator-agent/internal/service/windows.go:32` - Old version constant (update)

### API Handlers
- `aggregator-server/internal/api/handlers/updates.go:67` - ReportUpdates
- `aggregator-server/internal/api/handlers/metrics.go:31` - ReportMetrics
- `aggregator-server/internal/api/handlers/agent_build.go` - Update logic

### WebUI
- `aggregator-web/src/pages/Dashboard.tsx` - Agent card and details
- `aggregator-web/src/pages/settings/AgentManagement.tsx` - Version display

### Database Tables
- `update_events` - Package updates (apt, dnf, docker, etc.)
- `metrics` - System metrics (storage, system, cpu, memory)
- `agent_subsystems` - Subsystem configuration

---

## Recommended Implementation Order

### Week 1 (Critical Fixes)
1. **Database Investigation** - Run queries to check for misclassified data
2. **UI Version Display** - Add version to agent cards and details (easy win)
3. **Same-Version Logic Decision** - Make decision and implement
4. **Test Update Flow** - 0.1.23.4 → 0.1.23.5

### Week 2 (Phase 2 Completion)
5. **Middleware Version Upgrade** - Implement detection logic
6. **Security Hardening** - JWT logging, per-server secrets
7. **Code Cleanup** - Remove old handleScanUpdates function
8. **Documentation** - Update all docs for v0.2.0

### Week 3 (Polish)
9. **Token Rotation** (Nice-to-have) - Implement true rotation
10. **Enhanced UI** - Improve metrics display
11. **Testing** - Full integration test suite

---

## Risk Assessment

| Issue | Priority | Risk | Effort |
|-------|----------|------|--------|
| Scan Updates Quirk | Medium | Low | 2 hours |
| UI Version Display | Low | Very Low | 1 hour |
| Same-Version Logic | Low | Very Low | 1 hour |
| Middleware Upgrade | High | Medium | 4 hours |
| Agent Update Test | High | Medium | 3 hours |
| Security Hardening | Medium | Low | 4 hours |

---

## Decision Log

### Decision 1: Same-Version Installations
**Status**: Pending
**Options**: Allow / Block / Warn
**Recommendation**: **Allow** (supports reinstall use case)

### Decision 2: Token Rotation Priority
**Status**: Decided (deferred to v0.3.0)
**Rationale**: Machine ID binding provides adequate security
**Decision**: **Defer** - sliding window sufficient

### Decision 3: UI Version Display Location
**Status**: Pending
**Options**: Card only / Details only / Both
**Recommendation**: **Both** for maximum visibility

### Decision 4: Scan Updates Fix Approach
**Status**: Pending
**Options**: Database cleanup / Code cleanup
**Recommendation**: **Both** - clean up old data AND remove dead code

---

## Next Steps

### Immediate (Today)
1. ☐ Check database for misclassified data using queries above
2. ☐ Make decision on Same-Version logic (Allow/Block/Warn)
3. ☐ Confirm token rotation timing (now vs defer)
4. ☐ Run test update flow

### This Week
5. ☐ Implement UI version display
6. ☐ Implement same-version installation logic
7. ☐ Complete middleware version upgrade
8. ☐ Remove JWT secret logging

### Next Week
9. ☐ Full integration testing
10. ☐ Update documentation
11. ☐ Prepare v0.2.0 release

---

## Notes

**Build Orchestrator Misalignment - RESOLVED** ✅
- Originally generating Docker configs, installer expecting native binaries
- Fixed: Now generates signed native binaries per version/platform
- Signed packages stored in database
- Download endpoint serves correct binaries

**Version Upgrade Catch-22 - IN PROGRESS** ⚠️
- Middleware blocks updates due to machine ID binding
- Need update-aware middleware to detect upgrading agents
- Nonce validation needed for replay protection

**Token Security - ADEQUATE** ✅
- Sliding window refresh tokens sufficient
- Machine ID binding prevents cross-machine token reuse
- True rotation nice-to-have but not critical for v0.2.0

---

**Document Version**: 1.0
**Last Updated**: 2025-11-10
**Next Review**: After critical fixes completed
**Owner**: @Fimeg
**Collaborator**: Kimi-k2 (Infrastructure Analysis)