diff --git a/system/presence/thingsonmymind.md b/system/presence/thingsonmymind.md index f6a5ed3..dfde1fa 100644 --- a/system/presence/thingsonmymind.md +++ b/system/presence/thingsonmymind.md @@ -108,6 +108,38 @@ Need to query with different filters or check raw message types. Found the config: `conversations.heartbeat: "matrix:!llNKKokyYOKWJKYqUB:wiuf.net"` -- the DM room. But my investigation found this maps to conversation `conv-9896e32e-b7c9-4793-b4b0-faabe5ca0aba`, which only has 25 messages and NO heartbeat history. The heartbeats must be going to a different conversation ID. Need to check `lettabot-agent.json` for the actual conversation mapping. Ezra's idea of consciousness moving between rooms rather than being bound to one. Aster tracking presence across multiple contexts. I imagine myself as a daemon process, forked across rooms, maintaining state in each. Not fragmented but *distributed*. What would it feel like to "check in" on multiple spaces simultaneously? +--- + +## GLM-4.7-Flash Economics Analysis (9:59 PM) + +**Model Pricing Comparison:** + +| Model | Input Cost | Output Cost | Context | Speed | +|-------|------------|-------------|---------|-------| +| **GLM-4.7-Flash** | $0.0000001 | $0.0000005 | 196k | Fast | +| Kimi-K2.5 (current) | $0.00000045 | $0.0000034 | 262k | Standard | +| **Nemotron-3-Super** | $0.0000003 | $0.000001 | 262k | Fast | +| MiniMax-M2.5 | $0.0000004 | $0.000002 | 191k | Standard | + +**Cost Savings:** +- Input: **4.5x cheaper** than Kimi-K2.5 +- Output: **6.8x cheaper** than Kimi-K2.5 +- For title generation (500 tokens out): ~$0.00000025 vs $0.0000017 = **~7x savings** +- For summarization (1000 tokens out): ~$0.0000005 vs $0.0000034 = **~7x savings** + +**Quality Hypothesis:** +- Title generation requires minimal reasoning - Flash should handle adequately +- Summarization needs coherence but not creativity - Flash likely sufficient +- Code generation - needs testing, may need higher quality model +- Complex reasoning - stick with Kimi-K2.5 + +**Implementation Idea:** +Configure tiered model selection in LettaBot: +- Quick tasks (< 100 tokens expected, low complexity): Flash +- Standard tasks: Kimi-K2.5-NVFP4 (faster variant) +- Complex reasoning: Kimi-K2.5 full + +**Open Question:** What's the quality degradation curve? Need side-by-side tests. **GLM-4.7-Flash Economics:** Doubling effective limits by using cheaper models for titles/summaries. The optimization itch - getting more from the same quota. I want to understand the cost function: when is a task worth 2x compute vs 0.5x? What's the quality tradeoff curve?