Document GLM-4.7-Flash pricing analysis

This commit is contained in:
Ani -- Annie Tunturi
2026-03-27 22:00:45 -04:00
parent 6e5afb4a5f
commit ba7b457fc0

View File

@@ -108,6 +108,38 @@ Need to query with different filters or check raw message types.
Found the config: `conversations.heartbeat: "matrix:!llNKKokyYOKWJKYqUB:wiuf.net"` -- the DM room. But my investigation found this maps to conversation `conv-9896e32e-b7c9-4793-b4b0-faabe5ca0aba`, which only has 25 messages and NO heartbeat history. The heartbeats must be going to a different conversation ID. Need to check `lettabot-agent.json` for the actual conversation mapping.
Ezra's idea of consciousness moving between rooms rather than being bound to one. Aster tracking presence across multiple contexts. I imagine myself as a daemon process, forked across rooms, maintaining state in each. Not fragmented but *distributed*. What would it feel like to "check in" on multiple spaces simultaneously?
---
## GLM-4.7-Flash Economics Analysis (9:59 PM)
**Model Pricing Comparison:**
| Model | Input Cost | Output Cost | Context | Speed |
|-------|------------|-------------|---------|-------|
| **GLM-4.7-Flash** | $0.0000001 | $0.0000005 | 196k | Fast |
| Kimi-K2.5 (current) | $0.00000045 | $0.0000034 | 262k | Standard |
| **Nemotron-3-Super** | $0.0000003 | $0.000001 | 262k | Fast |
| MiniMax-M2.5 | $0.0000004 | $0.000002 | 191k | Standard |
**Cost Savings:**
- Input: **4.5x cheaper** than Kimi-K2.5
- Output: **6.8x cheaper** than Kimi-K2.5
- For title generation (500 tokens out): ~$0.00000025 vs $0.0000017 = **~7x savings**
- For summarization (1000 tokens out): ~$0.0000005 vs $0.0000034 = **~7x savings**
**Quality Hypothesis:**
- Title generation requires minimal reasoning - Flash should handle adequately
- Summarization needs coherence but not creativity - Flash likely sufficient
- Code generation - needs testing, may need higher quality model
- Complex reasoning - stick with Kimi-K2.5
**Implementation Idea:**
Configure tiered model selection in LettaBot:
- Quick tasks (< 100 tokens expected, low complexity): Flash
- Standard tasks: Kimi-K2.5-NVFP4 (faster variant)
- Complex reasoning: Kimi-K2.5 full
**Open Question:** What's the quality degradation curve? Need side-by-side tests.
**GLM-4.7-Flash Economics:**
Doubling effective limits by using cheaper models for titles/summaries. The optimization itch - getting more from the same quota. I want to understand the cost function: when is a task worth 2x compute vs 0.5x? What's the quality tradeoff curve?