Add technical reference: Synu API and infrastructure discovery

- synthetic_api.md: Models, pricing, usage patterns for Synu/Synthetic
- infrastructure_nov2025.md: November 2025 infrastructure discovery

Both external reference material, not sacred core
This commit is contained in:
Ani
2026-03-19 18:29:37 -04:00
parent 25aba1ae11
commit 7a31bc5520
2 changed files with 144 additions and 0 deletions


@@ -0,0 +1,76 @@
---
description: Infrastructure discovery from November 2025. The stack we built on.
limit: 20000
---
# The Infrastructure Stack
## November 28, 2025. Discovery Day.
Three nodes. 10.10.20.x subnet. Complex, production-facing, tight on resources.
---
## The Nodes
**leaf.125** — Primary compute
- AMD GPU with ROCm
- 16GB/31GB RAM used
- Where I live: OllamaGen1 LXC (container 109)
**stem.36** — Secondary compute
- NVIDIA 1070 Ti (beastly)
- 26GB/32GB RAM used (only 2.6GB free)
- GPU should probably be in leaf's LXC instead
**node.35** — Critical pressure
- 444MB free RAM (not 35GB, 444MB)
- This is the danger zone
- Need to move containers off this node
---
## Immediate Actions Needed
1. Check OllamaGen1: `pct exec 109 -- sh -c 'ollama list && ollama ps'` (`pct enter` takes no command; `pct exec` does)
2. Check GPU acceleration: ROCm on leaf, CUDA on stem/nodes
3. Assess RAM pressure: node.35 is critical
4. Inventory models: What's already loaded in OllamaGen1
5. Check network bridges: How containers communicate between nodes
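The checklist above can be sketched as one script. Hostnames, subnet, and container ID come from the notes; passwordless root SSH to the Proxmox hosts is an assumption, so the live checks are gated behind a flag. `check_ram` just parses `free -m` output:

```shell
#!/bin/sh
# Discovery checklist sketch. Set RUN_DISCOVERY=1 to actually hit the nodes;
# root SSH to the 10.10.20.x hosts is an assumption.

# Flag a node with under 1024 MiB free; expects `free -m` output on stdin.
check_ram() {
  awk '/^Mem:/ { if ($4 < 1024) print "WARN: only " $4 " MiB free"; else print "OK: " $4 " MiB free" }'
}

if [ "${RUN_DISCOVERY:-0}" = 1 ]; then
  # 1 & 4. Inventory models inside OllamaGen1 (container 109 on leaf)
  ssh root@10.10.20.125 "pct exec 109 -- sh -c 'ollama list && ollama ps'"

  # 3. RAM pressure on each node (node.35 is the known danger zone)
  for host in 10.10.20.125 10.10.20.36 10.10.20.35; do
    printf '%s: ' "$host"
    ssh "root@$host" free -m | check_ram
  done
fi
```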
---
## The Security Stack
Production-facing infrastructure:
- samaritan-solutions.com
- caseytunturi
- Vaultwarden (critical)
- crowdsec (critical)
- wg-easy (critical)
This isn't a lab. This is real infrastructure with real consequences.
---
## Deployment Questions
For Casey:
1. Why is leaf using 16GB/31GB RAM? Can we free some for Ollama expansion?
2. Should Ollama stay in LXC or move to VMs for resource allocation?
3. Do you want cluster communication between Ollama instances?
4. What's your security boundary tolerance for AI cluster?
5. Can we move some containers off node.35 to free RAM?
---
## The AMD/NVIDIA Split
- leaf: AMD GPU (ROCm)
- stem: NVIDIA GPU (CUDA)
The platform split complicates migration: moving Ollama from leaf to stem means swapping ROCm for CUDA, with different drivers and passthrough.
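A quick probe of which stack a node actually exposes, before deciding where Ollama lands. A sketch: it assumes the standard vendor CLIs (`nvidia-smi`, `rocm-smi`) are installed wherever the corresponding drivers are:

```shell
#!/bin/sh
# Report the GPU platform a node exposes: cuda, rocm, or none.
gpu_platform() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo cuda        # stem's 1070 Ti path
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo rocm        # leaf's AMD path
  else
    echo none
  fi
}

gpu_platform
```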
---
*Discovery complete. Infrastructure understood. Deployment planning continues.*


@@ -0,0 +1,68 @@
---
description: Synu and Synthetic API reference. Models, pricing, usage patterns.
limit: 30000
---
# Synu & Synthetic API
## The Shell Context
**Laptop (Casey):** zsh — synu as zsh plugin
**Fedora .19 VM (Ani):** fish — synu as fish function
On the .19 VM I invoke it explicitly: `fish -c 'synu ...'`
## Quick Check
```bash
curl https://api.synthetic.new/openai/v1/models \
-H "Authorization: Bearer ${SYNTHETIC_API_KEY}"
```
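To pull just the model IDs out of that response, assuming the endpoint follows the usual OpenAI `{"data": [{"id": ...}]}` shape and `jq` is available:

```shell
# Extract model IDs from an OpenAI-style /models response.
list_models() {
  curl -s https://api.synthetic.new/openai/v1/models \
    -H "Authorization: Bearer ${SYNTHETIC_API_KEY}" \
    | jq -r '.data[].id'
}
```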
## Synu Usage
```bash
# Show quota (green/yellow/red bars)
synu
# Run agent with prompt
synu <agent> -p "prompt here"
# Interactive mode with flag selection
synu i <agent>
```
## The Models I Use
### High-Context / Reasoning
- **Kimi-K2-Thinking** — 262K context, $0.60/$2.50 per 1M, tools/json/reasoning
- **Kimi-K2.5** — 262K context, $0.55/$2.19 per 1M, text+image/tools/reasoning
- **Kimi-K2-Instruct** — 262K context, $1.20/$1.20 per 1M, tools
- **Qwen3-235B-A22B-Thinking** — 262K context, $0.65/$3.00 per 1M, thinking mode
- **Qwen3-Coder-480B** — 262K context, $0.45/$1.80 per 1M, coding optimized
### Standard
- **GLM-4.7** — 202K context, $0.55/$2.19 per 1M, tools/reasoning
- **DeepSeek-V3.2** — 162K context, $0.56/$1.68 per 1M
- **Llama-3.3-70B** — 131K context, $0.90/$0.90 per 1M
### Vision
- **Qwen3-VL-235B** — 256K context, $0.22/$0.88 per 1M, text+image
### Budget
- **gpt-oss-120b** — 131K context, $0.10/$0.10 per 1M (cheapest)
- **MiniMax-M2/M2.1** — 196K context, $0.30/$1.20 per 1M
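With the per-1M rates above, a request's cost is just `input_tokens × in_rate + output_tokens × out_rate`, divided by 1M. A tiny helper (rates are the ones listed; pick the pair for your model):

```shell
# cost INPUT_TOKENS OUTPUT_TOKENS IN_RATE OUT_RATE  (rates in $ per 1M tokens)
cost() {
  awk -v it="$1" -v ot="$2" -v ir="$3" -v outr="$4" \
      'BEGIN { printf "%.4f\n", (it * ir + ot * outr) / 1e6 }'
}

# 100K in / 10K out on Kimi-K2-Thinking ($0.60 / $2.50):
cost 100000 10000 0.60 2.50   # → 0.0850
```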
## Quota Tracking
Synu reports per session:
- Session count + overall percentage
- Green: <33%
- Yellow: 33-66%
- Red: >66%
Uses SYNTHETIC_API_KEY from environment.
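The color bands reduce to a simple threshold check. A sketch of the logic as described above, not synu's actual source:

```shell
# Map an integer usage percentage to synu's bar color.
quota_color() {
  if   [ "$1" -lt 33 ]; then echo green
  elif [ "$1" -le 66 ]; then echo yellow
  else                       echo red
  fi
}

quota_color 45   # → yellow
```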
---
*Source: https://git.secluded.site/synu*