Commit Graph

15 Commits

Author SHA1 Message Date
Kian Jones
9418ab9815 feat: add provider trace backend abstraction for multi-backend telemetry (#8814)
* feat: add provider trace backend abstraction for multi-backend telemetry

Introduces a pluggable backend system for provider traces:
- Base class with async/sync create and read interfaces
- PostgreSQL backend (existing behavior)
- ClickHouse backend (via OTEL instrumentation)
- Socket backend (writes to Unix socket for crouton sidecar)
- Factory for instantiating backends from config

Refactors TelemetryManager to use backends with support for:
- Multi-backend writes (concurrent via asyncio.gather)
- Primary backend for reads (first in config list)
- Graceful error handling per backend

Config: LETTA_TELEMETRY_PROVIDER_TRACE_BACKEND (comma-separated)
Example: "postgres,socket" for dual-write to Postgres and crouton

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* feat: add protocol version to socket backend records

Adds PROTOCOL_VERSION constant to socket backend:
- Included in every telemetry record sent to crouton
- Must match ProtocolVersion in apps/crouton/main.go
- Enables crouton to detect and reject incompatible messages

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: remove organization_id from ProviderTraceCreate calls

The organization_id is now handled via the actor parameter in the
telemetry manager, not through ProviderTraceCreate schema. This fixes
validation errors after changing ProviderTraceCreate to inherit from
BaseProviderTrace which forbids extra fields.

🐙 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* consolidate provider trace

* add clickhouse-connect to fix bug on main lmao

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* auto generated sdk changes, and deployment details, and clikchouse prefix bug and added fields to runs trace return api

* consolidate provider trace

* consolidate provider trace bug fix

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:43 -08:00
Kian Jones
3eae81cf62 feat: add /v1/runs/{run_id}/trace endpoint for OTEL traces (#8682)
* feat: add /v1/runs/{run_id}/trace endpoint for OTEL traces

- Add new endpoint to retrieve filtered OTEL spans for a run
- Filter to only return UI-relevant spans (agent_step, tool executions, root span, TTFT)
- Skip Postgres writes when ClickHouse is enabled for provider traces
- Add USE_CLICKHOUSE_FOR_PROVIDER_TRACES env var to helm/justfile
- Move typecheck CI to self-hosted runners

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* fix: add missing clickhouse_provider_traces.py

The telemetry_manager.py imports ClickhouseProviderTraceReader from
this module, but the file was not included when splitting the PR.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

* autogen

* fix: add trace.retrieve to stainless.yml for SDK generation

Adds the runs.trace.retrieve method mapping so Stainless generates
the useRunsServiceRetrieveTraceForRun hook.

🤖 Generated with [Letta Code](https://letta.com)

Co-Authored-By: Letta <noreply@letta.com>

---------

Co-authored-by: Letta <noreply@letta.com>
2026-01-19 15:54:39 -08:00
cthomas
9a95a8f976 fix: duplicate session commit in step logging (#7512)
* fix: duplicate session commit in step logging

* update all callsites
2026-01-12 10:57:19 -08:00
Sarah Wooders
4df0a27eb0 chore: remove sync db (#4873) 2025-10-07 17:50:45 -07:00
Kian Jones
b8e9a80d93 merge this (#4759)
* wait I forgot to comit locally

* cp the entire core directory and then rm the .git subdir
2025-09-17 15:47:40 -07:00
Kian Jones
22f70ca07c chore: officially migrate to submodule (#4502)
* remove apps/core and apps/fern

* fix precommit

* add submodule updates in workflows

* submodule

* remove core tests

* update core revision

* Add submodules: true to all GitHub workflows

- Ensure all workflows can access git submodules
- Add submodules support to deployment, test, and CI workflows
- Fix YAML syntax issues in workflow files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* remove core-lint

* upgrade core with latest main of oss

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-09 12:45:53 -07:00
Kian Jones
fecf6decfb chore: migrate to ruff (#4305)
* base requirements

* autofix

* Configure ruff for Python linting and formatting

- Set up minimal ruff configuration with basic checks (E, W, F, I)
- Add temporary ignores for common issues during migration
- Configure pre-commit hooks to use ruff with pass_filenames
- This enables gradual migration from black to ruff

* Delete sdj

* autofixed only

* migrate lint action

* more autofixed

* more fixes

* change precommit

* try changing the hook

* try this stuff
2025-08-29 11:11:19 -07:00
cthomas
b0d4cad27c feat: use no refresh in create_provider_trace_async (#3400) 2025-07-17 22:14:45 -07:00
Andy Li
d2252f2953 feat: otel metrics and expanded collecting (#2647)
(passed tests in last run)
2025-06-05 17:20:14 -07:00
cthomas
4f1e783fa1 feat(asyncify): migrate sleeptime agent creation (#2361) 2025-05-23 14:38:37 -07:00
cthomas
d1b0756657 feat(asyncify): list model stragglers (#2362) 2025-05-23 00:42:05 -07:00
cthomas
6c1b189080 feat: add db tracing to otel (#2337) 2025-05-22 13:47:49 -07:00
cthomas
37e2548ac1 fix: request logging json error (#2309) 2025-05-21 12:04:12 -07:00
Matthew Zhou
26ae9c4502 feat: Add tavily search builtin tool (#2257) 2025-05-19 16:38:11 -07:00
Andy Li
a78abc610e feat: track llm provider traces and tracking steps in async agent loop (#2219) 2025-05-19 15:50:56 -07:00