End-to-End Flow

Your Agent → SDK → OTel Collector → ClickHouse → Engine Pipeline → REST API → Dashboard
1. Agent emits spans

The SDK’s framework instrumentor captures every LLM call, tool invocation, and agent action as an OpenTelemetry span. The TraceCtrlSpanProcessor enriches each span with tracectrl.* security attributes (agent id, tool category, session id, ingress markers).
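As a rough illustration of what enriching a span at start time means, the sketch below models a processor that stamps security attributes onto each span as it opens. It uses a plain dataclass instead of the OpenTelemetry SDK so it runs on its own; a real implementation would subclass opentelemetry.sdk.trace.SpanProcessor and call span.set_attribute in on_start. The attribute keys (tracectrl.agent.id, tracectrl.session.id, tracectrl.tool.category), the tool.name input attribute, and the category table are hypothetical; this page only says the attributes live under the tracectrl.* namespace.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Span:
    """Minimal stand-in for an OpenTelemetry span."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


class SecurityAttributeProcessor:
    """Hypothetical enrichment step modeled on a span processor's on_start hook."""

    def __init__(self, agent_id: str, session_id: str,
                 tool_categories: Optional[Dict[str, str]] = None) -> None:
        self.agent_id = agent_id
        self.session_id = session_id
        self.tool_categories = tool_categories or {}

    def on_start(self, span: Span) -> None:
        # Attribute keys below are illustrative, not TraceCtrl's real schema.
        span.attributes.setdefault("tracectrl.agent.id", self.agent_id)
        span.attributes.setdefault("tracectrl.session.id", self.session_id)
        tool = span.attributes.get("tool.name")
        if tool is not None:
            # Unknown tools still get a category so downstream filters never see a gap.
            span.attributes["tracectrl.tool.category"] = self.tool_categories.get(tool, "unknown")
```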
2. Spans exported via OTLP

The BatchSpanProcessor batches spans and exports them via OTLP gRPC to the OTel Collector at :4317. The flush interval is set by TRACECTRL_BATCH_DELAY_MS (default 1000 ms).
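The batching behavior can be pictured with a toy model: spans accumulate in a buffer and are handed to the exporter once the flush delay has elapsed or the buffer is full. This is not the OpenTelemetry BatchSpanProcessor (which flushes from a background worker and takes the delay through its schedule_delay_millis parameter); the ToyBatcher name, the 512-span cap, and the injectable clock are assumptions made so the sketch is deterministic. Only TRACECTRL_BATCH_DELAY_MS and its 1000 ms default come from this page.

```python
import os
import time
from typing import Callable, List


def batch_delay_seconds() -> float:
    """Read TRACECTRL_BATCH_DELAY_MS (documented default: 1000 ms) as seconds."""
    return int(os.environ.get("TRACECTRL_BATCH_DELAY_MS", "1000")) / 1000.0


class ToyBatcher:
    """Buffers spans and flushes when the delay has elapsed or the buffer is full."""

    def __init__(self, export: Callable[[List[dict]], None], delay_s: float,
                 max_batch: int = 512, clock: Callable[[], float] = time.monotonic) -> None:
        self.export = export
        self.delay_s = delay_s
        self.max_batch = max_batch
        self.clock = clock
        self.buffer: List[dict] = []
        self.last_flush = clock()

    def add(self, span: dict) -> None:
        self.buffer.append(span)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        # A real batch processor flushes from a background worker; here the caller drives it.
        full = len(self.buffer) >= self.max_batch
        due = self.clock() - self.last_flush >= self.delay_s
        if self.buffer and (full or due):
            self.export(self.buffer)
            self.buffer = []
            self.last_flush = self.clock()
```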
3. Collector routes to ClickHouse

The OTel Collector receives spans on :4317 (gRPC) and :4318 (HTTP) and exports them to ClickHouse via the clickhouseexporter. Spans land in the otel_traces table with SpanAttributes stored as a Map(String, String).
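Because SpanAttributes is a Map(String, String), every attribute value comes back from ClickHouse as a string, whatever type the SDK originally set, so code that reads rows directly has to convert values itself. A small sketch, assuming booleans are serialized as "true"/"false" and integers as decimal text; the helper names and the attribute keys in the test are hypothetical.

```python
from typing import Mapping, Optional


def attr_bool(attrs: Mapping[str, str], key: str, default: bool = False) -> bool:
    """Interpret a string-encoded boolean from a Map(String, String) column."""
    value = attrs.get(key)
    if value is None:
        return default
    return value.strip().lower() == "true"


def attr_int(attrs: Mapping[str, str], key: str,
             default: Optional[int] = None) -> Optional[int]:
    """Parse a string-encoded integer, falling back to default if absent or malformed."""
    value = attrs.get(key)
    if value is None or value == "":
        return default
    try:
        return int(value)
    except ValueError:
        return default
```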
4. Engine pipeline processes spans

Every PIPELINE_INTERVAL_SECONDS (default 60), the engine re-reads all spans and refreshes the agent inventory, topology edges, guardrail registry, guardrail violations, and attack graph. Because the output tables use ReplacingMergeTree, rows rewritten on each run are deduplicated, so the main path needs no watermark.
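Why no watermark is needed can be shown with an in-memory model of ReplacingMergeTree: each pipeline run simply appends a fresh copy of every row, and a read with FINAL semantics keeps only the newest version per key. The key and version column names (agent_id, updated_at) are hypothetical. ClickHouse performs this collapsing during background merges, so deduplication is only guaranteed at query time when FINAL is used.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def append_rows(table: List[Dict], rows: Iterable[Dict]) -> None:
    # Every pipeline run inserts a full fresh copy; nothing is updated in place.
    table.extend(rows)


def final_view(table: List[Dict], key_cols: Sequence[str], version_col: str) -> List[Dict]:
    """Approximate SELECT ... FINAL: keep the highest-version row per sorting key.

    Ties go to the row inserted last, as ReplacingMergeTree does.
    """
    latest: Dict[Tuple, Dict] = {}
    for row in table:
        key = tuple(row[c] for c in key_cols)
        if key not in latest or row[version_col] >= latest[key][version_col]:
            latest[key] = row
    return list(latest.values())
```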
5. API serves processed data

The REST API under /api/v1 exposes system, topology, sessions, agents, risk, scan, violations, and guardrails routes. The dashboard consumes these endpoints; the violations route also exposes an SSE stream.
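A client for the violations SSE stream has to split the text/event-stream body into events. The parser below follows the standard Server-Sent Events field rules (data lines are joined with newlines, a blank line dispatches the event, lines starting with a colon are comments); the parse_sse name and the event names and payloads in the test are made up, and the exact stream path is not given on this page.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event", "data"} dicts from text/event-stream lines."""
    event_type = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if any.
            if data:
                yield {"event": event_type, "data": "\n".join(data)}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value
```

To consume a live stream, the lines can come straight from an HTTP response, for example parse_sse(line.decode("utf-8") for line in urllib.request.urlopen(url)), where url is the violations stream endpoint.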

Protector Plus Call Path

When TraceCtrl Guards is enabled, guardrail evaluations flow through the same span pipeline as everything else:
1. SDK call site

User code calls tracectrl.check_input(msg) or tracectrl.check_output(text) inside a with tracectrl.guard(): block.
2. Background POST to Protector Plus

A background thread POSTs to the Protector Plus endpoint (/apikey/api/protectorplus/v1/input-check or .../output-check). The user call returns immediately with a verdict stub whose flagged value is False; callers that need the real verdict synchronously can call verdict.wait(timeout=...).
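The non-blocking contract described above (return a flagged=False stub at once, let synchronous callers wait) can be sketched with a threading.Event. This is a hypothetical model, not the SDK's code: check_input_async, the post callback standing in for the Protector Plus request, the boolean returned by wait(), and the fail-open handling of request errors are all assumptions.

```python
import threading
from typing import Callable, Optional


class Verdict:
    """Hypothetical verdict stub: starts as flagged=False, filled in later."""

    def __init__(self) -> None:
        self.flagged = False
        self._done = threading.Event()

    def _resolve(self, flagged: bool) -> None:
        self.flagged = flagged
        self._done.set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block until the background check finishes; returns False on timeout."""
        return self._done.wait(timeout)


def check_input_async(msg: str, post: Callable[[str], bool]) -> Verdict:
    """Return a stub verdict at once and fill it in from a background thread."""
    verdict = Verdict()

    def worker() -> None:
        try:
            flagged = bool(post(msg))
        except Exception:
            flagged = False  # assumption: request errors fail open
        verdict._resolve(flagged)

    threading.Thread(target=worker, daemon=True).start()
    return verdict
```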
3. SDK emits guardrail spans

When the POST response arrives, the background thread emits one tracectrl.guardrail.evaluation span per flagged sub-guardrail, with decision='fail' and tracectrl.guardrail.provider='protector_plus', parented to the span that was active at the call site.
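The per-sub-guardrail fan-out can be sketched as a function that turns one response into a list of span descriptions, one per flagged sub-guardrail. The response shape and the tracectrl.guardrail.decision and tracectrl.guardrail.name keys are assumptions; only the span name, the 'fail' decision, and the provider value come from this page. In a real OpenTelemetry setup the parenting is the subtle part: because the spans are emitted on a background thread, the call site's context has to be captured before the thread starts (for example with opentelemetry.context.get_current()) and passed as the context argument to tracer.start_span.

```python
from typing import Any, Dict, List

EVALUATION_SPAN = "tracectrl.guardrail.evaluation"


def evaluation_spans(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one span description per flagged sub-guardrail in a (hypothetical) response."""
    spans = []
    for result in response.get("results", []):
        if not result.get("flagged"):
            continue  # passing sub-guardrails produce no evaluation span
        spans.append({
            "name": EVALUATION_SPAN,
            "attributes": {
                "tracectrl.guardrail.provider": "protector_plus",
                "tracectrl.guardrail.decision": "fail",                # assumed key name
                "tracectrl.guardrail.name": str(result["guardrail"]),  # assumed key name
            },
        })
    return spans
```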
4. Engine ingests violations

On the next pipeline tick, update_violations() picks up the failing eval spans from otel_traces and inserts them into guardrail_violations.
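The selection step of update_violations() amounts to a filter over otel_traces rows. The sketch below works on dicts shaped like otel_traces rows (SpanName, SpanAttributes, TraceId, and SpanId are columns of the OpenTelemetry ClickHouse exporter's trace table); the failing_evaluations name, the decision attribute key, and the fields of a violation record are assumptions, and the real function presumably runs this as a SQL query inside ClickHouse.

```python
from typing import Dict, Iterable, List


def failing_evaluations(rows: Iterable[Dict]) -> List[Dict[str, str]]:
    """Pick failing guardrail evaluation spans and map them to violation records."""
    violations = []
    for row in rows:
        attrs = row["SpanAttributes"]
        if row["SpanName"] != "tracectrl.guardrail.evaluation":
            continue
        if attrs.get("tracectrl.guardrail.decision") != "fail":  # assumed key
            continue
        violations.append({
            "trace_id": row["TraceId"],
            "span_id": row["SpanId"],
            "provider": attrs.get("tracectrl.guardrail.provider", ""),
        })
    return violations
```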
5. Alerts page surfaces them

The /alerts page renders the violation feed, with an SSE-driven unread counter in the sidebar. The /guardrails page lists each Protector Plus sub-guardrail in the registry; these are registered via separate tracectrl.guardrail.registered spans emitted on entry to guard().
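Registering at guard() entry matters because evaluation spans are only emitted for flagged sub-guardrails: without a separate registration record, a guardrail that never fails would never appear in the registry. A context-manager sketch of that entry hook follows; the emit callback, the sub-guardrail names, and the tracectrl.guardrail.name key are hypothetical, and only the span name and provider value come from this page.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Sequence

DEFAULT_SUB_GUARDRAILS = ("prompt_injection", "pii", "toxicity")  # hypothetical names


@contextmanager
def guard(emit: Callable[[Dict], None],
          sub_guardrails: Sequence[str] = DEFAULT_SUB_GUARDRAILS) -> Iterator[None]:
    # On entry, announce every sub-guardrail so the registry knows about it
    # even if it never produces a failing evaluation.
    for name in sub_guardrails:
        emit({
            "name": "tracectrl.guardrail.registered",
            "attributes": {
                "tracectrl.guardrail.provider": "protector_plus",
                "tracectrl.guardrail.name": name,  # assumed key
            },
        })
    yield
```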

Verifying Each Stage

1. Are spans being exported?

Set TRACECTRL_FAIL_SILENTLY=false temporarily to see exporter errors:
TRACECTRL_FAIL_SILENTLY=false python your_agent.py
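If no exporter errors appear, it is also worth confirming that something is listening on the Collector's OTLP ports (4317 for gRPC, 4318 for HTTP). A plain TCP connect is enough for that; it only shows the port is reachable from where the agent runs, not that the listener speaks OTLP. The port_open helper is illustrative.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_open("localhost", 4317) and port_open("localhost", 4318) should both be True when the Collector container is up and its ports are published.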

2. Is the Collector receiving spans?

docker compose logs otel-collector 2>&1 | grep -i "traces"

3. Are spans in ClickHouse?

docker exec -it $(docker compose ps -q clickhouse) clickhouse-client \
  --query "SELECT count() FROM tracectrl.otel_traces"

4. Is the pipeline running?

docker compose logs tracectrl-engine 2>&1 | grep "Pipeline run"

5. Does the API return data?

curl http://localhost:8000/api/v1/topology/graph | python -m json.tool
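The same check from Python, using only the standard library, can be handy in a smoke-test script. fetch_json is a hypothetical helper; it assumes the API is reachable at http://localhost:8000 as in the curl command and that the route returns a JSON body, without assuming anything about the body's structure.

```python
import json
import urllib.request


def fetch_json(base_url: str, path: str, timeout: float = 5.0):
    """GET base_url + path and decode the response body as JSON."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: fetch_json("http://localhost:8000", "/api/v1/topology/graph").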

6. Are guardrail violations being ingested?

docker exec -it $(docker compose ps -q clickhouse) clickhouse-client \
  --query "SELECT count() FROM tracectrl.guardrail_violations FINAL"
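Both ClickHouse counts can also be scripted over ClickHouse's HTTP interface instead of docker exec. This assumes the HTTP port (8123 by default) is published by the compose file and that the default user needs no password, neither of which this page confirms. FINAL belongs only on tables that, like guardrail_violations, rely on ReplacingMergeTree deduplication. Table names are interpolated directly into the query, so pass only trusted constants.

```python
import urllib.parse
import urllib.request


def count_url(table: str, final: bool = False,
              base_url: str = "http://localhost:8123") -> str:
    """Build a ClickHouse HTTP-interface URL for SELECT count() on a trusted table name."""
    query = f"SELECT count() FROM {table}" + (" FINAL" if final else "")
    return base_url.rstrip("/") + "/?" + urllib.parse.urlencode({"query": query})


def clickhouse_count(table: str, final: bool = False,
                     base_url: str = "http://localhost:8123") -> int:
    # The default TabSeparated output for a single count() is the number and a newline.
    with urllib.request.urlopen(count_url(table, final, base_url), timeout=5) as resp:
        return int(resp.read().decode("utf-8").strip())
```

Usage: clickhouse_count("tracectrl.otel_traces") and clickhouse_count("tracectrl.guardrail_violations", final=True).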

Latency

| Stage | Typical latency |
|---|---|
| SDK span processing | < 1 ms per span |
| Batch export to Collector | 1 second (configurable via TRACECTRL_BATCH_DELAY_MS, default 1000) |
| Collector → ClickHouse | < 100 ms |
| Pipeline processing | 60 second cycle (configurable via PIPELINE_INTERVAL_SECONDS, default 60) |
| API response | < 50 ms |
End-to-end: from agent action to dashboard visibility can take up to roughly 60-65 seconds, dominated by the pipeline interval, because new spans wait for the next pipeline run.
For faster feedback during development, set PIPELINE_INTERVAL_SECONDS=10 in your .env file.
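The worst case behind the end-to-end figure is simple addition over the latency table: a span that just misses a batch flush and then just misses a pipeline tick waits for both. The sketch below uses the table's upper bounds as defaults and leaves out the duration of the pipeline run itself, which this page does not quantify; the function name is illustrative.

```python
def worst_case_latency_s(batch_delay_ms: int = 1000,
                         pipeline_interval_s: float = 60,
                         collector_s: float = 0.1,
                         api_s: float = 0.05) -> float:
    """Upper-bound end-to-end delay in seconds, excluding pipeline run time and SDK overhead."""
    return batch_delay_ms / 1000 + collector_s + pipeline_interval_s + api_s


default_case = worst_case_latency_s()                    # about 61.15 s
dev_case = worst_case_latency_s(pipeline_interval_s=10)  # about 11.15 s
```

With PIPELINE_INTERVAL_SECONDS=10 the bound drops to roughly 11 seconds, which is why the development setting shortens feedback so much.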