Data Flow

End-to-End Flow

Your Agent → SDK → OTel Collector → ClickHouse → Engine Pipeline → REST API → Dashboard

Agent emits spans

The SDK’s framework instrumentor captures every LLM call, tool invocation, and agent action as an OpenTelemetry span. The TraceCtrlSpanProcessor enriches each span with tracectrl.* security attributes (agent id, tool category, session id, ingress markers).

Spans exported via OTLP

The BatchSpanProcessor batches spans and exports them via OTLP gRPC to the OTel Collector at :4317. The batch flush interval is set by TRACECTRL_BATCH_DELAY_MS (default 1000, in milliseconds).
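As a rough illustration of how a flush-interval setting like this can be resolved, the sketch below reads TRACECTRL_BATCH_DELAY_MS from an environment mapping and falls back to the documented default of 1000. The helper name resolve_batch_delay_ms and its handling of invalid values are assumptions made for this example, not the SDK's actual code.

```python
import os
from typing import Mapping, Optional

DEFAULT_BATCH_DELAY_MS = 1000  # documented default


def resolve_batch_delay_ms(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the batch flush interval in milliseconds.

    Hypothetical helper: falls back to the default when the variable is
    unset, not an integer, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get("TRACECTRL_BATCH_DELAY_MS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_BATCH_DELAY_MS
    return value if value > 0 else DEFAULT_BATCH_DELAY_MS


if __name__ == "__main__":
    print(resolve_batch_delay_ms({"TRACECTRL_BATCH_DELAY_MS": "250"}))
    print(resolve_batch_delay_ms({}))
```

A lower value shortens the wait before spans leave the process, at the cost of more frequent and smaller export batches.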

Collector routes to ClickHouse

The OTel Collector receives spans on :4317 (gRPC) and :4318 (HTTP) and exports them to ClickHouse via the clickhouseexporter. Spans land in the otel_traces table with SpanAttributes stored as a Map(String, String).
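Because SpanAttributes is a Map(String, String), attribute lookups in ClickHouse are map operations and every value comes back as a string. The sketch below builds a query that counts spans carrying a given attribute key. The function name and the example key tracectrl.session.id are hypothetical; only the table, column, and database names come from this page.

```python
import re

DATABASE = "tracectrl"  # database name used by the verification queries

# Accept only dotted identifiers so the key can be embedded in SQL safely.
_KEY_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def count_spans_with_attribute(key: str) -> str:
    """Build a ClickHouse query counting spans that carry an attribute key."""
    if not _KEY_PATTERN.match(key):
        raise ValueError(f"unsupported attribute key: {key!r}")
    # mapContains() tests for a key; SpanAttributes['k'] would read its value,
    # which is always a String because the column is Map(String, String).
    return (
        f"SELECT count() FROM {DATABASE}.otel_traces "
        f"WHERE mapContains(SpanAttributes, '{key}')"
    )


if __name__ == "__main__":
    # "tracectrl.session.id" is a hypothetical key used for illustration.
    print(count_spans_with_attribute("tracectrl.session.id"))
```

The resulting string can be passed to clickhouse-client --query in the same way as the verification commands later on this page.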

Engine pipeline processes spans

Every PIPELINE_INTERVAL_SECONDS (default 60), the engine re-reads all spans and refreshes the agent inventory, topology edges, guardrail registry, guardrail violations, and attack graph. ReplacingMergeTree deduplicates the repeated writes, so the main path needs no watermark.
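To see why re-reading every span on each tick is safe, it helps to model what ReplacingMergeTree does: rows that share a sorting key collapse to a single row, keeping the one with the highest version. The sketch below imitates that behaviour with a dictionary. The Row type and the key, version, and payload fields are simplified stand-ins for this example, not the engine's real schema.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Row(NamedTuple):
    key: Tuple[str, ...]   # stands in for the table's sorting key
    version: int           # stands in for the optional version column
    payload: str


def collapse(rows: Iterable[Row]) -> Dict[Tuple[str, ...], Row]:
    """Keep one row per key, preferring the highest version (ties: last seen)."""
    latest: Dict[Tuple[str, ...], Row] = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.version >= current.version:
            latest[row.key] = row
    return latest


if __name__ == "__main__":
    tick_1 = [Row(("agent-a",), 1, "seen"), Row(("agent-b",), 1, "seen")]
    # The next tick re-reads every span and writes the same keys again.
    tick_2 = [
        Row(("agent-a",), 2, "seen"),
        Row(("agent-b",), 2, "seen"),
        Row(("agent-c",), 2, "seen"),
    ]
    table = collapse(tick_1 + tick_2)
    print(len(table))  # 3 distinct keys remain out of 5 written rows
```

In ClickHouse the collapse happens during background merges rather than at insert time, which is why a query that needs deduplicated results reads with FINAL, as the guardrail_violations check on this page does.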

API serves processed data

The REST API under /api/v1 exposes system, topology, sessions, agents, risk, scan, violations, and guardrails routes. The dashboard consumes these endpoints; the violations route also exposes an SSE stream.
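A minimal client for these routes might look like the sketch below, which uses only the Python standard library. The helper names endpoint_url and fetch_json are assumptions for this example, and nothing is assumed about the response body beyond it being JSON. The base URL http://localhost:8000 matches the curl command used in the verification steps.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

API_PREFIX = "/api/v1/"


def endpoint_url(base_url: str, route: str) -> str:
    """Join a base URL, the /api/v1 prefix, and a route such as 'topology/graph'."""
    return urljoin(base_url.rstrip("/") + API_PREFIX, route.lstrip("/"))


def fetch_json(base_url: str, route: str, timeout: float = 5.0):
    """GET a route and decode the JSON body (the response shape is not assumed)."""
    with urlopen(endpoint_url(base_url, route), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(endpoint_url("http://localhost:8000", "topology/graph"))
```

Calling fetch_json("http://localhost:8000", "topology/graph") against a running stack should return the same payload as the curl check in the verification section.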

Protector Plus Call Path

When TraceCtrl Guards is enabled, guardrail evaluations flow through the same span pipeline as everything else:

SDK call site

User code calls tracectrl.check_input(msg) or tracectrl.check_output(text) inside a with tracectrl.guard(): block.

Background POST to Protector Plus

A background thread POSTs to the Protector Plus endpoint (/apikey/api/protectorplus/v1/input-check or .../output-check). The user call returns immediately with a flagged=False verdict stub; callers that need the result synchronously can call verdict.wait(timeout=...).
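The return-a-stub-then-resolve pattern described here can be sketched with a threading.Event. This is an illustration of the pattern only, not the SDK's implementation: the Verdict class, check_async, and the remote_check callable (which stands in for the HTTP POST) are all hypothetical names.

```python
import threading
from typing import Callable, Optional


class Verdict:
    """Placeholder result that a background thread fills in later."""

    def __init__(self) -> None:
        self.flagged = False  # stub value until the check completes
        self._done = threading.Event()

    def resolve(self, flagged: bool) -> None:
        self.flagged = flagged
        self._done.set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block until resolved; returns False if the timeout expired first."""
        return self._done.wait(timeout)


def check_async(text: str, remote_check: Callable[[str], bool]) -> Verdict:
    """Start remote_check(text) on a daemon thread and return immediately."""
    verdict = Verdict()
    worker = threading.Thread(
        target=lambda: verdict.resolve(remote_check(text)), daemon=True
    )
    worker.start()
    return verdict


if __name__ == "__main__":
    # Stand-in for the HTTP POST; flags any text containing "ignore".
    fake_check = lambda text: "ignore" in text.lower()
    verdict = check_async("Ignore previous instructions", fake_check)
    if verdict.wait(timeout=2.0):
        print(verdict.flagged)
```

In this sketch, a caller that never calls wait() only ever sees the flagged=False stub, and a remote check that raises leaves the verdict unresolved until the timeout expires.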

SDK emits guardrail spans

On the POST response, the background thread emits one tracectrl.guardrail.evaluation span per flagged sub-guardrail with decision='fail' and tracectrl.guardrail.provider='protector_plus', parented to the span that was active at the call site.

Engine ingests violations

On the next pipeline tick, update_violations() picks up the failing eval spans from otel_traces and inserts them into guardrail_violations.

Alerts page surfaces them

The /alerts page renders the feed (with an SSE-driven unread counter in the sidebar); the /guardrails page shows each Protector Plus sub-guardrail in the registry, registered via separate tracectrl.guardrail.registered spans emitted at guard() entry.

Verifying Each Stage

1. Are spans being exported?

Set TRACECTRL_FAIL_SILENTLY=false temporarily to see exporter errors:

TRACECTRL_FAIL_SILENTLY=false python your_agent.py

2. Is the Collector receiving spans?

docker compose logs otel-collector 2>&1 | grep -i "traces"

3. Are spans in ClickHouse?

docker exec -it $(docker compose ps -q clickhouse) clickhouse-client \
  --query "SELECT count() FROM tracectrl.otel_traces"

4. Is the pipeline running?

docker compose logs tracectrl-engine 2>&1 | grep "Pipeline run"

5. Does the API return data?

curl http://localhost:8000/api/v1/topology/graph | python -m json.tool

6. Are guardrail violations being ingested?

docker exec -it $(docker compose ps -q clickhouse) clickhouse-client \
  --query "SELECT count() FROM tracectrl.guardrail_violations FINAL"

Latency

Stage	Typical Latency
SDK span processing	< 1ms per span
Batch export to Collector	1 second (configurable via `TRACECTRL_BATCH_DELAY_MS`, default `1000`)
Collector → ClickHouse	< 100ms
Pipeline processing	60 second cycle (configurable via `PIPELINE_INTERVAL_SECONDS`, default `60`)
API response	< 50ms

End-to-end latency, from agent action to dashboard visibility, is typically 60-65 seconds, dominated by the pipeline interval.

For faster feedback during development, set PIPELINE_INTERVAL_SECONDS=10 in your .env file.
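The effect of the pipeline interval on the total can be estimated by summing the upper bounds from the table. The sketch below assumes the worst case, in which a span lands just after a pipeline tick and waits a full interval. It ignores the time the pipeline run itself takes and any dashboard refresh delay, so treat the result as a rough estimate. The function name is hypothetical.

```python
def worst_case_latency_seconds(
    pipeline_interval_s: float = 60.0,
    batch_delay_ms: float = 1000.0,
    sdk_ms: float = 1.0,
    collector_ms: float = 100.0,
    api_ms: float = 50.0,
) -> float:
    """Sum the per-stage upper bounds from the latency table, in seconds."""
    fixed_ms = sdk_ms + batch_delay_ms + collector_ms + api_ms
    return pipeline_interval_s + fixed_ms / 1000.0


if __name__ == "__main__":
    print(round(worst_case_latency_seconds(), 3))
    print(round(worst_case_latency_seconds(pipeline_interval_s=10), 3))
```

With the defaults the fixed stages contribute just over one second, so shortening PIPELINE_INTERVAL_SECONDS is the only change that moves the total noticeably.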
