System Overview

TraceCtrl has five layers:
  1. Python SDK (in your agent) — emits OpenTelemetry spans enriched with tracectrl.* security attributes, plus optional in-process guardrails.
  2. Protector Plus (optional external HTTP firewall) — called from tracectrl.protector, emits guardrail evaluation spans.
  3. OTel Collector — receives OTLP spans and writes them to ClickHouse.
  4. Intelligence Engine — FastAPI service that runs the pipeline and serves the REST API.
  5. Dashboard — React SPA that visualizes agents, traces, alerts, guardrails, and attack paths.

TraceCtrl architecture: a Python agent process runs the SDK (Framework Instrumentor + TraceCtrl SDK core + Guardrails) and exports OTLP spans on :4317/:4318 to the OTel Collector. The Collector forwards them to ClickHouse (:8123/:9000), which holds otel_traces, agent_inventory, topology edges, attack paths, the guardrail registry and violations, scan results, and protector_plus_config. The Intelligence Engine (FastAPI, :8000) runs a 60-second pipeline that builds inventory and topology, ingests guardrail spans, and computes the attack graph; it serves /api/v1/* over REST and SSE. The React Dashboard (:3000) renders Monitor (agents/sessions/topology), Security (alerts/guardrails/scan/risk/attacks), and Configure (settings). The Guardrails block can also call an optional external Protector Plus firewall over HTTP; SDK and Protector Plus violations flow into the same engine pipeline.

Component Details

Python SDK

The SDK wraps OpenInference framework instrumentors with a TraceCtrlSpanProcessor that enriches every span with security attributes. It uses OpenTelemetry as the transport — no custom protocols, no vendor lock-in. Key modules:
  • tracectrl.config: configure() sets up the TracerProvider with an OTLP gRPC exporter.
  • tracectrl.processor: TraceCtrlSpanProcessor adds agent identity, tool categories, session IDs, and ingress markers.
  • tracectrl.inference: classifies tools into risk categories.
  • tracectrl.session: session ID management via Python contextvars.
  • tracectrl.context: W3C traceparent propagation for cross-service agents.
  • tracectrl.guardrails: in-process LLM-judge guardrails. Emits tracectrl.guardrail.registered (on registration) and tracectrl.guardrail.evaluation (per evaluation) spans.
  • tracectrl.protector: TraceCtrl Guards integration with the external Protector Plus HTTP firewall. Emits the same guardrail span types, tagged with tracectrl.guardrail.provider = "protector_plus".
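
To illustrate how contextvars-based session tracking can feed span enrichment, here is a minimal sketch using only the standard library. The names session, enrich, and the attribute key tracectrl.session.id are assumptions made for illustration; the real tracectrl.session and tracectrl.processor APIs may look different.

```python
import contextvars
import uuid
from contextlib import contextmanager

# Hypothetical context variable; the real SDK may use a different name.
_session_id = contextvars.ContextVar("tracectrl_session_id", default=None)


@contextmanager
def session(session_id=None):
    """Bind a session ID to the current context for the duration of the block."""
    sid = session_id or uuid.uuid4().hex
    token = _session_id.set(sid)
    try:
        yield sid
    finally:
        # Restore whatever session (if any) was active before this block.
        _session_id.reset(token)


def enrich(attributes):
    """Return a copy of the span attributes with the active session ID attached."""
    enriched = dict(attributes)
    sid = _session_id.get()
    if sid is not None:
        enriched["tracectrl.session.id"] = sid  # hypothetical attribute key
    return enriched
```

Because contextvars are isolated per thread and per asyncio task, concurrent agent runs handled this way would not leak session IDs into each other's spans.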

Protector Plus (optional)

An external HTTP firewall called from tracectrl.protector. Exposes seven sub-guardrails (llm, keyword, regex, pii, vector, content_moderation, system_prompt_protection) via /apikey/api/protectorplus/v1/input-check and .../output-check. The SDK calls it fire-and-forget; flagged sub-guardrails are emitted as tracectrl.guardrail.evaluation spans (decision=fail) for the engine to ingest. Config (endpoint URL, API key, enabled guardrails) is persisted in protector_plus_config and configured from the Settings page.
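
The mapping from flagged sub-guardrails to evaluation spans might look roughly like the sketch below. Only the span name, the provider tag, and the list of sub-guardrails come from this page; the shape of the flagged input and the attribute keys marked as hypothetical are assumptions, since the firewall's response format is not described here.

```python
# Sub-guardrail names as listed in this section.
SUB_GUARDRAILS = (
    "llm", "keyword", "regex", "pii", "vector",
    "content_moderation", "system_prompt_protection",
)


def evaluation_spans(flagged, direction):
    """Build one span description per flagged sub-guardrail.

    flagged:   iterable of sub-guardrail names reported as flagged (assumed shape)
    direction: "input" or "output", matching the check that was called
    """
    spans = []
    for name in flagged:
        if name not in SUB_GUARDRAILS:
            continue  # skip names this sketch does not know about
        spans.append({
            "name": "tracectrl.guardrail.evaluation",
            "attributes": {
                "tracectrl.guardrail.provider": "protector_plus",
                "tracectrl.guardrail.name": name,            # hypothetical key
                "tracectrl.guardrail.decision": "fail",      # hypothetical key
                "tracectrl.guardrail.direction": direction,  # hypothetical key
            },
        })
    return spans
```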

OTel Collector

A standard OpenTelemetry Collector configured to receive OTLP spans on gRPC (:4317) and HTTP (:4318), then export them to ClickHouse via the clickhouseexporter. Configuration lives in config/otel-collector.yaml.

Intelligence Engine

A FastAPI application that:
  1. Runs the pipeline every PIPELINE_INTERVAL_SECONDS (default 60). Each tick re-reads all spans, refreshes the agent inventory, topology, guardrail registry, guardrail violations, and the attack-graph analysis. Idempotency comes from ClickHouse ReplacingMergeTree rather than a watermark.
  2. Serves a REST API under /api/v1 — system, topology, sessions, agents, risk, scan, violations, guardrails.
  3. Owns the ClickHouse schema: ensure_schema() runs the CREATE/ALTER statements at startup.
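
A periodic pipeline of this kind can be sketched with asyncio as follows. PIPELINE_INTERVAL_SECONDS and its default of 60 come from this page; the function names pipeline_interval and run_pipeline, and the max_ticks parameter, are hypothetical and exist only to keep the sketch testable.

```python
import asyncio
import os


def pipeline_interval(env=None):
    """Read PIPELINE_INTERVAL_SECONDS, falling back to the documented default of 60."""
    env = os.environ if env is None else env
    return float(env.get("PIPELINE_INTERVAL_SECONDS", "60"))


async def run_pipeline(tick, interval, max_ticks=None):
    """Call tick() every `interval` seconds.

    max_ticks bounds the loop (useful in tests); None means run until cancelled.
    Returns the number of ticks executed.
    """
    done = 0
    while max_ticks is None or done < max_ticks:
        tick()  # each tick re-reads everything, so no watermark is tracked here
        done += 1
        if max_ticks is None or done < max_ticks:
            await asyncio.sleep(interval)
    return done
```

Since each tick recomputes its output from all spans, a tick that runs twice writes the same rows twice, and deduplication is left to the storage engine.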

Dashboard

A React + Vite + TypeScript SPA. The sidebar groups pages into three sections (Monitor, Security, Configure) and the route table lives in ui/src/App.tsx:145-158. See the Dashboard page for the per-page reference.

Data Pipeline (one tick)

  1. Fetch spans: read all spans from otel_traces.
  2. Inventory: upsert agents with cumulative observation/run counts and tool lists.
  3. Topology: upsert agent-to-agent and agent-to-tool edges with confidence scoring.
  4. Violations: ingest tracectrl.guardrail.evaluation spans with decision in ('fail','error') into guardrail_violations.
  5. Guardrail registry: ingest tracectrl.guardrail.registered spans into guardrail_registry; flip health to error when the last hour shows error-decision violations.
  6. Attack graph: evaluate TAGAAI rules, generate attack paths, score risk, and write attack_paths / agent_risk_scores / system_risk.
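
The violation and registry-health steps can be sketched in plain Python as below. Spans are modeled as dictionaries with hypothetical keys (name, guardrail, decision, timestamp), and "ok" is an assumed label for the healthy state; the engine's actual row layout is not shown on this page.

```python
from datetime import datetime, timedelta, timezone

VIOLATION_DECISIONS = ("fail", "error")


def select_violations(spans):
    """Keep guardrail evaluation spans whose decision is 'fail' or 'error'."""
    return [
        s for s in spans
        if s["name"] == "tracectrl.guardrail.evaluation"
        and s["decision"] in VIOLATION_DECISIONS
    ]


def registry_health(violations, guardrail, now):
    """Return 'error' if the guardrail had an error-decision violation in the last hour."""
    cutoff = now - timedelta(hours=1)
    for v in violations:
        if (
            v["guardrail"] == guardrail
            and v["decision"] == "error"
            and v["timestamp"] >= cutoff
        ):
            return "error"
    return "ok"  # hypothetical name for the healthy state
```

Note that only error decisions affect health in this sketch: a fail decision means the guardrail worked and caught something, while an error decision suggests the guardrail itself is misbehaving.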

ClickHouse Tables

| Table | Engine | Purpose |
| --- | --- | --- |
| otel_traces | Auto-created by Collector | Raw OpenTelemetry spans |
| agent_inventory | ReplacingMergeTree | Deduplicated agent records |
| topology_agent_edges | ReplacingMergeTree | Agent-to-agent connections |
| topology_tool_edges | ReplacingMergeTree | Agent-to-tool connections |
| pipeline_state | ReplacingMergeTree | Pipeline state (legacy watermark key) |
| attack_paths | ReplacingMergeTree | TAGAAI-detected attack paths |
| agent_risk_scores | ReplacingMergeTree | Per-agent risk |
| system_risk | ReplacingMergeTree | System-wide risk summary |
| scan_results / scan_topology / scan_runs | MergeTree / ReplacingMergeTree | Static OpenClaw scan output |
| guardrail_violations | ReplacingMergeTree | Guardrail evaluation failures (judge-LLM + Protector Plus) |
| guardrail_registry | ReplacingMergeTree | Registered guardrails (per agent) |
| protector_plus_config | ReplacingMergeTree | Single-row Protector Plus config |

ReplacingMergeTree deduplicates rows during background merges. Always query with FINAL for correct results: SELECT * FROM agent_inventory FINAL.
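
To see why FINAL matters, the following sketch simulates the behavior in plain Python rather than querying ClickHouse. It models unmerged data parts as lists of rows; the column names agent_id and v are hypothetical stand-ins for the table's sorting key and version column, which are not given on this page.

```python
def without_final(parts):
    """Rows a plain SELECT may return before background merges: every inserted row."""
    return [row for part in parts for row in part]


def with_final(parts, key, version):
    """Rows SELECT ... FINAL returns: one row per key, highest version wins.

    On a version tie the later-inserted row is kept.
    """
    latest = {}
    for row in without_final(parts):
        k = row[key]
        if k not in latest or row[version] >= latest[k][version]:
            latest[k] = row
    return list(latest.values())


# Two pipeline ticks wrote agent "a"; the parts have not been merged yet.
parts = [
    [{"agent_id": "a", "runs": 1, "v": 1}],
    [{"agent_id": "a", "runs": 2, "v": 2}, {"agent_id": "b", "runs": 1, "v": 1}],
]
stale_view = without_final(parts)             # includes both rows for agent "a"
clean_view = with_final(parts, "agent_id", "v")  # one row per agent
```

Without FINAL, a count over the stale view would report three agents instead of two until a background merge happens to collapse the duplicates.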