OpenTelemetry
Routeplane is OpenTelemetry-native. Every request you send through the router becomes a trace — the full lifecycle from ingress through routing, each upstream attempt (including failovers), and settlement — plus a set of metrics, all following the OpenTelemetry GenAI semantic conventions and pushed over OTLP to any backend you already run.
Everything on this page is open-source and runs entirely on your own infrastructure — there’s no Routeplane telemetry endpoint in the middle. It’s off until you point it somewhere, and it excludes message content by default. If you’d rather not run a collector, Routeplane Cloud gives you a hosted request view with nothing to operate.
How a trace looks
Each request produces a span tree:

    HTTP SERVER  POST /v1/chat/completions   (ingress)
    └─ chat      (INTERNAL, inbound: whole request lifetime)
       ├─ route  (INTERNAL: routing decision)
       ├─ chat   (CLIENT: upstream attempt #1, gen_ai.* attributes)
       ├─ chat   (CLIENT: failover attempt #2)
       └─ settle (INTERNAL: settlement summary)

There is one GenAI generation per request: the inbound chat span. Each
upstream attempt is a separate CLIENT span, so a failover chain shows every
provider it tried, in order, with the latency and outcome of each hop. Routeplane
extracts inbound W3C trace context and injects an outbound traceparent, so
router spans stitch into the parent trace from your agent or gateway. Because
each attempt is its own span, traces are where failover and routing behavior
become legible: which provider was tried first, why it fell through, and where
latency went across the chain.
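To make the propagation step concrete, here is a minimal Python sketch of the W3C traceparent header a caller could attach so that router spans join its trace. Only the header layout comes from the W3C Trace Context format; the helper names are hypothetical and none of this is Routeplane's own code.

```python
import re
import secrets

# W3C Trace Context layout: version "00", 16-byte trace id,
# 8-byte parent span id, 1-byte flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(sampled: bool = True) -> str:
    """Build a traceparent value for a new root trace (hypothetical helper)."""
    trace_id = secrets.token_hex(16)   # 32 hex characters
    span_id = secrets.token_hex(8)     # 16 hex characters
    flags = "01" if sampled else "00"  # bit 0 is the "sampled" flag
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> dict:
    """Split a traceparent value into its fields, rejecting malformed input."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id, parent_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# Headers an agent would send along with its request to the router.
headers = {"traceparent": make_traceparent()}
```

A caller that already runs an OpenTelemetry SDK would normally let the SDK's propagator inject this header rather than building it by hand.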
Span attributes
| Attribute | Description |
|---|---|
| gen_ai.provider.name | Upstream provider for the hop (e.g. openai, anthropic) |
| gen_ai.response.model | Model that actually served the response |
| gen_ai.token.type | input / output, on token measurements |
| outcome | Final disposition of the request |
| api_key_id, user_id | Caller attribution (cardinality-capped) |
| account_label | Logical account/tenant label |
Enable export
Add an otel block under the routeplane-observe plugin and give it an endpoint.
That alone turns export on:

    plugins:
      routeplane-observe:
        otel:
          endpoint: "http://localhost:4318"   # your OTLP endpoint
          service_name: "routeplane"

Keep secrets out of the committed file: use ${VAR} references for any auth
headers, resolved from the environment at load time:

    plugins:
      routeplane-observe:
        otel:
          endpoint: "https://api.honeycomb.io"
          headers:
            x-honeycomb-team: "${HONEYCOMB_API_KEY}"

Every field has an environment-variable override, so you can configure export
without touching the file — useful in containers, where you can run with no
otel block at all:
| Env var | Sets |
|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP endpoint URL |
| OTEL_EXPORTER_OTLP_HEADERS | Auth headers (comma-separated k=v) |
| OTEL_SERVICE_NAME | Resource service name |
| OTEL_RESOURCE_ATTRIBUTES | Extra resource attributes (comma-separated k=v) |
| OTEL_TRACES_SAMPLER | Sampler kind |
| OTEL_TRACES_SAMPLER_ARG | Sampler argument (e.g. ratio) |
| ROUTEPLANE_OBSERVE_CONTENT_CAPTURE | Content capture mode |
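Two of these variables take a comma-separated k=v list. The Python sketch below shows how such a value breaks down into pairs. It illustrates the format only and is not Routeplane's parser. The function name is hypothetical, and percent-decoding of values is an assumption borrowed from the W3C Baggage style that OpenTelemetry uses for these variables.

```python
from urllib.parse import unquote


def parse_kv_list(raw: str) -> dict[str, str]:
    """Parse "k1=v1,k2=v2" into a dict, splitting each pair on its first "="."""
    pairs: dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected k=v, got {item!r}")
        # Assumption: values may be percent-encoded.
        pairs[key.strip()] = unquote(value.strip())
    return pairs


# Example values, as they might appear in OTEL_EXPORTER_OTLP_HEADERS
# and OTEL_RESOURCE_ATTRIBUTES (the key and token below are placeholders).
headers = parse_kv_list("x-honeycomb-team=abc123,x-extra=a%20b")
attrs = parse_kv_list("deployment.environment=prod")
```

Splitting on the first "=" only matters for values that themselves contain "=", such as base64 padding in a basic-auth header.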
Metrics
Alongside traces, metrics are exported over OTLP on an interval (default 60s):

| Metric | Type | Measures |
|---|---|---|
| routeplane.requests | Counter | Requests processed |
| gen_ai.client.operation.duration | Histogram | Request latency |
| gen_ai.client.token.usage | Histogram | Token counts (by gen_ai.token.type) |
| routeplane.errors | Counter | Errors |
| routeplane.stream_parts | Counter | Streaming parts emitted |
Dimensions include gen_ai.provider.name, gen_ai.response.model, outcome,
account_label, and the caller identifiers. To keep metric cardinality bounded
on shared deployments, api_key_id and user_id are capped (defaults: 1024 and
256 distinct values); beyond the cap, values collapse to an overflow bucket.
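The capping behavior can be pictured with a small Python sketch. It assumes a first-come admission policy and a bucket named "_overflow"; both are illustrative choices, since the page does not say how Routeplane picks which values to keep or what the overflow bucket is called.

```python
class CardinalityCap:
    """Admit at most `limit` distinct label values; later new values share one bucket."""

    def __init__(self, limit: int, overflow: str = "_overflow") -> None:
        self.limit = limit
        self.overflow = overflow          # hypothetical bucket name
        self._seen: set[str] = set()

    def label(self, value: str) -> str:
        if value in self._seen:
            return value                  # admitted values keep their own series
        if len(self._seen) < self.limit:
            self._seen.add(value)
            return value
        return self.overflow              # cap reached: collapse into the bucket


cap = CardinalityCap(limit=2)             # tiny limit so the overflow is visible
labels = [cap.label(v) for v in ["key-a", "key-b", "key-c", "key-a"]]
# labels == ["key-a", "key-b", "_overflow", "key-a"]
```

Whatever the real policy, the effect on a dashboard is the same: the number of distinct series per metric stays bounded, and traffic from callers beyond the cap is still counted, just not individually attributed.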
Backend recipes
Each block is the plugins.routeplane-observe.otel config for a common backend.
OpenTelemetry Collector
Send everything to a local or in-cluster Collector and let it fan out to your real backends (this is also the path to a Prometheus-based stack, where the Collector’s Prometheus exporter bridges the gap, since Routeplane has no scrape endpoint):

    otel:
      endpoint: "http://otel-collector:4318"
      service_name: "routeplane"
      resource_attributes:
        deployment.environment: "prod"

Honeycomb

    otel:
      endpoint: "https://api.honeycomb.io"
      service_name: "routeplane"
      headers:
        x-honeycomb-team: "${HONEYCOMB_API_KEY}"

Grafana Cloud / Tempo
Grafana Cloud’s OTLP gateway uses basic auth (the instance ID and API token, joined with a colon and base64-encoded). For self-hosted Tempo, point at its OTLP port and drop the header.
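As a sketch of how the value for GRAFANA_OTLP_TOKEN could be produced, the Python snippet below applies standard HTTP basic-auth encoding to an instance ID and token. The function name and the credentials are placeholders, not part of Routeplane or Grafana tooling.

```python
import base64


def grafana_basic_token(instance_id: str, api_token: str) -> str:
    """Return base64("<instance_id>:<api_token>"), the part that follows "Basic "."""
    raw = f"{instance_id}:{api_token}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder credentials; substitute your own instance ID and API token.
token = grafana_basic_token("123456", "glc_example")
```

Exporting the result as GRAFANA_OTLP_TOKEN lets the Authorization header in this recipe resolve at load time without the secret appearing in the committed file.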
Pass the encoded credentials in through the environment:

    otel:
      endpoint: "https://otlp-gateway-<region>.grafana.net/otlp"
      service_name: "routeplane"
      headers:
        Authorization: "Basic ${GRAFANA_OTLP_TOKEN}"

Datadog
Datadog ingests OTLP through the Datadog Agent rather than a public OTLP URL, so run the Agent with OTLP receiving enabled and point Routeplane at it:

    otel:
      endpoint: "http://datadog-agent:4318"
      service_name: "routeplane"
      resource_attributes:
        deployment.environment: "prod"

Tune sampling
By default Routeplane respects the inbound trace decision and otherwise samples
everything (parentbased_always_on). On high-throughput deployments, sample a
fraction of root traces instead:

    otel:
      endpoint: "http://otel-collector:4318"
      sampler: "parentbased_traceidratio"
      sampler_arg: 0.1   # keep 10% of root traces

| sampler | Behavior |
|---|---|
| always_on | Sample every trace |
| always_off | Sample nothing |
| traceidratio | Sample a fraction (sampler_arg), ignoring parent |
| parentbased_always_on | Follow parent; sample if no parent (default) |
| parentbased_always_off | Follow parent; drop if no parent |
| parentbased_traceidratio | Follow parent; otherwise sample sampler_arg |
parentbased_* variants honor the upstream decision, so a trace your agent
started won’t be half-sampled at the router. The metrics export interval and the
trace batch queue are tunable separately under metrics and traces.batch if
you need to trade freshness for overhead.
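The sampler kinds above can be read as a small decision function. The Python sketch below is one plausible model, not Routeplane's implementation. In particular, the ratio check (comparing the low 64 bits of the trace ID against a bound) is a common approach in OpenTelemetry SDKs and is assumed here, and the function names are hypothetical.

```python
from typing import Optional


def ratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic ratio decision: low 64 bits of the trace id against a bound."""
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < int(ratio * (1 << 64))


def should_sample(
    sampler: str,
    trace_id_hex: str,
    parent_sampled: Optional[bool],   # None means there is no inbound parent
    arg: float = 1.0,
) -> bool:
    parent_based = sampler.startswith("parentbased_")
    if parent_based and parent_sampled is not None:
        return parent_sampled         # honor the upstream decision
    # No parent (or a sampler that ignores it): fall back to the root rule.
    root = sampler[len("parentbased_"):] if parent_based else sampler
    if root == "always_on":
        return True
    if root == "always_off":
        return False
    if root == "traceidratio":
        return ratio_sampled(trace_id_hex, arg)
    raise ValueError(f"unknown sampler: {sampler}")


# A sampled parent wins over a 10% ratio; a root trace falls back to the ratio.
kept = should_sample("parentbased_traceidratio", "f" * 32, True, 0.1)
dropped = should_sample("parentbased_traceidratio", "f" * 32, None, 0.1)
```

Because the ratio decision depends only on the trace ID, every service that applies the same ratio reaches the same verdict for a given trace, which is what keeps traces from being partially sampled.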
Content capture
Prompt and response content is excluded by default (content_capture: off).
Turn it on only when you need prompt and response bodies on the spans for
debugging:

    otel:
      content_capture: "full"   # off (default) | full

Verify
Reload (or restart) the router, then ask the running daemon what it’s doing:

    routeplane reload                  # pick up config changes without dropping connections
    routeplane observe status          # endpoint, sampler, cardinality, in-flight spans
    routeplane observe status --json

If it reports stopped, the exporter isn’t wired. Check that the otel block
has an endpoint (or that OTEL_EXPORTER_OTLP_ENDPOINT is set) and that the
binary was built with an OTLP transport feature. Then send a request through the
router and confirm the trace lands in your backend; you should see one inbound
chat span per request with a CLIENT child for each upstream attempt.
Next steps
Cloud Activity (hosted)
The open-source OpenTelemetry export runs on your own backend. Routeplane Cloud gives you the hosted alternative: every /v1 request is traced into an Activity view server-side, with no collector, no warehouse, and nothing to run. Content (prompts and responses) is never stored.
The Activity dashboard
Sign in to cloud.routeplane.app and open Activity. It opens on three KPI cards, computed over a window you pick (1 day, 1 week, 1 month, or all time):
| KPI | What it measures |
|---|---|
| Spend | Total USD charged over the window |
| Requests | Number of requests over the window |
| Tokens | Prompt + completion tokens over the window |
Every figure is scoped to the active workspace (namespace), so a dashboard always reflects the workspace you’re signed into.
The request log
Below the KPIs, the request log lists every /v1 request, newest first. Each row is a per-request trace record:
| Column | Detail |
|---|---|
| Time | When the request landed |
| Model | The model id served, with a stream marker for streamed calls |
| Provider | The upstream provider that served it |
| Tokens | Prompt + completion total |
| Cost | Final charge in USD |
| Latency | End-to-end latency |
| Source | Funding source (credit balance, BYOK, MPP session) |
| Status | Succeeded, error, denied, cancelled |
Each record also carries the routing profile used (balanced, cost, latency, throughput) and the gated capabilities exercised (e.g. structured_outputs) — so a request that failed over or hit a budget is legible without leaving the dashboard.
Usage attribution & the API
Everything in the dashboard is also available over the management API, scoped per workspace and gated by the usage:read scope:
- Aggregate usage: spend, token counts, request count, and a per-capability breakdown over a [from, to) window.
- Request history: the paginated request log, including routing profile and capabilities used.
These are the same routeplane cloud usage and routeplane cloud requests commands you run from the CLI. A hosted API reference for the usage and requests endpoints will ship alongside Routeplane Cloud.
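The [from, to) notation in the aggregate usage item is a half-open interval: the start is included and the end is excluded. The Python sketch below shows why that matters when you tile consecutive windows. The helper names are hypothetical, and it says nothing about the API's actual parameter names or timestamp format.

```python
from datetime import datetime, timedelta, timezone


def daily_windows(start: datetime, days: int) -> list[tuple[datetime, datetime]]:
    """Consecutive one-day [from, to) windows; each `to` is the next window's `from`."""
    step = timedelta(days=1)
    return [(start + i * step, start + (i + 1) * step) for i in range(days)]


def in_window(ts: datetime, frm: datetime, to: datetime) -> bool:
    """Half-open membership: includes `frm`, excludes `to`."""
    return frm <= ts < to


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
windows = daily_windows(start, 3)

# A request stamped exactly at midnight on Jan 2 falls in exactly one window.
boundary = start + timedelta(days=1)
hits = [in_window(boundary, frm, to) for frm, to in windows]
# hits == [False, True, False]
```

With this convention, summing usage over adjacent windows never double-counts or drops a request that lands on a boundary.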
Deep traces
Cloud stores per-request receipts, not OpenTelemetry span waterfalls. When you need the full span tree (the ingress span, the routing decision, and a CLIENT span per upstream attempt), that lives in your own OTLP collector. Wire it up once with the open-source OpenTelemetry export and the Activity view links out to it.