OpenTelemetry

Routeplane is OpenTelemetry-native. Every request you send through the router becomes a trace — the full lifecycle from ingress through routing, each upstream attempt (including failovers), and settlement — plus a set of metrics, all following the OpenTelemetry GenAI semantic conventions and pushed over OTLP to any backend you already run.

Everything on this page is open-source and runs entirely on your own infrastructure — there’s no Routeplane telemetry endpoint in the middle. It’s off until you point it somewhere, and it excludes message content by default. If you’d rather not run a collector, Routeplane Cloud gives you a hosted request view with nothing to operate.

How a trace looks

Each request produces a span tree:

HTTP SERVER  POST /v1/chat/completions        (ingress)
└─ chat      (INTERNAL, inbound — whole request lifetime)
   ├─ route  (INTERNAL — routing decision)
   ├─ chat   (CLIENT — upstream attempt #1, gen_ai.* attributes)
   ├─ chat   (CLIENT — failover attempt #2)
   └─ settle (INTERNAL — settlement summary)

There is one GenAI generation per request — the inbound chat span. Each upstream attempt is a separate CLIENT span, so a failover chain shows every provider it tried, in order, with the latency and outcome of each hop. Routeplane extracts inbound W3C trace context and injects an outbound traceparent, so router spans stitch into the parent trace from your agent or gateway. Because each attempt is its own span, traces are where failover and routing behavior become legible: which provider was tried first, why it fell through, and where latency went across the chain.

Span attributes

Attribute	Description
`gen_ai.provider.name`	Upstream provider for the hop (e.g. `openai`, `anthropic`)
`gen_ai.response.model`	Model that actually served the response
`gen_ai.token.type`	`input` / `output`, on token measurements
`outcome`	Final disposition of the request
`api_key_id`, `user_id`	Caller attribution (cardinality-capped)
`account_label`	Logical account/tenant label

Enable export

Add an otel block under the routeplane-observe plugin and give it an endpoint. That alone turns export on:

plugins:
  routeplane-observe:
    otel:
      endpoint: "http://localhost:4318"   # your OTLP endpoint
      service_name: "routeplane"

Keep secrets out of the committed file — use ${VAR} references for any auth headers, resolved from the environment at load time:

plugins:
  routeplane-observe:
    otel:
      endpoint: "https://api.honeycomb.io"
      headers:
        x-honeycomb-team: "${HONEYCOMB_API_KEY}"

Every field has an environment-variable override, so you can configure export without touching the file — useful in containers, where you can run with no otel block at all:

Env var	Sets
`OTEL_EXPORTER_OTLP_ENDPOINT`	OTLP endpoint URL
`OTEL_EXPORTER_OTLP_HEADERS`	Auth headers (comma-separated `k=v`)
`OTEL_SERVICE_NAME`	Resource service name
`OTEL_RESOURCE_ATTRIBUTES`	Extra resource attributes (comma-separated `k=v`)
`OTEL_TRACES_SAMPLER`	Sampler kind
`OTEL_TRACES_SAMPLER_ARG`	Sampler argument (e.g. ratio)
`ROUTEPLANE_OBSERVE_CONTENT_CAPTURE`	Content capture mode

The OTLP **transport** is selected when the binary is built: `otel-http` (OTLP/HTTP + protobuf, the default) or `otel-grpc` (OTLP/gRPC). The configuration on this page is identical for both — you only care about the transport if your backend speaks one and not the other.

Metrics

Alongside traces, metrics are exported over OTLP on an interval (default 60s):

Metric	Type	Measures
`routeplane.requests`	Counter	Requests processed
`gen_ai.client.operation.duration`	Histogram	Request latency
`gen_ai.client.token.usage`	Histogram	Token counts (by `gen_ai.token.type`)
`routeplane.errors`	Counter	Errors
`routeplane.stream_parts`	Counter	Streaming parts emitted

Dimensions include gen_ai.provider.name, gen_ai.response.model, outcome, account_label, and the caller identifiers. To keep metric cardinality bounded on shared deployments, api_key_id and user_id are capped (defaults: 1024 and 256 distinct values); beyond the cap, values collapse to an overflow bucket.

There is **no Prometheus scrape endpoint**. `GET /metrics` is retired — metrics are pushed via OTLP only. If your stack is Prometheus-based, ingest through an OpenTelemetry Collector with the Prometheus exporter.

Backend recipes

Each block is the plugins.routeplane-observe.otel config for a common backend.

OpenTelemetry Collector

Send everything to a local or in-cluster Collector and let it fan out to your real backends (this is also the path to a Prometheus-based stack — the Collector’s Prometheus exporter bridges the gap, since Routeplane has no scrape endpoint):

otel:
  endpoint: "http://otel-collector:4318"
  service_name: "routeplane"
  resource_attributes:
    deployment.environment: "prod"

Honeycomb

otel:
  endpoint: "https://api.honeycomb.io"
  service_name: "routeplane"
  headers:
    x-honeycomb-team: "${HONEYCOMB_API_KEY}"

Grafana Cloud / Tempo

Grafana Cloud’s OTLP gateway uses basic auth (instance ID + API token, base64 encoded). For self-hosted Tempo, point at its OTLP port and drop the header.

otel:
  endpoint: "https://otlp-gateway-<region>.grafana.net/otlp"
  service_name: "routeplane"
  headers:
    Authorization: "Basic ${GRAFANA_OTLP_TOKEN}"

Datadog

Datadog ingests OTLP through the Datadog Agent rather than a public OTLP URL — run the Agent with OTLP receiving enabled and point Routeplane at it:

otel:
  endpoint: "http://datadog-agent:4318"
  service_name: "routeplane"
  resource_attributes:
    deployment.environment: "prod"

Tune sampling

By default Routeplane respects the inbound trace decision and otherwise samples everything (parentbased_always_on). On high throughput, sample a fraction instead:

otel:
  endpoint: "http://otel-collector:4318"
  sampler: "parentbased_traceidratio"
  sampler_arg: 0.1                       # keep 10% of root traces

`sampler`	Behavior
`always_on`	Sample every trace
`always_off`	Sample nothing
`traceidratio`	Sample a fraction (`sampler_arg`), ignoring parent
`parentbased_always_on`	Follow parent; sample if no parent (default)
`parentbased_always_off`	Follow parent; drop if no parent
`parentbased_traceidratio`	Follow parent; otherwise sample `sampler_arg`

parentbased_* variants honor the upstream decision, so a trace your agent started won’t be half-sampled at the router. The metrics export interval and the trace batch queue are tunable separately under metrics and traces.batch if you need to trade freshness for overhead.

Content capture

Prompt and response content is excluded by default (content_capture: off). Turn it on only when you need prompt and response bodies on the spans for debugging:

otel:
  content_capture: "full"   # off (default) | full

`full` writes user prompts and model responses into your telemetry backend. That content then inherits the backend's access controls and retention. For shared or regulated environments, leave it `off` and capture content only in a scoped, short-lived debugging session.

Verify

Reload (or restart) the router, then ask the running daemon what it’s doing:

routeplane reload                  # pick up config changes without dropping connections
routeplane observe status          # endpoint, sampler, cardinality, in-flight spans
routeplane observe status --json

If it reports stopped, the exporter isn’t wired — check that the otel block has an endpoint (or that OTEL_EXPORTER_OTLP_ENDPOINT is set) and that the binary was built with an OTLP transport feature. Then send a request through the router and confirm the trace lands in your backend; you should see one inbound chat span per request with a CLIENT child for each upstream attempt.

Next steps

Cloud Activity (hosted)

**Routeplane Cloud is on the Phase D roadmap and not yet shipping.** This section describes the hosted Activity view that will be available once Cloud is live. The open-source [OpenTelemetry](#enable-export) export on the same page works today — point it at any OTLP backend you run.

The open-source OpenTelemetry export runs on your own backend. Routeplane Cloud gives you the hosted alternative: every /v1 request is traced into an Activity view server-side — no collector, no warehouse, nothing to run. Content (prompts and responses) is never stored.

The Activity dashboard

**Cloud not yet shipping — this section describes the future hosted Activity view.** For the open-source equivalent, configure [OTLP export](#enable-export) to your own backend.

Sign in to cloud.routeplane.app and open Activity. It opens on three KPI cards over a window you pick — 1 day, 1 week, 1 month, or all time:

KPI	What it measures
Spend	Total USD charged over the window
Requests	Number of requests over the window
Tokens	Prompt + completion tokens over the window

Every figure is scoped to the active workspace (namespace), so a dashboard always reflects the workspace you’re signed into.

The request log

Below the KPIs, the request log lists every /v1 request, newest first. Each row is a per-request trace record:

Column	Detail
Time	When the request landed
Model	The model id served, with a `stream` marker for streamed calls
Provider	The upstream provider that served it
Tokens	Prompt + completion total
Cost	Final charge in USD
Latency	End-to-end latency
Source	Funding source (credit balance, BYOK, MPP session)
Status	Succeeded, error, denied, cancelled

Each record also carries the routing profile used (balanced, cost, latency, throughput) and the gated capabilities exercised (e.g. structured_outputs) — so a request that failed over or hit a budget is legible without leaving the dashboard.

**Receipts, not bodies.** Cloud stores the request *record* — model, provider, tokens, cost, latency, status, routing profile — never the prompt or response content.

Usage attribution & the API

Everything in the dashboard is also available over the management API, scoped per workspace and gated by the usage:read scope:

Aggregate usage — spend, token counts, request count, and a per-capability breakdown over a [from, to) window.
Request history — the paginated request log, including routing profile and capabilities used.

These are the same routeplane cloud usage and routeplane cloud requests commands you run from the CLI. A hosted API reference for the usage and requests endpoints will ship alongside Routeplane Cloud.

Deep traces

Cloud stores per-request receipts, not OpenTelemetry span waterfalls. When you need the full span tree — the ingress span, the routing decision, and a CLIENT span per upstream attempt — that lives in your own OTLP collector. Wire it up once with the open-source OpenTelemetry export and the Activity view links out to it.