OpenTelemetry

OpenTelemetry is a framework and toolkit for generating, collecting, and exporting telemetry data such as traces, metrics, and logs.

Goals

Our main goal is to make the system observable. The system is the whole stack, from frontend and backend to the database, as well as the services our application consumes (Keycloak, Nextcloud, etc.). We want to see how the different services interact with each other, notice errors when they occur, and see where our applications need too much time to respond. We want to trace a user's action through our whole system so that we can make the application as easy and comfortable to use as possible. If we give our developers an easy way to observe such requests, they spend less time analyzing where and why a problem occurs.

The generated telemetry data has to be exported. In applications whose source code we control, this is done by including the framework directly. Other applications can possibly be configured to export their data as well.

The exported data then has to be collected and can be made available via observability backends.

All of this can be achieved by instrumenting the applications.

Observability Backends

Observability backends make the data observable by providing ways to visualize and alert on logs, traces, and metrics. For our use case we would prefer an all-in-one solution that can handle every type of telemetry data and can be self-hosted. We also want the software to be open source and to have an active developer community. Ideally, we would either have some experienced users in our team or pick an observability backend that is easy to use, set up, and maintain.

Excluding paid tools and tools that can't be self-hosted, it comes down to:

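| Application   | Handles LOGS | Handles TRACES | Handles METRICS | Pros                                           | Cons                                                                            |
|---------------|--------------|----------------|-----------------|------------------------------------------------|---------------------------------------------------------------------------------|
| Signoz        | YES          | YES            | YES             | Easy setup (docker compose); active community  | Not a mature software yet (<v1.0.0)                                             |
| Grafana Tempo | YES          | YES            | YES             |                                                | Depends on object storage (S3, GCS, ..)                                         |
| Jaeger        | YES          | YES            | YES             |                                                | Focus on Kubernetes                                                             |
| Prometheus    | YES          | YES            | YES             |                                                | Focus on Kubernetes                                                             |
| CheckMK       | NO           | NO             | YES             | Already used by us                             | Doesn't provide everything we need; needs an update to support it (still beta)  |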