Observability for Everyone: Part VII Tracing the Invisible Path with Tempo

Gaurav Nagarkoti
Published in Searce
6 min read · Apr 12, 2024


In the intricate web of distributed systems, where interactions are like threads woven tightly together, uncovering the actual path of a request is akin to connecting the dots in a sprawling tapestry. Enter Tempo, Otel Collector, and OpenTelemetry — our guides on this journey of tracing the invisible path through the labyrinth of distributed architectures. Let’s embark on an adventure of discovery, tracing the threads that bind our digital world together.

Imagine each interaction within a distributed system as a dot — a point in space and time, seemingly disconnected from the others. However, with Tempo, Otel Collector, and OpenTelemetry, these dots are not isolated entities but rather pieces of a larger puzzle waiting to be assembled. As we collect telemetry data and trace spans across service boundaries, we gradually connect the dots, revealing the intricate path that a request takes as it traverses through the system.

Guided by the Three Musketeers

Just as skilled cartographers chart their course through unexplored territories, Tempo, Otel Collector, and OpenTelemetry serve as our navigators in the realm of distributed tracing. With Tempo at the helm, Otel Collector diligently collects telemetry data from every corner of our infrastructure, while OpenTelemetry provides the instrumentation needed to capture the nuances of each interaction. Together, they guide us through the twists and turns of distributed architectures, helping us unravel the path that lies hidden beneath the surface.

As we collect more data and trace additional spans, patterns begin to emerge, and the once-disparate dots start to form a cohesive picture. With each dot connected, we gain a deeper understanding of the flow of requests, the dependencies between services, and the performance characteristics of our distributed systems. Through meticulous analysis and interpretation, we unravel the actual path — a clear, coherent trajectory through the tightly woven paths of our digital landscape.

Exploring Tempo’s Features

Distributed Tracing
Tempo empowers you to trace requests across distributed systems seamlessly, providing insights into their journey and performance.

Scalability
Designed to grow with your needs, Tempo efficiently manages large volumes of trace data, ensuring smooth operations even as your system scales.

Storage Efficiency
Tempo keeps storage costs low by relying on object storage, keeping indexing minimal, and compressing trace data, without compromising performance.

OpenTelemetry Integration
Effortlessly integrates with OpenTelemetry instrumentation libraries, simplifying the capture of trace data from your applications.

Querying and Analysis
With Tempo’s intuitive querying and analysis tools, you can easily filter, group, and visualize trace data to understand system behavior and identify issues.

Seamless Integration with Grafana
Tempo seamlessly integrates with Grafana, allowing you to visualize trace data alongside other metrics for holistic observability and analysis.

Interoperability with Logs
Tempo supports trace ID-based log correlation, enabling seamless integration with logs for contextual analysis and troubleshooting.

Revealing the Benefits

Enhanced Observability
Tempo enhances observability by providing a clear view of request flows and system behavior across your distributed architecture.

Improved Troubleshooting
Tempo simplifies troubleshooting by enabling quick identification of root causes for performance issues or errors, reducing downtime.

Optimized Performance
By analyzing trace data, Tempo helps optimize system performance by identifying bottlenecks and areas for improvement.

Collaborative Analysis
Tempo promotes collaboration among teams by facilitating the sharing of trace data and insights, enabling collective problem-solving and knowledge sharing.

Future-Proof Tracing
With support for emerging technologies and standards, Tempo ensures that your tracing capabilities remain adaptable and relevant, safeguarding your investments for the future.

Today’s Objective

Now, we will delve into the details of deploying Tempo Open Source on Kubernetes, taking advantage of the capabilities offered by Google Kubernetes Engine (GKE) on the Google Cloud Platform.

Our deployment approach will focus on making the most of the Tempo Helm chart and adding Tempo as a datasource in Grafana, allowing us to visualize Tempo traces effectively.

Step-by-Step Guide

Pre-Requisites:

  • Provisioning a Private GKE Cluster
  • Provisioning a Private Cloud SQL
  • Gaining Access to the Private GKE Cluster and Namespace Setup
  • Setting Up Grafana

Reference: observability-for-everyone-part-iii-visualization

Cloning the Tempo Repository

To acquire the essential Tempo components, we clone the Grafana helm-charts repository, which contains the tempo-distributed chart.

Repository
https://github.com/grafana/helm-charts/tree/main/charts/tempo-distributed
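
One way to get a local, editable copy of the chart is sketched below. The working directory name tempo-distributed-final is taken from the install command used later in this guide; the copy step and the dependency update are assumptions about the workflow, not steps stated in this post.

```shell
# Clone the Grafana helm-charts repository (it contains charts/tempo-distributed)
git clone https://github.com/grafana/helm-charts.git

# Copy the chart into a working directory; the name matches the later install command
cp -r helm-charts/charts/tempo-distributed tempo-distributed-final

# Fetch the sub-charts declared in Chart.yaml so the local copy can be installed
helm dependency update tempo-distributed-final
```

The values.yaml changes described in the following sections would then be made inside tempo-distributed-final.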

Configuration Changes in the values.yaml File

Creating Separate K8s SA for Tempo
We start by creating a dedicated K8s service account for Tempo. Using Workload Identity, discussed earlier in this series, we annotate this K8s service account so that it is associated with a GCP IAM service account. This gives Tempo the permissions it needs to interact with Google Cloud Platform resources.

serviceAccount:
  # -- Specifies whether a ServiceAccount should be created
  create: true
  # -- The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name: tempo-sa
  # -- Image pull secrets for the service account
  imagePullSecrets: []
  # -- Annotations for the service account
  annotations:
    iam.gke.io/gcp-service-account: gke-poc-pod-sa@project-id.iam.gserviceaccount.com
  # -- Set this toggle to false to opt out of automounting API credentials for the service account
  automountServiceAccountToken: true
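
The annotation only takes effect once the GCP service account allows the K8s service account to impersonate it. A minimal sketch of that binding follows, assuming the observability namespace from the install command later in this guide and the placeholder project ID shown in the annotation; substitute your own values.

```shell
# Placeholder values taken from the snippets in this guide; replace with your own
PROJECT_ID="project-id"
NAMESPACE="observability"
KSA_NAME="tempo-sa"
GSA_EMAIL="gke-poc-pod-sa@${PROJECT_ID}.iam.gserviceaccount.com"

# Allow the K8s service account to act as the GCP service account via Workload Identity
gcloud iam service-accounts add-iam-policy-binding "${GSA_EMAIL}" \
  --project "${PROJECT_ID}" \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"
```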

GCS as Long Term Storage
Set the name of the GCS bucket used for long-term trace storage.

storage:
  trace:
    # -- The supported storage backends are gcs, s3 and azure, as specified in https://grafana.com/docs/tempo/latest/configuration/#storage
    backend: gcs
    gcs:
      bucket_name: test-bucket
  # Settings for the Admin client storage backend and buckets. Only valid if enterprise.enabled is true.
  admin:
    # -- The supported storage backends are gcs, s3 and azure, as specified in https://grafana.com/docs/enterprise-traces/latest/config/reference/#admin_client_config
    backend: gcs
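
If the bucket does not exist yet, it could be created and opened up to Tempo's GCP service account roughly as follows. This is a sketch under assumptions: the location is hypothetical, and roles/storage.objectAdmin is a guess at a sufficient role, so check Tempo's GCS documentation for the exact permissions it requires.

```shell
# Placeholder values; the bucket name and service account match the snippets in this guide
PROJECT_ID="project-id"
BUCKET="test-bucket"
LOCATION="us-central1"   # hypothetical location
GSA_EMAIL="gke-poc-pod-sa@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the bucket that will hold trace blocks
gcloud storage buckets create "gs://${BUCKET}" \
  --project "${PROJECT_ID}" \
  --location "${LOCATION}"

# Grant the service account object-level access (role choice is an assumption)
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
  --member "serviceAccount:${GSA_EMAIL}" \
  --role "roles/storage.objectAdmin"
```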

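Set the Prometheus remote write URL
This storage block belongs to the metrics generator configuration (metricsGenerator.config in the chart's values.yaml), not to the top-level trace storage section. Prometheus accepts these writes only when its remote write receiver is enabled.

storage:
  path: /var/tempo/wal
  wal:
  remote_write_flush_deadline: 1m
  # -- A list of remote write endpoints.
  # -- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write
  remote_write:
    - url: http://prometheus-server.namespace.svc:80/api/v1/write
      send_exemplars: true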

Enable OTLP to accept OpenTelemetry traces over HTTP
In the chart's values.yaml, this block sits under the traces key.

otlp:
  http:
    # -- Enable Tempo to ingest Open Telemetry HTTP traces
    enabled: true
    # -- HTTP receiver advanced config
    receiverConfig: {}
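
Once Tempo is running, the OTLP HTTP receiver can be smoke-tested without any instrumented application by posting a single hand-built span. The sketch below builds an OTLP/JSON payload and sends it only when TEMPO_OTLP_URL is set. The service name tempo-distributor, port 4318, the tenant value demo, and the span and service names are all assumptions for illustration.

```shell
# Random IDs in the sizes OTLP expects: 16-byte trace ID, 8-byte span ID (hex-encoded)
TRACE_ID=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
SPAN_ID=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')

# A one-second span ending now, expressed in Unix nanoseconds
NOW_S=$(date +%s)
START_NS="$((NOW_S - 1))000000000"
END_NS="${NOW_S}000000000"

# Minimal OTLP/JSON body: one resource, one scope, one span
PAYLOAD=$(cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"otlp-smoke-test"}}]},
"scopeSpans":[{"scope":{"name":"manual-test"},"spans":[{"traceId":"${TRACE_ID}","spanId":"${SPAN_ID}",
"name":"smoke-test-span","kind":1,"startTimeUnixNano":"${START_NS}","endTimeUnixNano":"${END_NS}"}]}]}]}
EOF
)

echo "trace id: ${TRACE_ID}"

# Send only when an endpoint is supplied, for example after running
#   kubectl port-forward svc/tempo-distributor 4318:4318 -n observability
# and exporting TEMPO_OTLP_URL=http://localhost:4318
if [ -n "${TEMPO_OTLP_URL:-}" ]; then
  curl -sS -X POST "${TEMPO_OTLP_URL}/v1/traces" \
    -H "Content-Type: application/json" \
    -H "X-Scope-OrgID: ${TENANT_ID:-demo}" \
    -d "${PAYLOAD}"
fi
```

The printed trace ID can then be looked up in Grafana using the Tempo datasource.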

Installing the Tempo Helm Chart

With our configurations in place, we execute the command for installing the Tempo Helm chart. This process lays the foundation for the deployment of Tempo in our private GKE cluster.

helm install tempo tempo-distributed-final -f tempo-distributed-final/values.yaml -n observability

Resource Validation

Post-installation, we meticulously validate whether all the resources required for Tempo are deployed correctly. This step ensures the integrity and functionality of our o11y infrastructure.

kubectl get all -n observability | grep tempo
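
For a stricter check than reading the listing by eye, the pods can be waited on until they report Ready. This is a sketch that assumes the chart applies the common app.kubernetes.io/instance label with the release name tempo; adjust the selector if your pods are labelled differently.

```shell
# Block until every pod of the "tempo" release is Ready, or fail after five minutes
kubectl wait pods -n observability \
  -l app.kubernetes.io/instance=tempo \
  --for=condition=Ready --timeout=300s

# List the services to find the distributor and query frontend names and ports
kubectl get svc -n observability -l app.kubernetes.io/instance=tempo
```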

Adding Tempo Datasource within Grafana

To integrate Tempo as a data source in Grafana, follow these steps:

  1. Add Tempo as a Data Source
    Start by adding Tempo as a data source within Grafana.
  2. Configure the Datasource URL
    In the datasource settings, specify the URL of the Tempo Query Frontend service.
  3. Include the X-Scope-OrgID Header
    Add the X-Scope-OrgID header to the data source configuration, using the same value that is configured in the Otel Collector.
    Note: No traces will appear until the Otel Collector is deployed and configured to send the X-Scope-OrgID header with this value.
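
The same three settings can also be expressed as a Grafana datasource provisioning file instead of being entered in the UI. The sketch below writes such a file; the query frontend service name, port 3100, and the tenant value demo are assumptions that should be replaced with the values from your cluster and Otel Collector configuration.

```shell
# Write a Grafana provisioning file for the Tempo datasource
cat > tempo-datasource.yaml <<'EOF'
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    # Assumed service name and port; confirm with: kubectl get svc -n observability
    url: http://tempo-query-frontend.observability.svc:3100
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      # Hypothetical tenant value; must match the one configured in the Otel Collector
      httpHeaderValue1: demo
EOF
```

Grafana reads files like this from its provisioning/datasources directory at startup.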

Validating Trace Collection via Grafana UI

Now we select the Tempo datasource and run a simple query to verify that traces are being generated and that they can be correlated with logs.
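
The same check can be made from the command line by asking Tempo's HTTP API for a trace by ID. This is a sketch: the service name tempo-query-frontend, port 3100, and the tenant value demo are assumptions, and the trace ID must be one that your applications or logs actually produced.

```shell
# Usage: ./check-trace.sh <trace-id>   (the script name is hypothetical)
TRACE_ID="${1:?pass a trace ID, for example one copied from a log line}"

# Forward the query frontend's HTTP port to localhost in the background
kubectl port-forward svc/tempo-query-frontend 3100:3100 -n observability &
PF_PID=$!
sleep 2

# Fetch the trace; the tenant header must match the one used when the trace was ingested
curl -sS -H "X-Scope-OrgID: demo" "http://localhost:3100/api/traces/${TRACE_ID}"

# Stop the port-forward
kill "${PF_PID}"
```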

With the deployment of Tempo, we complete our core observability stack, alongside Grafana, Mimir, Prometheus, and Loki. This comprehensive stack equips us with the tools necessary to monitor, analyze, and optimize the performance of our systems effectively.

In our upcoming blogs, we’ll continue our exploration of observability by setting up essential components of our monitoring infrastructure. We’ll cover deploying Promtail for log collection, configuring Otel Collector for telemetry data collection, instrumenting apps with OpenTelemetry, and setting up various Prometheus exporters. Stay tuned for practical guidance on integrating these tools into your observability stack, enabling comprehensive insights into your systems’ performance and behavior.

Author: Gaurav Nagarkoti
