
Configuring OpenTelemetry in IRIS

Hello, dear members of our developer community!

In today's article, we're going to take a look at one of the latest remote monitoring features added to InterSystems IRIS: OpenTelemetry support.

What is OpenTelemetry?

OpenTelemetry is an open-source framework that provides the tools, SDKs, and standards needed to implement observability in systems and applications.

This observability extends to three types of data:

  1. Traces: follow the flow of information through our solutions by recording traces that let us identify where a request passes and under what conditions.
  2. Metrics: system status, performance, response latencies, resource usage, etc.
  3. Logs: event records produced by systems and applications to better understand what is happening.

OpenTelemetry uses the open OTLP standard (OpenTelemetry Protocol), which defines how all of this telemetry is serialized and transported. The telemetry can be sent via HTTP or gRPC.

OpenTelemetry with IRIS

InterSystems IRIS leverages the features available through the OpenTelemetry SDK to allow the export of all telemetry generated by the configured instance. Where does this telemetry come from? 

  1. Metrics: these come from the information exposed by the /api/monitor REST API (you can see the official documentation for this API here); a quick sketch of how to query it follows this list.
  2. Logs: messages recorded in the messages.log file and information stored in the audit database (if enabled).
  3. Traces: traces defined by the user within the applications developed on the instance.
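
To give a more concrete idea of the first point, here is a minimal ObjectScript sketch that reads the raw, Prometheus-formatted output of the metrics endpoint. It assumes the default unauthenticated access to the /api/monitor web application and the internal web server port 52773 used later in this article; adjust host, port, and credentials for your instance.

// Minimal sketch: fetch the raw metrics exposed by the /api/monitor REST API.
// Assumes the default unauthenticated /api/monitor web application and the
// internal web server port 52773 (adjust for your instance).
Set request = ##class(%Net.HttpRequest).%New()
Set request.Server = "localhost"
Set request.Port = 52773
Set status = request.Get("/api/monitor/metrics")
// On success, print the plain-text response: one metric per line, prefixed with "iris_"
If $SYSTEM.Status.IsOK(status) Do request.HttpResponse.Data.OutputToDevice()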

Now let's see how to configure OpenTelemetry in an IRIS instance.

Configuring IRIS with OpenTelemetry

For this configuration example, I used a project I uploaded to GitHub that runs on Docker, but it wouldn't be too complicated to configure on an on-premises instance. You can find the project on Open Exchange, associated with this article.

Before showing the different configurations, let's explain what elements will be part of our "monitoring ecosystem":

InterSystems IRIS

The IRIS instance will be responsible for generating the telemetry data we need to monitor.

OpenTelemetry Collector

This is a tool provided by OpenTelemetry that is responsible for collecting telemetry data from different sources. In our example, the only source will be IRIS, but we could add as many as we need.

Prometheus

Open-source tool used for system monitoring and alert generation. This tool will be responsible for receiving the metrics accumulated by OpenTelemetry Collector.

Jaeger

Open-source platform for managing and monitoring traces of microservices-based systems.

Configuration in Docker

As I mentioned earlier, I used a Docker deployment to simplify the example as much as possible. Let's analyze the docker-compose.yml file piece by piece for a better understanding.

IRIS Docker image

  iris:
    init: true
    container_name: iris
    build:
      context: .
      dockerfile: iris/Dockerfile
    ports:
      - 52774:52773
      - 51774:1972
    volumes:
      - ./iris/shared:/iris-shared
    environment:
      - ISC_DATA_DIRECTORY=/iris-shared/durable
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
    command: --check-caps false --ISCAgent false

As you can see, I'm defining the image to use in a Dockerfile, which isn't particularly interesting, so I won't explain it here. The interesting part of this configuration is the environment variable OTEL_EXPORTER_OTLP_ENDPOINT, which is the URL where our OpenTelemetry Collector will be listening for all the metrics, traces, and logs that IRIS sends.
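
As a quick sanity check (not part of the original setup, just a suggestion), you can open an IRIS terminal inside the container and confirm that the variable is visible to the instance:

// Print the OTLP endpoint as seen by the IRIS process; with the compose file
// above it should output http://otel-collector:4318
Write $SYSTEM.Util.GetEnviron("OTEL_EXPORTER_OTLP_ENDPOINT")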

Once we deploy our IRIS instance, we'll need to configure it to send metrics and logs at regular intervals. To do this, we'll open the Monitor configuration in the Management Portal:

As you can see, we can enable the sending of both Metrics and Logs. Traces, on the other hand, are not sent at intervals; they are sent as soon as the "End" method of the %Trace.Provider class instance is invoked. I won't go into further detail here, but you can check the official documentation.
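
Although this article will only use IRIS's built-in test traces later on, here is a rough sketch of what a user-defined trace could look like, based on the %Trace classes mentioned above. The class and method names (GetTracer, StartSpan, End) reflect my reading of that documentation and may differ in your IRIS version, so treat this as an illustration rather than a reference:

// Hedged sketch of a user-defined trace; the %Trace class and method names
// below are assumptions based on the documentation linked above, so verify
// them against your IRIS version before relying on this.
ClassMethod DoTracedWork()
{
    // A provider hands out tracers, identified by an arbitrary name and version
    Set provider = ##class(%Trace.Provider).%New()
    Set tracer = provider.GetTracer("demo.app", "1.0")

    // Start a span around the unit of work we want to observe
    Set span = tracer.StartSpan("demo-operation")

    // ... application work goes here ...

    // Ending the span is what triggers its export to the configured OTLP endpoint
    Do span.End()
}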

OpenTelemetry Collector Docker image

otel-collector:
    build:
      context: .
      dockerfile: open-telemetry/Dockerfile
    container_name: otel-collector
    command: ["--config=/otel-local-config.yml"]
    volumes:
      - ./open-telemetry/otel-collector-config.yml:/otel-local-config.yml
    ports:
      - 4317:4317      # OTLP gRPC receiver
      - 4318:4318    # OTLP HTTP receiver
      - 9464:9464      # Metrics
    depends_on:
      - iris

Here we have the OpenTelemetry Collector image. Again, I've defined a Dockerfile to determine where to get the image from (this isn't strictly necessary). As you can see, we're exposing three ports:

  • 4317: port for receiving metrics, traces, and logs via gRPC.
  • 4318: port for receiving metrics, traces, and logs via HTTP.
  • 9464: port exposed so that third-party tools (Prometheus in our case) can query the metrics accumulated by the OpenTelemetry Collector.

We've also declared a configuration file, otel-local-config.yml (the name is modifiable). Let's take a look inside:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:9464"
  debug: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      exporters: [debug]

What are we looking at here? Quite simply, we have the following sections: 

  • Data receivers: in our case a single OTLP receiver, where we configure the IP and ports on which the collector will listen for telemetry sent by third-party systems.
  • Data exporters: where the data received by the OpenTelemetry Collector is sent, or from where it is pulled. For our example, we use the Prometheus exporter included with the collector, which publishes the accumulated metrics on port 9464 so that Prometheus can scrape them. In the case of Jaeger, we use the collector's OTLP exporter to push traces directly to the jaeger host (the name of the Jaeger container on the Docker network).
  • Services: where we specify which of the configured receivers and exporters handle each type of telemetry. In our case, the OTLP receiver is used for everything, metrics are exported to Prometheus, traces are exported to Jaeger over OTLP, and logs simply go to the debug exporter.

Prometheus Docker image

Let's take a look at its configuration in Docker:

prometheus:
    build:
      context: .
      dockerfile: prometheus/Dockerfile
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - 9090:9090

As you can see, it is very simple: the image is again defined in a Dockerfile that only references the image name, port 9090 exposes the web interface we will access, and finally there is a configuration file called prometheus.yml.

Let's see what this prometheus.yml file tells us:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:9464']

We have the following configuration:

  • Scrape interval: the time interval between queries to the OpenTelemetry Collector.
  • Scrape configs: where we name the job that will perform the scrape and specify the host and port it will connect to.

Jaeger Docker image

For the Jaeger image, I took an example straight from our beloved ChatGPT:

jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - 16686:16686    # Jaeger UI
      - 14250:14250    # Jaeger collector gRPC (OTLP gRPC uses 4317)

The most important point, apart from port 16686, which we'll use to access the web interface, is the COLLECTOR_OTLP_ENABLED environment variable, which enables Jaeger's default OTLP ports so it can accept connections from the OpenTelemetry Collector. Here you can see the list of ports it uses, but I can tell you in advance that it's 4317 for gRPC and 4318 for HTTP. As you've seen in the OpenTelemetry Collector configuration, we'll be using the gRPC connection.

Well, with everything ready, let's see it in operation by starting the project.

Emitting metrics, receiving in Prometheus

Now that we have our instances configured and running, let's access the web interface provided by Prometheus. In our example, we mapped it to http://localhost:9090. By default, it opens on the query screen, where we can run queries against the available metrics.

If we have correctly configured the connection between IRIS, the OpenTelemetry Collector, and Prometheus, the Prometheus query screen will give us access to all the standard IRIS metrics, as you can see in the following screenshot when searching for "iris_":

If we select any of them, we can see a graph of its values over time:

Sending traces to Jaeger

To check that traces are being sent, we'll use the simplest resource IRIS provides: the TestTraces() method of the SYS.Monitor.OTel class, which you can consult here. If you have any particular interest in seeing it in more detail, let me know in the comments and I will be happy to write an article about it.

We simply execute the following command from the %SYS namespace in the terminal:

%SYS>do ##class(SYS.Monitor.OTel).TestTraces()

This will send a trace that should have reached Jaeger. Let's check it from its graphical interface, available at http://localhost:16686.

In the filters in the left-hand menu, we can see a service called irisotel. This is the service IRIS uses to test the sending of traces, hence the name of the received trace, test_trace.

Well, that's it! We've now got our instance ready to send all the metrics and trace data we need. I'm available if you'd like to dig deeper into the topic.
