Introduction
What is Observability?
In the ever-evolving landscape of distributed system operations, ensuring the reliability, performance, and scalability of complex applications has become increasingly difficult. System Observability has emerged as a critical practice that empowers IT organizations to effectively monitor and gain deep insights into the inner workings of their software systems. By systematically collecting and analyzing data about applications, infrastructure, and user interactions, observability enables teams to proactively identify, diagnose, and resolve issues, ultimately leading to enhanced user experiences and operational efficiency.
What is OpenTelemetry?
OpenTelemetry is an open-source project that standardizes the collection of telemetry data from software systems, making it easier for organizations to gain holistic visibility into their environments. By integrating with various programming languages, frameworks, and cloud platforms, OpenTelemetry simplifies the instrumentation of applications, allowing developers and operators to collect rich, actionable data about their systems' behavior. Its open-source, vendor-neutral approach has gained substantial traction across the industry, and its adoption by software vendors and Application Performance Monitoring (APM) tools represents a significant shift in the observability landscape.
Many software vendors have started incorporating OpenTelemetry into their frameworks and libraries. Major cloud service providers like AWS, Azure, and Google Cloud have also embraced OpenTelemetry. In addition, many APM tools have integrated OpenTelemetry into their offerings. This integration allows users of these APM solutions to easily collect and visualize telemetry data from their applications instrumented with OpenTelemetry. It enhances the compatibility and flexibility of APM tools, making them more versatile in heterogeneous technology stacks.
Solution Architecture (Component Description)
Key Features (Controller, OMS, Agent, Extensions)
How to Get Started
Introduction
The following provides a minimal set-up to get started with Observability for Universal Automation Center.
The set-up is based on widely used open-source tools.
The set-up is not intended for production use. To use it in a production environment, additional security configuration must be applied.
The set-up allows collecting Metrics and Trace data from Universal Automation Center. The collected Metrics data is stored in Prometheus for analysis in Grafana.
The collected Trace data is stored in Elasticsearch for analysis in Jaeger. The Jaeger UI is embedded in the Universal Controller.
Jaeger, Prometheus, and Grafana are used as examples in this Get Started Guide. Any other data store or analysis tool could be used instead.
Metrics
Metrics data can be collected from Universal Controller, Universal Agent, OMS and Universal Tasks of type Extension.
Metrics data is pulled through the Prometheus metrics Web Service endpoint (Metrics API). In addition, user-defined Universal Event OpenTelemetry metrics are exported to an OpenTelemetry metrics collector (OTEL Collector).
The collected Metrics data is exported to Prometheus for analysis in Grafana.
To enable OpenTelemetry metrics, an OpenTelemetry (OTEL) collector with a Prometheus exporter needs to be configured.
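To illustrate what a pull through a Prometheus-style metrics endpoint returns, the following is a minimal sketch that parses the Prometheus text exposition format into Python tuples. It is not part of Universal Automation Center; the function name `parse_metrics` is hypothetical, and the sketch only handles simple sample lines (escaped label values are not unescaped).

```python
"""Sketch: parse Prometheus text exposition output into (name, labels, value) tuples."""
import re

# Matches sample lines such as:  metric_name{label="x"} 12.5   or   metric_name 3
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair inside the braces:  key="value"
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value).

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simple sketch cannot parse
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

The text passed to `parse_metrics` would typically be the body of an HTTP GET against the metrics endpoint; the endpoint URL depends on the installation and is not assumed here.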
Trace
Universal Controller manually instruments OpenTelemetry traces for Universal Controller (UC), OMS, Universal Agent (UA), and Universal Task Extension interactions associated with task instance executions, agent registration, and the deployment of Universal Tasks of type Extension.
The collected Trace data is stored in Elasticsearch for analysis in Jaeger.
To enable tracing, an OpenTelemetry span exporter must be configured.
Prerequisites
The sample set-up is done on a single on-premises Linux server.
Server Requirements
- Linux Server
- Memory: 16GB RAM
- Storage: 70GB Net storage
- CPU: 4 CPU
- Distribution: Any major Linux distribution
- Administrative privileges are required for the installation and configuration of the required Observability tools
- Ports
The following default ports are used:
Application | Port |
---|---|
Prometheus | http: 9090 |
Grafana | http: 3000 |
Jaeger | http: 16686 |
Elasticsearch | http: 9200 |
OTEL Collector | 4317 (gRPC), 4318 (HTTP) |
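Before installing the tools, it can help to confirm that none of these ports is already taken on the server. The following is a minimal sketch using only the Python standard library; the names `DEFAULT_PORTS`, `is_listening`, and `port_report` are hypothetical, and the check assumes everything runs on the local host.

```python
"""Sketch: report which of the default ports already have a listener."""
import socket

# Default ports taken from the table above.
DEFAULT_PORTS = {
    "Prometheus": 9090,
    "Grafana": 3000,
    "Jaeger": 16686,
    "Elasticsearch": 9200,
    "OTEL Collector (gRPC)": 4317,
    "OTEL Collector (HTTP)": 4318,
}


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unreachable host.
        return False


def port_report(ports=DEFAULT_PORTS, host="127.0.0.1"):
    """Map each component name to True (port in use) or False (port free)."""
    return {name: is_listening(port, host) for name, port in ports.items()}


if __name__ == "__main__":
    for name, in_use in port_report().items():
        print(f"{name:<24} {'in use' if in_use else 'free'}")
```

Run before the installation, every port should be reported as free; run afterwards, the same script gives a quick view of which components are up.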
Pre-Installed Software Components
It is assumed that the following components are installed and configured properly:
- Universal Agent 7.5.0.0 or higher
- Universal Controller 7.5.0.0 or higher
For further information on how to install Universal Agent and Universal Controller, please refer to the following documentation:
- Installation and Applying Maintenance - Universal Controller 7.4.x - Stonebranch Documentation (atlassian.net)
- Universal Agent 7.4.x for UNIX Quick Start Guide - Universal Agent 7.4.x - Stonebranch Documentation (atlassian.net)
Required Software for the Observability
The following open-source software needs to be installed and configured for use with Universal Automation Center.
Note: This Startup Guide has been tested with the software versions provided in the table below.
Configuration
Open Source Setup
It is important to install the software components in the order given here, because they depend on each other.
Example:
- Jaeger needs Elasticsearch to store the trace data.
- OTEL Collector needs Prometheus to store the metrics data.
- Grafana needs Prometheus as data source for displaying the dashboards.
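The dependencies listed above can be written down as a small graph, from which a valid installation order follows. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the names `DEPENDENCIES` and `install_order` are hypothetical, and the result is one valid order, not the only one.

```python
"""Sketch: derive an installation order from the component dependencies."""
from graphlib import TopologicalSorter  # available in Python 3.9+

# component -> set of components it depends on (from the examples above)
DEPENDENCIES = {
    "Elasticsearch": set(),
    "Prometheus": set(),
    "Jaeger": {"Elasticsearch"},
    "OTEL Collector": {"Prometheus"},
    "Grafana": {"Prometheus"},
}


def install_order(dependencies=DEPENDENCIES):
    """Return the components so that each one appears after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(install_order(), start=1):
        print(f"{step}. {component}")
```

Any order in which Elasticsearch precedes Jaeger, and Prometheus precedes both the OTEL Collector and Grafana, satisfies these constraints.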
Set up Elasticsearch
Description:
Elasticsearch is a distributed, RESTful search and analytics engine designed for real-time search and data storage. It is used for log and event data analysis, full-text search, and more.
Installation Steps:
Official Documentation: Elasticsearch Installation Guide
Test the Installation:
Set up Jaeger
Description:
.
Installation Steps:
Official Documentation: Jaeger Installation Guide
Test the Installation:
.
Set up OTEL Collector
Description:
.
Installation Steps:
Official Documentation: OpenTelemetry Collector Installation
Test the Installation:
.
Set up Prometheus
Description:
.
Installation Steps:
Official Documentation: Prometheus Installation Guide
Test the Installation:
.
Set up Grafana
Description:
.
Installation Steps:
Official Documentation: Grafana Installation Guide
Test the Installation:
.
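Once the open-source components are installed, a quick way to see whether they respond is to query each one over HTTP. The following is a minimal sketch under several assumptions: all components run on the same host with the default ports listed earlier, plain HTTP is used, and no authentication is required. It uses the Elasticsearch `_cluster/health` API, the Prometheus `/-/healthy` endpoint, the Grafana `/api/health` endpoint, and the root page of the Jaeger UI; the function names are hypothetical.

```python
"""Sketch: check that each component answers with HTTP 200 on its default port."""
import http.client
import urllib.request


def health_urls(host="localhost"):
    """Return one URL per component that should answer 200 when it is up."""
    return {
        "Elasticsearch": f"http://{host}:9200/_cluster/health",
        "Jaeger": f"http://{host}:16686/",
        "Prometheus": f"http://{host}:9090/-/healthy",
        "Grafana": f"http://{host}:3000/api/health",
    }


def is_healthy(url, timeout=2.0):
    """Return True if an HTTP GET on url returns status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts, and non-2xx answers all end up here.
        return False


if __name__ == "__main__":
    for name, url in health_urls().items():
        state = "OK" if is_healthy(url) else "NOT REACHABLE"
        print(f"{name:<14} {url:<45} {state}")
```

A component that is reported as not reachable may simply require authentication or HTTPS, so the result is a hint rather than a verdict.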
Universal Controller
Description:
.
Installation Steps:
Official Documentation: link to uc.properties open telemetry properties.
Universal Agent
Description:
The following describes the steps to enable tracing and metrics for UAG and OMS Server.
The set-up described here uses the HTTP protocol (the default). HTTPS is also supported.
Refer to the documentation on how to Enable and Configure SSL/TLS for OMS Server and UAG:
OMS Server : OMS - Development - Stonebranch Documentation (atlassian.net)
UAG: UAG - Development - Stonebranch Documentation (atlassian.net)
Installation Steps:
Enabling Metrics/Traces
Metrics and traces are turned off by default in both UAG and OMS Server. Two new options must be configured to enable them.
Metrics:
Component | Configuration File Option |
---|---|
UAG | otel_export_metrics YES |
OMS Server | otel_export_metrics YES |
Traces:
Component | Configuration File Option |
---|---|
UAG | otel_enable_tracing YES |
OMS Server | otel_enable_tracing YES |
Configure Service Name
All applications using OpenTelemetry must register a service.name, including UAG and OMS Server.
Component | Configuration File Option |
---|---|
UAG | otel_service_name <agent_name> |
OMS Server | otel_service_name <oms_agent_name> |
Configuring OTLP Endpoint
Both the metrics and tracing engines push the relevant data to the OpenTelemetry collector using the HTTP(S) protocol (the gRPC protocol is NOT supported in this release). In most scenarios, the traces and metrics are sent to the same collector, but this is not strictly necessary. To account for this, two new options are available in both UAG and OMS Server.
Metrics:
Component | Configuration File Option |
---|---|
UAG | otel_metrics_endpoint http://localhost:4318 |
OMS Server | otel_metrics_endpoint http://localhost:4318 |
Traces:
Component | Configuration File Option |
---|---|
UAG | otel_trace_endpoint http://localhost:4318 |
OMS Server | otel_trace_endpoint http://localhost:4318 |
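To check that a collector accepts data on the HTTP port before pointing UAG and OMS Server at it, a probe request can be sent by hand. The following is a minimal sketch that builds an OTLP/HTTP request in JSON encoding carrying only a resource and no spans. The path `/v1/traces` is the default trace path of the OTLP/HTTP specification; whether UAG and OMS Server append this path to the configured endpoint themselves is not stated here and is an assumption. The function names are hypothetical.

```python
"""Sketch: build and send an empty OTLP/HTTP JSON trace request to a collector."""
import json
import urllib.error
import urllib.request


def build_trace_probe(endpoint, service_name):
    """Build a POST request with a service.name resource and no spans."""
    payload = {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}}
                    ]
                },
                "scopeSpans": [],  # intentionally empty: nothing is recorded
            }
        ]
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v1/traces",  # OTLP/HTTP default trace path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_probe(request, timeout=2.0):
    """Return the HTTP status code, or None if the collector is unreachable."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the collector answered, but with an error status
    except OSError:
        return None  # connection refused, timeout, DNS failure, ...
```

For example, `send_probe(build_trace_probe("http://localhost:4318", "probe"))` returns an HTTP status code when a collector is listening on the HTTP port, and `None` when nothing answers.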
The following provides the sample set-up for UAG and OMS Server:
    # /etc/universal/uags.conf:
    otel_export_metrics YES
    otel_enable_tracing YES
    otel_service_name agt_lx_wiesloch_uag
    otel_metrics_endpoint http://localhost:4318
    otel_trace_endpoint http://localhost:4318

    # /etc/universal/omss.conf:
    otel_export_metrics YES
    otel_enable_tracing YES
    otel_service_name agt_lx_wiesloch
    otel_metrics_endpoint http://localhost:4318
    otel_trace_endpoint http://localhost:4318
Note: After adjusting uags.conf and omss.conf, restart the Universal Agent.
sudo /opt/universal/ubroker/ubrokerd restart
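The options described above can be sanity-checked before the restart. The following is a minimal sketch that reads the `option value` lines of a configuration file and reports likely mistakes: a missing service name while metrics or tracing is enabled, an endpoint that is not an HTTP(S) URL, or an endpoint on port 4317, which the port table lists as the collector's gRPC port. The function names are hypothetical, the file syntax is assumed to be one `option value` pair per line with `#` comments, and whether the product applies defaults for unset options is not covered.

```python
"""Sketch: validate the OpenTelemetry options in uags.conf / omss.conf text."""
from urllib.parse import urlparse


def parse_conf(text):
    """Parse 'option value' lines into a dict; '#' comments and blanks are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        options[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return options


def check_otel_options(options):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    metrics_on = options.get("otel_export_metrics", "NO").upper() == "YES"
    tracing_on = options.get("otel_enable_tracing", "NO").upper() == "YES"

    if (metrics_on or tracing_on) and not options.get("otel_service_name"):
        problems.append("otel_service_name is not set")

    for enabled, key in ((metrics_on, "otel_metrics_endpoint"),
                         (tracing_on, "otel_trace_endpoint")):
        endpoint = options.get(key)
        if not enabled or not endpoint:
            continue  # unset endpoints are left to the product defaults
        url = urlparse(endpoint)
        try:
            port = url.port
        except ValueError:
            problems.append(f"{key} has an invalid port: {endpoint}")
            continue
        if url.scheme not in ("http", "https"):
            problems.append(f"{key} must be an http(s) URL: {endpoint}")
        elif port == 4317:
            problems.append(f"{key} points at port 4317 (gRPC), which is not supported")
    return problems
```

Applied to the sample uags.conf content shown above, `check_otel_options(parse_conf(text))` returns an empty list.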
Official Documentation: Links to OMS and UAG open telemetry options.
OMS Server : OMS - Development - Stonebranch Documentation (atlassian.net)
UAG: UAG - Development - Stonebranch Documentation (atlassian.net)