Disclaimer
Your use of this download is governed by Stonebranch’s Terms of Use.
Version Information
Template Name | Extension Name | Version | Status |
---|---|---|---|
System Monitor | ue-system-monitor | 1 (Current 1.0.0) | Fixes and new Features are introduced. |
Refer to Changelog for version history information.
Overview
The System Monitor integration provides a powerful tool for users to track system metrics, such as CPU usage, memory consumption, disk activity, and network performance from both Linux and Windows hosts. By leveraging OpenTelemetry, these metrics can be seamlessly published to observability platforms, enabling real-time infrastructure monitoring. This integration not only facilitates the detection of performance bottlenecks and potential system failures but also allows for proactive management of resources. By making infrastructure metrics observable, System Monitor enhances the ability to correlate system behavior with application performance, leading to better overall visibility and system reliability.
Key Features
Feature | Description |
---|---|
Observe and Publish Metrics | Observe and Publish Metrics from Linux and Windows Hosts. The following categories are supported.
|
Filtering | Filtering abilities for Disk/Filesystem and Network Interface metrics |
Other Configuration Options |
|
Requirements
This integration requires a Universal Agent and a Python runtime to execute the Universal Task.
Area | Details |
---|---|
Python Version | Requires Python 3.11 |
Universal Agent Compatibility |
|
Universal Controller Compatibility | Universal Controller Version >= 7.6.0.0. |
OpenTelemetry | Universal Agent should be configured to send OpenTelemetry data. |
Note |
---|
There should never be two task instances running simultaneously on the same system, as this can lead to inconsistent metric values and unreliable data. Although a warning appears on the default Grafana dashboard provided, our software does not automatically prevent multiple task instances from running simultaneously on the same system, so this must be managed operationally. |
Note |
---|
The provided Grafana dashboard makes use of metric attributes that are attached by Universal Agent using the Agent default configuration. If any of these options are changed, such as |
Input Fields
Name | Type | Description | Version Information |
---|---|---|---|
Action | Choice | Possible values are
| Introduced in 1.0.0 |
Provide Configuration As | Choice | Specifies how System Monitor configuration is provided. Available options are:
Available if Action is “System Monitor” | Introduced in 1.0.0 |
Collection Interval (sec) | Int | How often metrics are retrieved. Default value is 15 seconds. | Introduced in 1.0.0 |
Configuration | Large Text | System Monitor configuration as text. Default value:
For more information on the System Monitor configuration options, see YAML Configuration Options | Introduced in 1.0.0 |
Configuration | Script | System Monitor configuration as UC Script. This allows the configuration to be shared across multiple task definitions. For more information on the System Monitor configuration options, see YAML Configuration Options | Introduced in 1.0.0 |
Supported Actions
Action: System Monitor
Configuration examples
Provide configuration as YAML text. Collection interval is set to 15 seconds, and the default YAML configuration is used, activating all metrics and applying no filters. | Provide configuration using the "System Monitor - Full Configuration" UAC Script, setting a collection interval of 10 seconds. |
YAML Configuration Options
The configuration, provided as either plain text or a UC Script, defines the System Monitor's behavior, specifying which metrics are published and any desired filtering options. Written in YAML format, configurations must adhere to a defined hierarchical structure.
The metrics and system settings must always be present in configuration files. The activation or filtering of any other metrics is optional.
The configuration allows you to:
Enable or disable metric categories: Choose which system metrics categories to collect.
Filter specific resources: Apply include/exclude filters on specific attribute values using strict mode or regex. If any "include" filters are activated for a specific attribute, no "exclude" filters can be activated for the same attribute (they are mutually exclusive).
The following configuration example demonstrates all the applicable options:
metrics:
  system:
    cpu: # Enables CPU metrics to be published
    memory: # Enables Memory metrics to be published
    load_average: # Enables Load Average metrics to be published
    paging: # Enables Swap/Paging metrics to be published
    disk: # Enables Disk metrics to be published
      exclude_devices: # Filters exported metrics, excluding a list of devices. "include_devices" is also available.
        devices: ["loop.*"] # Devices excluded using a list provided in brackets
        match_type: regex # Value matching is with Regex
    filesystem: # Enables File System metrics to be published
      include_devices: # Filters exported metrics, including only a list of devices. "exclude_devices" is also available.
        devices: ["sda", "sdb"] # Devices included using a list provided in brackets
        match_type: strict # Value matching is strict. The exact names from the above list are used.
      include_types: # Filters exported metrics, including only a list of filesystem types. "exclude_types" is also available.
        types: ["xfs"] # Filesystem types included using a list provided in brackets
        match_type: strict # Value matching is strict. The exact names from the above list are used
      include_mountpoints: # Filters exported metrics, including only a list of mountpoints. "exclude_mountpoints" is also available
        mountpoints: ["/dev"] # Mountpoints included using a list provided in brackets
        match_type: strict # Value matching is strict. The exact names from the above list are used
    network: # Enables Network metrics to be published
      include_devices: # Filters exported metrics, including only a list of network interfaces. "exclude_devices" is also available.
        devices: ["lo"]
        match_type: strict
    processes:
YAML Field | Description |
---|---|
metrics | Necessary as top-level key of the YAML configuration. Required |
system | Enables monitoring of host uptime and acts as the root key for any additional metrics provided. All following metrics (such as cpu) are activated by including the relevant key in the configuration file. Required |
cpu | Enables CPU related metrics. |
load_average | Enables Load Average (1, 5 and 15 minute) metrics. |
memory | Enables Memory metrics. |
paging | Enables Paging/Swap metrics. |
disk | Enables Disk metrics. Filtering options are available: include_devices, exclude_devices. The above filtering options are mutually exclusive (both should not be set). |
filesystem | Enables Filesystem metrics. Filtering options are available: include_devices, exclude_devices (mutually exclusive; both should not be set); include_types, exclude_types (mutually exclusive; both should not be set); include_mountpoints, exclude_mountpoints (mutually exclusive; both should not be set). If filtering options for devices/types and mountpoints are used at the same time, a logical AND is applied. |
network | Enables Network metrics. Filtering options are available: include_devices, exclude_devices. The above filtering options are mutually exclusive (both should not be set). |
processes | Enables Process count metric. |
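The structural rules above (required top-level keys, and include/exclude options that must not be combined for the same attribute) can be checked before a configuration is deployed. The following is a rough sketch only: it assumes the YAML has already been parsed into a Python dictionary, and the `validate` function and `FILTER_ATTRS` table are hypothetical names, not part of the extension.

```python
# Filterable attributes per metric category, taken from the full
# configuration example in this document.
FILTER_ATTRS = {
    "disk": ["devices"],
    "filesystem": ["devices", "types", "mountpoints"],
    "network": ["devices"],
}


def validate(config: dict) -> list[str]:
    """Return a list of problems found in a parsed configuration."""
    metrics = config.get("metrics")
    if not isinstance(metrics, dict) or "system" not in metrics:
        return ["'metrics' and 'system' keys are required"]

    errors = []
    # A key with no value (for example "cpu:") parses to None in YAML.
    system = metrics["system"] or {}
    for category, attrs in FILTER_ATTRS.items():
        options = system.get(category) or {}
        for attr in attrs:
            if f"include_{attr}" in options and f"exclude_{attr}" in options:
                errors.append(
                    f"{category}: include_{attr} and exclude_{attr} "
                    "are mutually exclusive"
                )
    return errors
```

An empty list means that no structural problem was found. Combining, say, include_devices with exclude_types under filesystem is not reported, because the two options refer to different attributes.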
System Monitor Configuration Examples
# | Configuration | Description |
---|---|---|
1 | | Default configuration that enables all available metrics without applying any filters. |
2 | | This configuration filters disk metrics so as not to report for the disks named 'sda' and 'sdb'. |
3 | | This configuration filters disk metrics so as to exclude reporting for any filesystems whose names start with 'dev/loop'. |
4 | | This configuration filters network metrics to include reports originating only from the 'Ethernet' and 'Wireless' network interfaces. |
5 | | This configuration applies several filters to tailor the collected metrics as follows: |
Action Output
Output Type | Description | Examples |
---|---|---|
EXTENSION | The extension output provides the following information: | |
STDERR | Universal Extension Task log information | |
Exit Codes
Exit Code | Status | Status Description | Meaning |
---|---|---|---|
0 | Success | “Success: << Task cancelled successfully.>>“ | Successful execution and subsequent cancellation. |
1 | Failure | “Execution Failed: <<Error Description>>” | Raised in case of an unexpected error during execution |
20 | Failure | “Data Validation Error: <<Error Description>>“ | Validation error related to input fields or the YAML Configuration provided. * See STDERR for more detailed error descriptions. |
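For scripts that post-process task results, the exit codes in the table above can be turned into a small lookup. This is only an illustrative sketch; the `EXIT_CODES` mapping and the `describe_exit_code` helper are hypothetical names and are not shipped with the integration.

```python
# Exit codes as listed in the Exit Codes table: code -> (status, meaning).
EXIT_CODES = {
    0: ("Success", "Successful execution and subsequent cancellation."),
    1: ("Failure", "Unexpected error during execution."),
    20: (
        "Failure",
        "Validation error in the input fields or the YAML configuration; "
        "see STDERR for details.",
    ),
}


def describe_exit_code(code: int) -> str:
    """Return a one-line, human-readable summary for an exit code."""
    status, meaning = EXIT_CODES.get(
        code, ("Unknown", "Exit code not listed in the Exit Codes table.")
    )
    return f"{code}: {status} - {meaning}"
```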
Observability
System CPU metrics
Metric: system.cpu.time
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | s | As defined on Metric Attributes List | Observes the CPU time spent on the system. |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The CPU mode on which time was spent. Possible values are:
Platform-specific fields:
Note: Not all values might be available, as availability depends on the platform and operating system version. |
| The logical CPU number |
Metric: system.cpu.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Observes the CPU utilization on the system. |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The CPU mode on which time was spent. Possible values are:
Platform-specific fields:
Note: Not all values might be available, as availability depends on the platform and operating system version. |
| The logical CPU number |
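The relationship between the cumulative system.cpu.time counter and the system.cpu.utilization gauge can be illustrated as follows. This documentation does not state how the extension derives utilization, so the sketch below shows one common approach under stated assumptions: two samples of per-mode CPU seconds are compared, and each mode's share of the elapsed CPU time is reported as a value between 0 and 1. The function name and the mode names used in the example are illustrative only.

```python
def cpu_utilization(
    prev: dict[str, float], curr: dict[str, float]
) -> dict[str, float]:
    """Share of elapsed CPU time spent in each mode between two samples.

    prev and curr map a CPU mode to cumulative seconds, as a Counter
    such as system.cpu.time would report them.
    """
    deltas = {mode: curr[mode] - prev.get(mode, 0.0) for mode in curr}
    total = sum(deltas.values())
    if total <= 0:
        # No time elapsed between the samples (or a counter reset).
        return {mode: 0.0 for mode in deltas}
    return {mode: delta / total for mode, delta in deltas.items()}


# Two hypothetical samples taken one collection interval apart.
earlier = {"user": 100.0, "system": 50.0, "idle": 850.0}
later = {"user": 103.0, "system": 51.0, "idle": 861.0}
utilization = cpu_utilization(earlier, later)
```

In this example, 15 CPU-seconds elapsed between the samples, of which 3 were spent in the hypothetical "user" mode, giving a utilization of 0.2 for that mode.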
Metric: system.cpu.physical.count
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | {cpu} | - | Reports the number of actual physical processor cores on the hardware |
Metric: system.cpu.logical.count
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | {cpu} | - | Reports the number of logical (virtual) processor cores created by the operating system to manage multitasking |
System Load Average Metrics
Metric: system.linux.cpu.load_1m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 1 minute. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O) |
Metric: system.linux.cpu.load_5m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 5 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O) |
Metric: system.linux.cpu.load_15m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 15 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O) |
Metric: system.windows.cpu.load_1m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 1 minute. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows this information is not directly retrieved by a system function, and it is based on emulation. |
Metric: system.windows.cpu.load_5m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 5 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows this information is not directly retrieved by a system function and it is based on emulation. |
Metric: system.windows.cpu.load_15m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 15 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows this information is not directly retrieved by a system function and it is based on emulation. |
System Memory Metrics
Metric: system.memory.usage
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | By | As defined on Metric Attributes List | Tracks the system memory usage |
Metric Attributes List:
Attribute Name | Description | |||
---|---|---|---|---|
| The reflected memory state. The values of this attribute are platform dependent. Possible values are: Linux
Windows
|
Metric: system.memory.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Tracks the system memory utilization |
Metric Attributes List:
Attribute Name | Description | |||
---|---|---|---|---|
| The reflected memory state. The values of this attribute are platform dependent. Possible values are: Linux
Windows
|
Metric: system.memory.limit
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | By | - | Represents the total physical memory available on the system (excluding swap) |
Metric: system.memory.shared
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | By | - | System memory shared (Applies only to Linux systems) |
Metric: system.linux.memory.available
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | By | - | An estimate of how much memory is available for starting new applications, without causing swapping. This is an alternative to the |
System Paging/Swap Metrics
Metric: system.paging.usage
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | By | As defined on Metric Attributes List | Represents Linux swap or Windows pagefile usage |
Metric Attributes List:
Attribute Name | Description | |||
---|---|---|---|---|
| The reflected paging state. Possible values are:
|
Metric: system.paging.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Represents Linux swap or Windows pagefile utilization |
Metric Attributes List:
Attribute Name | Description | |||
---|---|---|---|---|
| The reflected paging state. Possible values are:
|
Metric: system.paging.limit
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | By | - | Represents total Linux swap or Windows pagefile size |
System Disk I/O Metrics
Note |
---|
On Windows |
Metric: system.disk.io
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | By | As defined on Metric Attributes List | Represents the disk I/O activity in bytes. |
Metric Attributes List:
Attribute Name | Description | |||
---|---|---|---|---|
| The disk IO operation direction. Possible values are:
| |||
| The device identifier (for example “C:” or “sda”) |
Metric: system.disk.operations
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {operation} | As defined on Metric Attributes List | Represents the disk I/O operations (Read counts and Write counts) |
Metric Attributes List:
Attribute Name | Description | |||
---|---|---|---|---|
| The disk IO operation direction. Possible values are:
| |||
| The device identifier (for example “C:” or “sda”) |
System Filesystem Metrics
Metric: system.filesystem.usage
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | As defined on Metric Attributes List | Reports a filesystem’s space usage. |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”) |
| The primary filesystem mode. Possible values are:
|
| The filesystem mount path (for example “C:\” or “/”) |
| The filesystem state. Possible values are:
|
| The filesystem type (for example “NTFS”) |
Metric: system.filesystem.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Reports a filesystem’s space utilization percentage. |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”) |
| The primary filesystem mode. Possible values are:
|
| The filesystem mount path (for example “C:\” or “/”) |
| The filesystem type (for example “NTFS”) |
Metric: system.filesystem.limit
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | As defined on Metric Attributes List | Reports the total storage capacity of the filesystem. |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”) |
| The primary filesystem mode. Possible values are:
|
| The filesystem mount path (for example “C:\” or “/”) |
| The filesystem type (for example “NTFS”) |
System Network Metrics
Metric: system.network.dropped
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {packet} | As defined on Metric Attributes List | Count of packets that are dropped or discarded even though there was no error |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.packets
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {packet} | As defined on Metric Attributes List | Count of packets sent or received. |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.errors
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {error} | As defined on Metric Attributes List | Count of network errors detected |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.io
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | By | As defined on Metric Attributes List | Reports the total number of bytes sent or received via the selected network interface |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.connections
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | {connection} | As defined on Metric Attributes List | Reports the total number of active connections for each network interface |
Metric Attributes List:
Attribute Name | Description |
---|---|
| OSI transport layer or inter-process communication method. Possible values are:
|
| The state of the connection. Only concerns connections with the “tcp” protocol |
System Aggregate Processes Metrics
Metric: system.process.count
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | {process} | As defined on Metric Attributes List | Reports the total number of processes in each state |
Metric Attributes List:
Attribute Name | Description |
---|---|
| The current state of the process within its lifecycle. Possible values are:
Platform specific values for Linux:
Platform specific values for Windows:
|
All data collected by System Monitor can be automatically exported to the appropriate database, such as Prometheus. Users can then leverage the provided Grafana Dashboard to monitor and visualize those metrics in real-time, by simply importing the dashboard into their Grafana instance. Once imported, the dashboard allows users to monitor system performance, detect sudden changes in metrics, analyze historical trends or recent activity by adjusting the time frame, and filter data to focus on specific components. Applying filters to the YAML configuration or disabling certain metrics from the extension may lead to incomplete data or missing panels on the default dashboard. While users have the flexibility to fully customize visualizations and queries to suit their needs, it's important to ensure that customizations align with the integration's configuration.
Info |
---|
The provided dashboard is built using default Prometheus metric naming conventions, with suffixes enabled. If a different database is used, or if Prometheus suffixes are disabled, some query metric names may differ, resulting in discrepancies in the displayed data. Adjustments to the dashboard queries may be required to align with the new naming conventions. |
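To make the naming difference concrete, the sketch below approximates how an OpenTelemetry metric name is commonly translated to a Prometheus name when unit and type suffixes are enabled. It is a simplified illustration, not the exact algorithm of any exporter: the `prometheus_name` function and the `UNIT_SUFFIXES` table are hypothetical, only the units used in this document are covered, and actual names depend on the exporter and its version.

```python
# UCUM units used in this document and their usual Prometheus suffixes.
# Annotation units such as {cpu} or {packet} add no suffix.
UNIT_SUFFIXES = {"s": "seconds", "By": "bytes", "1": "ratio"}


def prometheus_name(otel_name: str, unit: str, instrument: str) -> str:
    """Approximate the Prometheus name for an OpenTelemetry metric.

    instrument is one of "counter", "updowncounter" or "gauge".
    """
    name = otel_name.replace(".", "_")
    suffix = UNIT_SUFFIXES.get(unit)
    if unit == "1" and instrument != "gauge":
        suffix = None  # the "ratio" suffix is applied to gauges only
    if suffix and not name.endswith("_" + suffix):
        name += "_" + suffix
    if instrument == "counter" and not name.endswith("_total"):
        name += "_total"  # monotonic counters get a "_total" suffix
    return name
```

Under these assumptions, system.cpu.time (Counter, unit s) becomes system_cpu_time_seconds_total and system.memory.usage (UpDown Counter, unit By) becomes system_memory_usage_bytes. With suffixes disabled, only the dot-to-underscore replacement would apply, which is why dashboard queries may need adjusting.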
By default, the dashboard organizes metrics into three distinct categories:
At a Glance: Provides a high-level overview of critical metrics for quick assessment.
Basic Information Over Time: Displays fundamental trends and patterns for essential metrics.
Advanced Information Over Time: Offers in-depth analysis and detailed trends for advanced users.
Screenshots showcasing the layout and functionality of the dashboard are included below.
How To
Import Universal Template
To use the Universal Template, you first must perform the following steps.
This Universal Task requires the Resolvable Credentials feature. Check that the Resolvable Credentials Permitted system property has been set to true.
To import the Universal Template into your Controller, follow these instructions.
When the files have been imported successfully, refresh the Universal Templates list; the Universal Template will appear on the list.
Modifications of this integration, applied by users or customers, before or after import, might affect the supportability of this integration. For more information refer to Integration Modifications.
Configure Universal Task
For a new Universal Task, create a new task, and enter the required input fields.
Users can benefit from a ready-to-use sample dashboard that this downloadable integration offers. It is located under the /observability/grafana/ directory inside the zip file downloadable from the Stonebranch Integration Hub. Administrators should refer to the official Grafana User Guide on how to import a Grafana Dashboard.
Info |
---|
The dashboard’s Prometheus data source is configured as a variable, and thus needs to be mapped to an existing Data Source configured on the target Grafana instance. |
Integration Modifications
Modifications applied by users or customers, before or after import, might affect the supportability of this integration. The following modifications are discouraged to retain the support level as applied for this integration.
Python code modifications should not be done.
Template Modifications
General Section
"Name", "Extension", "Variable Prefix", and "Icon" should not be changed.
Universal Template Details Section
"Template Type", "Agent Type", "Send Extension Variables", and "Always Cancel on Force Finish" should not be changed.
Result Processing Defaults Section
Success and Failure Exit codes should not be changed.
Success and Failure Output processing should not be changed.
Fields Restriction Section
The setup of the template does not impose any restrictions. However, in the "Exit Code Processing Fields" section, the Success/Failure exit codes need to be respected.
In principle, as STDERR and STDOUT outputs can change in follow-up releases of this integration, they should not be considered as a reliable source for determining the success or failure of a task.
Users and customers are encouraged to report defects or feature requests at the Stonebranch Support Desk.
Document References
This document references the following documents:
Document Link | Description |
---|---|
| User documentation for creating, working with and understanding Universal Templates and Integrations. |
| User documentation for creating Universal Tasks in the Universal Controller user interface. |
Observability Start-up Guide | User documentation for configuring Universal Agent to send OpenTelemetry data. |
Changelog
ue-system-monitor-1.0.0 (2024-12-19)
Initial Version