Disclaimer
Your use of this download is governed by Stonebranch’s Terms of Use.
Version Information
Template Name | Extension Name | Version | Status |
---|---|---|---|
System Monitor | ue-system-monitor | 1 (Current 1.0.0) | Fixes and new Features are introduced. |
Refer to Changelog for version history information.
Overview
The System Monitor integration lets users track system metrics, such as CPU usage, memory consumption, disk activity, and network performance, on both Linux and Windows hosts. Using OpenTelemetry, these metrics can be published to observability platforms, enabling real-time infrastructure monitoring. This integration not only facilitates the detection of performance bottlenecks and potential system failures but also allows for proactive management of resources. By making infrastructure metrics observable, System Monitor makes it easier to correlate system behavior with application performance, leading to better overall visibility and system reliability.
Key Features
Feature | Description |
---|---|
Observe and Publish Metrics | Observe and Publish Metrics from Linux and Windows Hosts. The following categories are supported.
|
Filtering | Filtering abilities for Disk/Filesystem and Network Interface metrics |
Other Configuration Options |
|
Requirements
This integration requires a Universal Agent and a Python runtime to execute the Universal Task.
...
Note |
---|
Never run two task instances simultaneously on the same system, as this can lead to inconsistent metric values and unreliable data. Although the provided default Grafana dashboard displays a warning, the integration does not automatically prevent multiple task instances from running on the same system, so this must be managed operationally. |
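One way to manage this operationally is a host-level lock taken by whatever launches the task. The following is a minimal sketch of that idea, not part of the integration: the `single_instance` helper, the `AlreadyRunningError` type, and the lock path are all hypothetical names chosen for illustration.

```python
import os
import tempfile
from contextlib import contextmanager


class AlreadyRunningError(RuntimeError):
    """Raised when another holder already owns the lock file."""


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_CREAT | O_EXCL fails atomically if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunningError(f"lock already held: {lock_path}") from None
    try:
        # Record the owner's PID to help an operator diagnose a stale lock.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


# Hypothetical lock location; any path shared by all launchers on the host works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "system-monitor-example.lock")
```

A launcher would wrap the monitoring run in `with single_instance(LOCK_PATH): ...`. Note that a process killed without cleanup leaves a stale lock file behind, which an operator would need to remove manually.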
Note |
---|
The provided Grafana dashboard makes use of metric attributes that are attached by Universal Agent using the Agent default configuration. If any of these options are changed, such as |
Input Fields
Name | Type | Description | Version Information |
---|---|---|---|
Action | Choice | Possible values are:
| Introduced in 1.0.0 |
Provide Configuration As | Choice | Specifies how System Monitor configuration is provided. Available options are:
Available if Action is “System Monitor” | Introduced in 1.0.0 |
Collection Interval (sec) | Int | How often metrics are retrieved. Default value is 15 seconds.
| Introduced in 1.0.0 |
Configuration | Large Text | System Monitor configuration as text. Default value:
For more information on the System Monitor configuration options, see YAML Configuration Options | Introduced in 1.0.0 |
Configuration | Script | System Monitor configuration as UC Script. This allows the configuration to be shared across multiple task definitions. For more information on the System Monitor configuration options, see YAML Configuration Options | Introduced in 1.0.0 |
Supported Actions
Action: System Monitor
Configuration examples
Provide configuration as YAML text. Collection interval is set to 15 seconds, and the default YAML configuration is used, activating all metrics and applying no filters. | Provide configuration using the "System Monitor - Full Configuration" UAC Script, setting a collection interval of 10 seconds. |
YAML Configuration Options
The configuration, provided as either plain text or a UC Script, defines the System Monitor's behavior by specifying which metrics are published and any desired filtering options. Configurations are written in YAML and must adhere to a defined hierarchical structure.
...
YAML Field | Description |
---|---|
| Necessary as top-level key of the YAML configuration. Required |
| Enables monitoring of host uptime and acts as the root key for any additional metrics. All subsequent metrics (such as cpu) are activated by including the relevant key in the configuration file. Required |
| Enables CPU related metrics. |
| Enables Load Average (1, 5 and 15 minute) metrics. |
| Enables Memory metrics |
| Enables Paging/Swap metrics. |
| Enables Disk metrics. Filtering options are available:
The above filtering options are mutually exclusive (both should not be set) |
| Enables Filesystems metrics. Filtering options are available:
Within each pair of filtering options, the two options are mutually exclusive (both should not be set). If filtering options for devices/types and mountpoints are used at the same time, a logical AND is applied. |
| Enables Network metrics. Filtering options are available:
The above filtering options are mutually exclusive (both should not be set) |
| Enables Process count metric. |
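The mutual-exclusivity rule for filtering options can be illustrated with a small check applied to an already-parsed configuration section. This is only a sketch under assumptions: the key names `include` and `exclude` and the `DataValidationError` type are hypothetical stand-ins, since the actual YAML keys are defined by the integration and are not reproduced here.

```python
class DataValidationError(ValueError):
    """Hypothetical error type standing in for a validation failure."""


def validate_filter(section_name, filter_options):
    """Reject a filter that sets both the include and the exclude list.

    filter_options is a dict parsed from one metric section; the key
    names "include" and "exclude" are assumptions for illustration.
    """
    has_include = bool(filter_options.get("include"))
    has_exclude = bool(filter_options.get("exclude"))
    if has_include and has_exclude:
        raise DataValidationError(
            f"{section_name}: 'include' and 'exclude' are mutually exclusive"
        )
    return filter_options
```

A configuration that fails such a check would plausibly surface as the data validation failure listed under Exit Codes, although the integration's actual validation logic may differ.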
System Monitor Configuration Examples
# | Configuration | Description |
---|---|---|
1 |
| Default configuration that enables all available metrics without applying any filters. |
2 |
| This configuration filters disk metrics so that the disks named 'sda' and 'sdb' are not reported. |
3 |
| This configuration excludes reporting for any filesystems whose names start with 'dev/loop'. |
4 |
| This configuration filters network metrics so that only the 'Ethernet' and 'Wireless' network interfaces are reported. |
5 |
| This configuration applies several filters to tailor the collected metrics as follows:
|
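The filtering behaviour described in these examples (exclude by name, exclude by name prefix, include only listed names, and a logical AND across device and mountpoint filters) can be sketched as follows. The matching rule is an assumption: this sketch uses shell-style patterns from Python's `fnmatch` module, whereas the integration may use a different matching scheme. The function names and the `include`/`exclude` keys are hypothetical.

```python
from fnmatch import fnmatchcase


def passes(value, include=None, exclude=None):
    """Return True if value survives one include-or-exclude filter."""
    if include:
        # Include filter: keep only values matching at least one pattern.
        return any(fnmatchcase(value, pattern) for pattern in include)
    if exclude:
        # Exclude filter: drop values matching any pattern.
        return not any(fnmatchcase(value, pattern) for pattern in exclude)
    # No filter configured: everything is reported.
    return True


def keep_filesystem(device, mountpoint, device_filter, mountpoint_filter):
    """Logical AND: a filesystem is reported only if both filters accept it."""
    return passes(device, **device_filter) and passes(mountpoint, **mountpoint_filter)
```

For instance, `passes("sdc", exclude=["sda", "sdb"])` keeps the disk, while `passes("/dev/loop3", exclude=["/dev/loop*"])` drops it.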
Action Output
Output Type | Description | Examples |
---|---|---|
EXTENSION | The extension output provides the following information:
|
| |
STDERR | Universal Extension Task log information |
Exit Codes
Exit Code | Status | Status Description | Meaning |
---|---|---|---|
0 | Success | “Success: <<Task cancelled successfully.>>” | Successful execution and subsequent cancellation. |
1 | Failure | “Execution Failed: <<Error Description>>” | Raised in case of an unexpected error during execution. |
20 | Failure | “Data Validation Error: <<Error Description>>” | Validation error related to input fields or the provided YAML configuration. See STDERR for more detailed error descriptions. |
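When task results are post-processed by a script, the exit codes in the table above can be mapped to their meanings with a simple lookup. This is an illustrative sketch; the `EXIT_CODES` mapping and `describe_exit_code` helper are hypothetical names, and the texts are taken from the table.

```python
# Exit codes and meanings as listed in the Exit Codes table.
EXIT_CODES = {
    0: ("Success", "Successful execution and subsequent cancellation."),
    1: ("Failure", "Unexpected error during execution."),
    20: ("Failure", "Validation error in the input fields or the YAML configuration."),
}


def describe_exit_code(code):
    """Return a one-line description for a task exit code."""
    status, meaning = EXIT_CODES.get(
        code, ("Unknown", "Exit code not listed in the table.")
    )
    return f"{code}: {status} - {meaning}"
```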
Observability
System CPU metrics
Metric: system.cpu.time
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | s | As defined on Metric Attributes List | Observes the CPU time spent on the system. |
...
Attribute Name | Description |
---|---|
| The CPU mode on which time was spent. Possible values are:
Platform-specific fields:
Note: Not all attributes may be available, as availability depends on the platform and operating system version. |
| The logical CPU number |
Metric: system.cpu.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Observes the CPU utilization on the system. |
...
Attribute Name | Description |
---|---|
| The CPU mode on which time was spent. Possible values are:
Platform-specific fields:
Note: Not all attributes may be available, as availability depends on the platform and operating system version. |
| The logical CPU number |
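The relationship between the cumulative CPU time counter and the utilization gauge can be illustrated with two samples taken one collection interval apart. This is a sketch of the general technique, not necessarily how the extension computes the value; the mode names used in the usage example ("user", "system", "idle") are illustrative.

```python
def cpu_utilization(previous, current):
    """Per-mode share of CPU time spent between two cumulative samples.

    previous and current map a CPU mode to cumulative seconds, as a
    Counter such as system.cpu.time would report them.
    """
    deltas = {mode: current[mode] - previous.get(mode, 0.0) for mode in current}
    total = sum(deltas.values())
    if total <= 0:
        # No time elapsed (or a counter reset): report zero utilization.
        return {mode: 0.0 for mode in deltas}
    return {mode: delta / total for mode, delta in deltas.items()}
```

With `previous = {"user": 100.0, "system": 50.0, "idle": 850.0}` and `current = {"user": 103.0, "system": 51.0, "idle": 861.0}`, 15 CPU-seconds elapsed in total, of which 3 were spent in user mode, giving a user utilization of 0.2.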
Metric: system.cpu.physical.count
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | {cpu} | - | Reports the number of actual physical processor cores on the hardware |
Metric: system.cpu.logical.count
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | {cpu} | - | Reports the number of logical (virtual) processor cores created by the operating system to manage multitasking |
System Load Average Metrics
Metric: system.linux.cpu.load_1m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 1 minute. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O) |
Metric: system.linux.cpu.load_5m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 5 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O) |
Metric: system.linux.cpu.load_15m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 15 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O) |
Metric: system.windows.cpu.load_1m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 1 minute. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows this information is not directly retrieved by a system function, and it is based on emulation. |
Metric: system.windows.cpu.load_5m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 5 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows this information is not directly retrieved by a system function and it is based on emulation. |
Metric: system.windows.cpu.load_15m
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | - | Return the approximate average system load over the last 15 minutes. The “load” represents the processes which are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows this information is not directly retrieved by a system function and it is based on emulation. |
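A load average is often easier to interpret when divided by the number of logical CPUs, since a load of 4 means something different on a 2-core host than on a 16-core host. The sketch below shows that normalization. It is an interpretation aid only, not something the integration is stated to publish, and the function names are hypothetical.

```python
import os


def load_per_cpu(load_average, logical_cpus):
    """Normalize a load average by the number of logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return load_average / logical_cpus


def current_load_per_cpu():
    """Return the normalized (1, 5, 15 minute) loads, or None if unsupported.

    os.getloadavg() is only available on Unix-like platforms.
    """
    if not hasattr(os, "getloadavg"):
        return None
    cpus = os.cpu_count() or 1
    return tuple(load_per_cpu(load, cpus) for load in os.getloadavg())
```

Values persistently above 1.0 suggest that more processes are runnable than there are logical CPUs to serve them.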
System Memory Metrics
Metric: system.memory.usage
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | As defined on Metric Attributes List | Tracks the system memory usage |
...
Attribute Name | Description |
---|---|
| The reflected memory state. The values of this attribute are platform dependent. Possible values are: Linux
Windows
|
Metric: system.memory.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Tracks the system memory utilization |
...
Attribute Name | Description |
---|---|
| The reflected memory state. The values of this attribute are platform dependent. Possible values are: Linux
Windows
|
Metric: system.memory.limit
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | - | Represents the total physical memory available on the system (excluding swap) |
Metric: system.memory.shared
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | - | System memory shared (applies only to Linux systems) |
Metric: system.linux.memory.available
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDown Counter | By | - | An estimate of how much memory is available for starting new applications, without causing swapping. This is an alternative to the |
System Paging/Swap Metrics
Metric: system.paging.usage
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | As defined on Metric Attributes List | Represents Linux swap or Windows pagefile usage |
...
Attribute Name | Description |
---|---|
| The reflected paging state. Possible values are:
|
Metric: system.paging.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Represents Linux swap or Windows pagefile utilization |
...
Attribute Name | Description |
---|---|
| The reflected paging state. Possible values are:
|
Metric: system.paging.limit
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | - | Represents the total Linux swap or Windows pagefile size |
System Disk I/O Metrics
Note |
---|
On Windows |
Metric: system.disk.io
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | By | As defined on Metric Attributes List | Represents the disk I/O activity in bytes. |
...
Attribute Name | Description |
---|---|
| The disk IO operation direction. Possible values are:
|
| The device identifier (for example “C:” or “sda”) |
Metric: system.disk.operations
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {operation} | As defined on Metric Attributes List | Represents the disk I/O operations (Read counts and Write counts) |
...
Attribute Name | Description |
---|---|
| The disk IO operation direction. Possible values are:
|
| The device identifier (for example “C:” or “sda”) |
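Because the disk I/O metrics are Counters, they report cumulative totals, and a throughput figure has to be derived from the difference between two samples. The sketch below shows that derivation, including a simple treatment of counter resets such as after a reboot. It illustrates the general technique that observability backends apply; the `counter_rate` helper is a hypothetical name.

```python
def counter_rate(previous_value, current_value, interval_seconds):
    """Average per-second rate of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current_value - previous_value
    if delta < 0:
        # The counter went backwards, so it was reset; treat the current
        # value as the increase since the reset.
        delta = current_value
    return delta / interval_seconds
```

For example, a `system.disk.io` read counter that moves from 1,000,000 to 1,150,000 bytes over a 15-second collection interval corresponds to an average of 10,000 bytes per second.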
System Filesystem Metrics
Metric: system.filesystem.usage
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | As defined on Metric Attributes List | Reports a filesystem’s space usage. |
...
Attribute Name | Description |
---|---|
| The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”) |
| The primary filesystem mode. Possible values are:
|
| The filesystem mount path (for example “C:\” or “/”) |
| The filesystem state. Possible values are:
|
| The filesystem type (for example “NTFS”) |
Metric: system.filesystem.utilization
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Gauge | 1 | As defined on Metric Attributes List | Reports a filesystem’s space utilization percentage. |
...
Attribute Name | Description |
---|---|
| The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”) |
| The primary filesystem mode. Possible values are:
|
| The filesystem mount path (for example “C:\” or “/”) |
| The filesystem type (for example “NTFS”) |
Metric: system.filesystem.limit
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | By | As defined on Metric Attributes List | Reports the total storage capacity of the filesystem. |
...
Attribute Name | Description |
---|---|
| The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”) |
| The primary filesystem mode. Possible values are:
|
| The filesystem mount path (for example “C:\” or “/”) |
| The filesystem type (for example “NTFS”) |
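The relationship between the filesystem usage, limit, and utilization metrics can be reproduced locally with Python's standard `shutil.disk_usage`, which returns the total, used, and free bytes for the filesystem containing a path. This sketch is only a way to sanity-check reported values. It is an assumption that the results are comparable, since the extension may gather its numbers differently (for example, by accounting for reserved blocks).

```python
import shutil


def filesystem_utilization(path):
    """Fraction of the filesystem containing path that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Pseudo-filesystems can report a zero size.
        return 0.0
    return usage.used / usage.total
```

Calling `filesystem_utilization("/")` on Linux or `filesystem_utilization("C:\\")` on Windows yields a ratio that can be compared with the `system.filesystem.utilization` gauge for the same mount path.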
System Network Metrics
Metric: system.network.dropped
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {packet} | As defined on Metric Attributes List | Count of packets that are dropped or discarded even though there was no error |
...
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.packets
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {packet} | As defined on Metric Attributes List | Count of packets sent or received. |
...
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.errors
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | {error} | As defined on Metric Attributes List | Count of network errors detected |
...
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.io
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| Counter | By | As defined on Metric Attributes List | Reports the total number of bytes sent or received via the selected network interface |
...
Attribute Name | Description |
---|---|
| The network IO operation direction. Possible values are:
|
| The identifier of the network interface (for example “Ethernet” or “bridge0”) |
Metric: system.network.connections
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | {connection} | As defined on Metric Attributes List | Reports the total number of active connections for each network interface |
...
Attribute Name | Description |
---|---|
| OSI transport layer or inter-process communication method. Possible values are:
|
| The state of the connection. Only concerns connections with the “tcp” protocol |
System Aggregate Processes Metrics
Metric: system.process.count
Name | Instrument Type | Unit (UCUM) | Attributes | Description |
---|---|---|---|---|
| UpDownCounter | {process} | As defined on Metric Attributes List | Reports the total number of processes in each state |
...
Attribute Name | Description |
---|---|
| The current state of the process within its lifecycle. Possible values are:
Platform specific values for Linux:
Platform specific values for Windows:
|
All data collected by System Monitor can be automatically exported to the appropriate database, such as Prometheus. Users can then leverage the provided Grafana Dashboard to monitor and visualize those metrics in real-time, by simply importing the dashboard into their Grafana instance. Once imported, the dashboard allows users to monitor system performance, detect sudden changes in metrics, analyze historical trends or recent activity by adjusting the time frame, and filter data to focus on specific components. Applying filters to the YAML configuration or disabling certain metrics from the extension may lead to incomplete data or missing panels on the default dashboard. While users have the flexibility to fully customize visualizations and queries to suit their needs, it's important to ensure that customizations align with the integration's configuration.
...
Screenshots showcasing the layout and functionality of the dashboard are included below.
How To
Import Universal Template
To use the Universal Template, you must first perform the following steps.
...
Modifications of this integration, applied by users or customers, before or after import, might affect the supportability of this integration. For more information refer to System Monitor Integration Modifications.
Configure Universal Task
For a new Universal Task, create a new task, and enter the required input fields.
Users can benefit from the ready-to-use sample dashboard that this downloadable integration offers. It is located in the /observability/grafana/ directory inside the ZIP file downloadable from the Stonebranch Integration Hub. Administrators should refer to the official Grafana User Guide on how to import a Grafana dashboard.
Info |
---|
The dashboard’s Prometheus data source is configured as a variable and therefore needs to be mapped to an existing data source configured on the target Grafana instance. |
System Monitor Integration Modifications
Modifications applied by users or customers, before or after import, might affect the supportability of this integration. To retain the support level of this integration, the following modifications are discouraged.
...
Users and customers are encouraged to report defects or feature requests to the Stonebranch Support Desk.
Document References
This document references the following documents:
Document Link | Description |
---|---|
| User documentation for creating, working with and understanding Universal Templates and Integrations. |
| User documentation for creating Universal Tasks in the Universal Controller user interface. |
Observability Start-up Guide | User documentation for configuring Universal Agent to send OpenTelemetry data. |
Changelog
ue-system-monitor-1.0.0 (2024-12-19)
Initial Version