
Disclaimer

Your use of this download is governed by Stonebranch’s Terms of Use.

Version Information

Template Name

Extension Name

Version

Status

System Monitor

ue-system-monitor

1 (Current 1.0.0)

Fixes and new features are introduced.

Refer to Changelog for version history information.

Overview

The System Monitor integration provides a powerful tool for tracking system metrics, such as CPU usage, memory consumption, disk activity, and network performance, on both Linux and Windows hosts. By leveraging OpenTelemetry, these metrics can be published to observability platforms, enabling real-time infrastructure monitoring. This integration not only facilitates the detection of performance bottlenecks and potential system failures but also allows for proactive resource management. By making infrastructure metrics observable, System Monitor makes it easier to correlate system behavior with application performance, leading to better overall visibility and system reliability.

Key Features

Feature

Description

Observe and Publish Metrics

Observe and publish metrics from Linux and Windows hosts. The following categories are supported:

  • CPU utilization metrics

  • CPU load metrics

  • Memory utilization metrics

  • Paging/Swap space utilization metrics

  • Disk I/O metrics

  • Filesystem utilization metrics

  • Network Interface metrics

  • Process count metrics

  • Miscellaneous system metrics

Filtering

Filtering options for disk, filesystem, and network interface metrics

Other Configuration Options

  • Individually switch on/off specific metric categories

  • Configurable metrics collection interval

Requirements

This integration requires a Universal Agent and a Python runtime to execute the Universal Task.

Area

Details

Python Version

Requires Python 3.11

Universal Agent Compatibility

  • Compatible with Universal Agent for Windows x64, version 7.6.0.0 or later.

  • Compatible with Universal Agent for Linux, version 7.6.0.0 or later.

Universal Controller Compatibility

Universal Controller version 7.6.0.0 or later.

OpenTelemetry

Universal Agent must be configured to send OpenTelemetry data.

Do not run two task instances simultaneously on the same system; doing so leads to inconsistent metric values and unreliable data. The provided Grafana dashboard displays a warning in this case, but the integration does not automatically prevent multiple task instances from running on the same system, so this must be managed operationally.

The provided Grafana dashboard relies on metric attributes that Universal Agent attaches using its default configuration. If any of these options are changed (for example, otel_uip_service_name, which can be configured in the uags.conf file), the dashboard queries must be adjusted accordingly.

Input Fields

Name

Type

Description

Version Information

Action

Choice

Possible values are:

  • System Monitor

Introduced in 1.0.0

Provide Configuration As

Choice

Specifies how System Monitor configuration is provided.

Available options are:

  • As YAML Text (default)

  • As YAML UAC Script

Available if Action is “System Monitor”

Introduced in 1.0.0

Collection Interval (sec)

Int

How often metrics are collected, in seconds. The default value is 15 seconds.

Note

The Collection Interval determines how often metrics are collected and therefore how often they are sent to the OpenTelemetry (OTel) Collector. The target time-series database (e.g., Prometheus) can then pull (scrape) this data at its own configurable interval. To optimize resource utilization and ensure granular metrics retrieval, it is recommended to align the two intervals.

Introduced in 1.0.0

Configuration

Large Text

System Monitor configuration as Text

Default value:

metrics:
  system:
    cpu:    # Enables this metric
    memory:
    load_average:
    paging:
    disk:
    filesystem:
    network:
    processes:

For more information on the System Monitor configuration options, see YAML Configuration Options

Introduced in 1.0.0

Configuration

Script

System Monitor configuration as UC Script. This allows the configuration to be shared across multiple task definitions. For more information on the System Monitor configuration options, see YAML Configuration Options

Introduced in 1.0.0
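The recommendation in the Collection Interval note to align the collection and scrape intervals can be illustrated with a small Python sketch. It assumes that "aligned" means equal intervals and that the OTel Collector exposes the most recent value for the time-series database to scrape; the function `check_interval_alignment` is hypothetical and not part of the integration.

```python
def check_interval_alignment(collection_interval_s: int, scrape_interval_s: int) -> str:
    """Classify how the Collection Interval relates to a scrape interval (hypothetical helper)."""
    if collection_interval_s <= 0 or scrape_interval_s <= 0:
        raise ValueError("intervals must be positive")
    if collection_interval_s == scrape_interval_s:
        return "aligned"
    if scrape_interval_s < collection_interval_s:
        # Several scrapes fall inside one collection cycle and return the same values.
        return "scraping faster than collection: duplicate samples, extra load"
    # Several collections happen between scrapes; intermediate gauge readings are never stored.
    return "scraping slower than collection: intermediate gauge values lost"


for scrape in (5, 15, 60):
    print(f"collection=15s scrape={scrape}s -> {check_interval_alignment(15, scrape)}")
```

With the default 15-second Collection Interval, a 15-second scrape interval is reported as aligned; cumulative counters still keep their totals when scraping is slower, but short-lived gauge changes are not recorded.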

Supported Actions

Action: System Monitor

Configuration examples

Provide configuration as YAML text. Collection interval is set to 15 seconds, and the default YAML configuration is used, activating all metrics and applying no filters.

Provide configuration using the "System Monitor - Full Configuration" UAC Script, setting a collection interval of 10 seconds.

System Monitor Configuration Options

The configuration, provided as either plain text or a UC Script, defines the System Monitor's behavior: which metrics are published and which filtering options apply. Configurations are written in YAML and must follow a defined hierarchical structure.

The metrics and system keys must always be present in the configuration. Enabling or filtering any other metric category is optional.

The configuration allows you to:

  1. Enable or disable metric categories: Choose which system metrics categories to collect.

  2. Filter specific resources: Apply include/exclude filters on specific attribute values using strict mode or regex. If any "include" filters are activated for a specific attribute, no "exclude" filters can be activated for the same attribute (they are mutually exclusive).
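The include/exclude rules above can be sketched in Python against the parsed YAML (a plain dict). This is an illustration under stated assumptions, not the integration's code: the helper names `value_matches` and `passes` are hypothetical, and regex patterns are assumed to require a full match (the integration may instead search anywhere in the value).

```python
import re


def value_matches(value: str, patterns: list[str], match_type: str) -> bool:
    """Check a single attribute value against a filter's pattern list."""
    if match_type == "strict":
        # Strict: the value must equal one of the listed names exactly.
        return value in patterns
    if match_type == "regex":
        # Regex: assumed here to be a full match against each pattern.
        return any(re.fullmatch(p, value) for p in patterns)
    raise ValueError(f"unknown match_type: {match_type!r}")


def passes(value: str, section: dict, name: str) -> bool:
    """Apply include_<name>/exclude_<name> from one metric category section."""
    include = section.get(f"include_{name}")
    exclude = section.get(f"exclude_{name}")
    if include and exclude:
        raise ValueError(f"include_{name} and exclude_{name} are mutually exclusive")
    if include:
        return value_matches(value, include[name], include["match_type"])
    if exclude:
        return not value_matches(value, exclude[name], exclude["match_type"])
    return True  # no filter configured for this attribute: report everything


# A disk section equivalent to the YAML example, as a parsed dict.
disk = {"exclude_devices": {"devices": ["loop.*"], "match_type": "regex"}}
print(passes("loop0", disk, "devices"))  # False: excluded by the regex
print(passes("sda", disk, "devices"))    # True
```

Raising an error when both include and exclude are present mirrors the documented mutual exclusivity; how the integration actually reports that condition is not shown here.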

The following configuration example demonstrates all applicable options:

Example YAML configuration file
metrics:
  system:
    cpu:                            # Enables CPU metrics to be published
    memory:                         # Enables Memory metrics to be published
    load_average:                   # Enables Load Average metrics to be published
    paging:                         # Enables Swap/Paging metrics to be published
    disk:                           # Enables Disk metrics to be published
      exclude_devices:              # Filters exported metrics, excluding a list of devices. "include_devices" is also available.
        devices: ["loop.*"]         # Devices excluded using a list provided in brackets
        match_type: regex           # Value matching uses regular expressions
    filesystem:                     # Enables File System metrics to be published
      include_devices:              # Filters exported metrics, including only a list of devices. "exclude_devices" is also available.
        devices: ["sda", "sdb"]     # Devices included using a list provided in brackets
        match_type: strict          # Value matching is strict. The exact names from the above list are used.
      include_types:                # Filters exported metrics, including only a list of filesystem types. "exclude_types" is also available.
        types: ["xfs"]              # Filesystem types included using a list provided in brackets
        match_type: strict          # Value matching is strict. The exact names from the above list are used.
      include_mountpoints:          # Filters exported metrics, including only a list of mountpoints. "exclude_mountpoints" is also available.
        mountpoints: ["/dev"]       # Mountpoints included using a list provided in brackets
        match_type: strict          # Value matching is strict. The exact names from the above list are used.
    network:                        # Enables Network metrics to be published
      include_devices:              # Filters exported metrics, including only a list of network interfaces. "exclude_devices" is also available.
        devices: ["lo"]             # Network interfaces included using a list provided in brackets
        match_type: strict          # Value matching is strict. The exact names from the above list are used.
    processes:                      # Enables Process count metrics to be published


YAML Field

Description

metrics

Required top-level key of the YAML configuration.

Required

system

Enables monitoring of host uptime and acts as the root key for any additional metrics provided.

Each of the following metric categories (such as cpu) is enabled by including its key in the configuration file.

Required

cpu

Enables CPU related metrics.

load_average

Enables Load Average (1, 5 and 15 minute) metrics.

memory

Enables memory metrics.

paging

Enables Paging/Swap metrics.

disk

Enables disk metrics. The following filtering options are available:

  • include_devices: Includes only specific devices.

  • exclude_devices: Excludes specific devices.

The above filtering options are mutually exclusive; only one of them may be set.

filesystem

Enables filesystem metrics. The following filtering options are available:

  • include_devices: Includes only specific devices.

  • exclude_devices: Excludes specific devices.

The above filtering options are mutually exclusive; only one of them may be set.

  • include_types: Includes only specific filesystem types.

  • exclude_types: Excludes specific filesystem types.

The above filtering options are mutually exclusive; only one of them may be set.

  • include_mountpoints: Includes only specific mountpoints.

  • exclude_mountpoints: Excludes specific mountpoints.

The above filtering options are mutually exclusive; only one of them may be set.

If device, type, and mountpoint filters are used together, they are combined with a logical AND: a filesystem is reported only if it passes every configured filter.

network

Enables network metrics. The following filtering options are available:

  • include_devices: Includes only specific network interfaces.

  • exclude_devices: Excludes specific network interfaces.

The above filtering options are mutually exclusive; only one of them may be set.

processes

Enables process count metrics.
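To illustrate the logical AND that applies when several filesystem filters are combined, the following Python sketch checks one filesystem's `device`, `type`, and `mountpoint` attributes against a parsed `filesystem` section. The function `keep_filesystem` and the sample configuration are hypothetical; regex patterns are assumed to require a full match, and the include/exclude mutual-exclusivity check is omitted for brevity.

```python
import re

# (metric attribute, filter list key) pairs used by the filesystem filters.
FILTERS = (("device", "devices"), ("type", "types"), ("mountpoint", "mountpoints"))


def _matches(value: str, rule: dict, list_key: str) -> bool:
    patterns = rule[list_key]
    if rule["match_type"] == "strict":
        return value in patterns
    return any(re.fullmatch(p, value) for p in patterns)  # regex, assumed full match


def keep_filesystem(fs: dict, cfg: dict) -> bool:
    """Return True only if the filesystem passes every configured filter (logical AND)."""
    for attr, list_key in FILTERS:
        include = cfg.get(f"include_{list_key}")
        exclude = cfg.get(f"exclude_{list_key}")
        if include and not _matches(fs[attr], include, list_key):
            return False
        if exclude and _matches(fs[attr], exclude, list_key):
            return False
    return True


# Hypothetical filesystem section combining device, type, and mountpoint filters.
cfg = {
    "exclude_devices": {"devices": ["/dev/loop.*"], "match_type": "regex"},
    "include_types": {"types": ["xfs"], "match_type": "strict"},
    "exclude_mountpoints": {"mountpoints": ["/var/lib/snapd/.*"], "match_type": "regex"},
}
print(keep_filesystem({"device": "/dev/sda1", "type": "xfs", "mountpoint": "/"}, cfg))       # True
print(keep_filesystem({"device": "/dev/sdb1", "type": "ext4", "mountpoint": "/data"}, cfg))  # False: type
```

A single failing filter is enough to drop the filesystem, which is why combining many filters can quickly leave few or no filesystems reported.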

System Monitor Configuration Examples

#

Configuration

Description

1

Default Configuration
metrics:
  system:
    cpu:
    memory:
    load_average:
    paging:
    disk:
    filesystem:
    network:
    processes:

Default configuration that enables all available metrics without applying any filters to the configurations.


2


Configuration Excluding specific Disks
metrics:
  system:
    cpu:
    memory:
    load_average:
    paging:
    disk:
      exclude_devices:
        devices: ["sda", "sdb"]
        match_type: strict
    filesystem:
    network:
    processes:

This configuration filters disk metrics so that the disks named 'sda' and 'sdb' are not reported.


3

Configuration Excluding a set of Filesystems
metrics:
  system:
    cpu:
    memory:
    load_average:
    paging:
    disk:
    filesystem:
      exclude_devices:
        devices: ["/dev/loop.*"]
        match_type: regex
    network:
    processes:

This configuration filters filesystem metrics to exclude any filesystems whose device names start with '/dev/loop'.

4


Configuration Including only specific Network Interfaces
metrics:
  system:
    cpu:
    memory:
    load_average:
    paging:
    disk:
    filesystem:
    network:
      include_devices:
        devices: ["Ethernet", "Wireless"]
        match_type: strict
    processes:

This configuration filters network metrics to include reports originating only from the 'Ethernet' and 'Wireless' network interfaces.

5

Configuration Including multiple filters
metrics:
  system:
    cpu:
    memory:
    load_average:
    paging:
    disk:
      exclude_devices:
        devices: ["loop.*"]
        match_type: regex
    filesystem:
      exclude_devices:
        devices: ["/dev/loop.*"]
        match_type: regex
      include_types:
        types: ["xfs"]
        match_type: strict
      exclude_mountpoints:
        mountpoints: ["/var/lib/snapd/.*"]
        match_type: regex
    network:
      exclude_devices:
        devices: ["lo"]
        match_type: strict
    processes:

This configuration applies several filters to tailor the collected metrics as follows:

  • Disk Filters: Exclude disks with names starting with loop.

  • Filesystem Filters:

    • Exclude filesystems with names starting with /dev/loop.

    • Include only filesystems of type xfs.

    • Exclude filesystems with mountpoints starting with /var/lib/snapd/.

  • Network Filters: Exclude the 'lo' network interface.

Action Output

Output Type

Description

Examples

EXTENSION

The extension output provides the following information:

  • exit_code, status_description: General info regarding the task execution. For more information, see the exit code table.

  • invocation: The task configuration used for this task execution.

  • result: Any errors raised, in case of failure.
Successful scenario
{
	"exit_code": 0,
	"status_description": "Task cancelled successfully",
	"invocation": {
		"extension": "ue-system-monitor",
		"version": "1.0.0",
		"fields": { ... }
	}
}
Failing scenario
{
	"exit_code": 20,
	"status_description": "Data Validation Error: Duplicate key detected in configuration file",
	"invocation": {
		"extension": "ue-system-monitor",
		"version": "1.0.0",
		"fields": { ... }
	},
	"result": {
		"errors": [
			"Data Validation Error: Duplicate key detected in configuration file"
		]
	}
}

STDERR

Universal Extension Task log information


Exit Codes

Exit Code

Status

Status Description

 Meaning

0

Success

“Success: <<Task cancelled successfully.>>”

Successful execution and subsequent cancellation.

1

Failure

“Execution Failed: <<Error Description>>”

Raised in case of an unexpected error during execution.

20

Failure

“Data Validation Error: <<Error Description>>”

Validation error related to input fields or the YAML Configuration provided.

* See STDERR for more detailed error descriptions.
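Because STDOUT and STDERR content may change between releases, a consumer of the EXTENSION output would typically key off `exit_code`. The Python sketch below summarizes such a document using the exit codes from the table; the function `summarize` is hypothetical, and the `{ ... }` placeholder shown in the output examples is replaced with an empty object so the JSON parses.

```python
import json

# Exit codes as listed in the Exit Codes table.
EXIT_CODES = {
    0: "Success",
    1: "Failure: unexpected error during execution",
    20: "Failure: data validation error",
}


def summarize(output_json: str) -> str:
    """Return a one-line summary of an EXTENSION output document."""
    doc = json.loads(output_json)
    code = doc["exit_code"]
    meaning = EXIT_CODES.get(code, "Unknown exit code")
    summary = f"[{code}] {meaning} - {doc['status_description']}"
    errors = doc.get("result", {}).get("errors", [])
    if errors:
        summary += f" ({len(errors)} error(s) reported)"
    return summary


failing = json.dumps({
    "exit_code": 20,
    "status_description": "Data Validation Error: Duplicate key detected in configuration file",
    "invocation": {"extension": "ue-system-monitor", "version": "1.0.0", "fields": {}},
    "result": {"errors": ["Data Validation Error: Duplicate key detected in configuration file"]},
})
print(summarize(failing))
```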

Observability

System CPU metrics

Metric: system.cpu.time

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.cpu.time

Counter

s

As defined on Metric Attributes List

Observes the CPU time spent on the system.

Metric Attributes List:

Attribute Name

Description

state

The CPU mode on which time was spent. Possible values are:

  • user: time spent by normal processes executing in user mode; on Linux this also includes guest time

  • system: time spent by processes executing in kernel mode

  • idle: time spent doing nothing

Platform-specific values:

  • nice (Linux): time spent by niced (prioritized) processes executing in user mode; on Linux this also includes guest_nice time

  • iowait (Linux): time spent waiting for I/O to complete. This is not included in the idle time counter.

  • irq (Linux): time spent servicing hardware interrupts

  • softirq (Linux): time spent servicing software interrupts

  • steal (Linux 2.6.11+): time spent by other operating systems running in a virtualized environment

  • guest (Linux 2.6.24+): time spent running a virtual CPU for guest operating systems under the control of the Linux kernel

  • guest_nice (Linux 3.2.0+): time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel)

  • interrupt (Windows): time spent servicing hardware interrupts (similar to “irq” on UNIX)

  • dpc (Windows): time spent servicing deferred procedure calls (DPCs); DPCs are interrupts that run at a lower priority than standard interrupts.

Note: Not all attribute values may be available; availability depends on the platform and operating system version.

cpu

The logical CPU number (cpu0, cpu1, ..., cpuN-1)

Metric: system.cpu.utilization

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.cpu.utilization

Gauge

1

As defined on Metric Attributes List

Observes the CPU utilization on the system.

Metric Attributes List:

Attribute Name

Description

state

The CPU mode on which time was spent. Possible values are:

  • user: time spent by normal processes executing in user mode; on Linux this also includes guest time

  • system: time spent by processes executing in kernel mode

  • idle: time spent doing nothing

Platform-specific values:

  • nice (Linux): time spent by niced (prioritized) processes executing in user mode; on Linux this also includes guest_nice time

  • iowait (Linux): time spent waiting for I/O to complete. This is not included in the idle time counter.

  • irq (Linux): time spent servicing hardware interrupts

  • softirq (Linux): time spent servicing software interrupts

  • steal (Linux 2.6.11+): time spent by other operating systems running in a virtualized environment

  • guest (Linux 2.6.24+): time spent running a virtual CPU for guest operating systems under the control of the Linux kernel

  • guest_nice (Linux 3.2.0+): time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel)

  • interrupt (Windows): time spent servicing hardware interrupts (similar to “irq” on UNIX)

  • dpc (Windows): time spent servicing deferred procedure calls (DPCs); DPCs are interrupts that run at a lower priority than standard interrupts.

Note: Not all attribute values may be available; availability depends on the platform and operating system version.

cpu

The logical CPU number (cpu0, cpu1, ..., cpuN-1)
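system.cpu.time is a cumulative counter, while system.cpu.utilization is a dimensionless value (unit 1). A common way to relate the two, shown here as a Python sketch, is to take the difference between two time samples and divide each state's delta by the total elapsed CPU time. This assumes utilization is computed that way; the integration's actual computation is not documented here, and `cpu_utilization` and the sample values are hypothetical.

```python
def cpu_utilization(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Share of elapsed CPU time spent in each state between two system.cpu.time samples."""
    deltas = {state: curr[state] - prev.get(state, 0.0) for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        # No CPU time elapsed between the samples; avoid dividing by zero.
        return {state: 0.0 for state in curr}
    return {state: delta / total for state, delta in deltas.items()}


# Two hypothetical samples (in seconds) for logical CPU cpu0.
before = {"user": 100.0, "system": 50.0, "idle": 850.0}
after = {"user": 103.0, "system": 51.0, "idle": 856.0}
print(cpu_utilization(before, after))  # {'user': 0.3, 'system': 0.1, 'idle': 0.6}
```

Because iowait is not counted as idle on Linux, a host waiting heavily on disk shows low idle utilization even when little computation is happening.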

Metric: system.cpu.physical.count

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.cpu.physical.count

UpDown Counter

{cpu}

-

Reports the number of physical processor cores on the hardware.

Metric: system.cpu.logical.count

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.cpu.logical.count

UpDown Counter

{cpu}

-

Reports the number of logical (virtual) processor cores created by the operating system to manage multitasking

System Load Average Metrics

Metric: system.linux.cpu.load_1m

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.linux.cpu.load_1m

Gauge

1

-

Returns the approximate average system load over the last 1 minute. The “load” represents the processes that are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O).

Metric: system.linux.cpu.load_5m

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.linux.cpu.load_5m

Gauge

1

-

Returns the approximate average system load over the last 5 minutes. The “load” represents the processes that are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O).

Metric: system.linux.cpu.load_15m

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.linux.cpu.load_15m

Gauge

1

-

Returns the approximate average system load over the last 15 minutes. The “load” represents the processes that are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O).

Metric: system.windows.cpu.load_1m

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.windows.cpu.load_1m

Gauge

1

-

Returns the approximate average system load over the last 1 minute. The “load” represents the processes that are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows, this value is not retrieved directly from a system function; it is emulated.

Metric: system.windows.cpu.load_5m

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.windows.cpu.load_5m

Gauge

1

-

Returns the approximate average system load over the last 5 minutes. The “load” represents the processes that are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows, this value is not retrieved directly from a system function; it is emulated.

Metric: system.windows.cpu.load_15m

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.windows.cpu.load_15m

Gauge

1

-

Returns the approximate average system load over the last 15 minutes. The “load” represents the processes that are in a runnable state, either using the CPU or waiting to use the CPU (e.g. waiting for disk I/O). On Windows, this value is not retrieved directly from a system function; it is emulated.
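On Unix-like systems, the 1, 5, and 15 minute load averages are exponentially damped moving averages of the number of runnable tasks, updated at a fixed sampling interval. The Python sketch below is a floating-point version of that standard calculation, assuming the Linux kernel's 5-second sampling interval; it is not the integration's code, the helper names are hypothetical, and how the Windows values are emulated is not documented here.

```python
import math

SAMPLE_INTERVAL_S = 5  # the Linux kernel recomputes load averages roughly every 5 seconds


def update_load(load: float, runnable: int, window_min: int) -> float:
    """One step of an exponentially damped moving average over a window of window_min minutes."""
    decay = math.exp(-SAMPLE_INTERVAL_S / (60 * window_min))
    return load * decay + runnable * (1 - decay)


def simulate(runnable_counts: list[int], windows: tuple[int, ...] = (1, 5, 15)) -> dict[int, float]:
    """Feed a sequence of run-queue samples through the 1, 5 and 15 minute averages."""
    loads = {w: 0.0 for w in windows}
    for n in runnable_counts:
        for w in windows:
            loads[w] = update_load(loads[w], n, w)
    return loads


# An idle system that suddenly has 4 runnable tasks for one minute (12 samples).
result = simulate([4] * 12)
print({w: round(v, 2) for w, v in result.items()})  # {1: 2.53, 5: 0.73, 15: 0.26}
```

The example shows why the 1-minute value reacts quickly to a burst while the 15-minute value changes slowly: after one minute the 1-minute average has covered about 63% of the step, the 15-minute average only about 6%.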

System Memory Metrics

Metric: system.memory.usage

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.memory.usage

UpDown Counter

By

As defined on Metric Attributes List

Tracks system memory usage.

Metric Attributes List:

Attribute Name

Description

state

The reflected memory state. The values of this attribute are platform dependent. Possible values are:

Linux

  • buffers

  • cached

  • free

  • used

Windows

  • free

  • used

Metric: system.memory.utilization

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.memory.utilization

Gauge

1

As defined on Metric Attributes List

Tracks system memory utilization.

Metric Attributes List:

Attribute Name

Description

state

The reflected memory state. The values of this attribute are platform dependent. Possible values are:

Linux

  • buffers

  • cached

  • free

  • used

Windows

  • free

  • used

Metric: system.memory.limit

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.memory.limit

UpDown Counter

By

-

Represents the total physical memory on the system (excluding swap).

Metric: system.memory.shared

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.memory.shared

UpDown Counter

By

-

Shared system memory (applies only to Linux systems).

Metric: system.linux.memory.available

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.linux.memory.available

UpDown Counter

By

-

An estimate of how much memory is available for starting new applications without causing swapping. This is an alternative to the system.memory.usage metric with state free. Starting with kernel 3.14, Linux exports “available” memory, which takes “free” memory as a baseline and then factors in kernel-specific values. It is meant to be more accurate than “free” memory alone.
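The difference between free and available memory on Linux can be seen in /proc/meminfo, where kernels 3.14 and later report a MemAvailable field. The Python sketch below parses meminfo-style text; `parse_meminfo` and `available_bytes` are hypothetical helpers, the MemFree fallback is a deliberately crude approximation for older kernels, and this is not necessarily how the integration collects the value.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text; values with a 'kB' unit are converted to bytes."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        amount = int(fields[0])
        # The kernel's "kB" means KiB (1024 bytes); unitless fields such as HugePages_Total are counts.
        values[key.strip()] = amount * 1024 if fields[1:] == ["kB"] else amount
    return values


def available_bytes(meminfo: dict[str, int]) -> int:
    """Prefer MemAvailable (Linux 3.14+); fall back to MemFree on older kernels."""
    return meminfo.get("MemAvailable", meminfo["MemFree"])


sample = """MemTotal:       16303412 kB
MemFree:         1021452 kB
MemAvailable:    9876540 kB
HugePages_Total:       0
"""
info = parse_meminfo(sample)
print(available_bytes(info))  # 10113576960
```

On a Linux host, the same functions can be applied to the contents of /proc/meminfo. MemAvailable is usually far larger than MemFree because it also counts page cache and other memory the kernel can reclaim.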

System Paging/Swap Metrics


Metric: system.paging.usage

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.paging.usage

UpDown Counter

By

As defined on Metric Attributes List

Represents Linux swap or Windows pagefile usage.

Metric Attributes List:

Attribute Name

Description

state

The reflected paging state. Possible values are:

  • used

  • free

Metric: system.paging.utilization

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.paging.utilization

Gauge

1

As defined on Metric Attributes List

Represents Linux swap or Windows pagefile utilization.

Metric Attributes List:

Attribute Name

Description

state

The reflected paging state. Possible values are:

  • used

  • free

Metric: system.paging.limit

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.paging.limit

UpDown Counter

By

-

Represents the total Linux swap or Windows pagefile size.

System Disk I/O Metrics


On Windows, the diskperf -y command may need to be executed to enable disk performance counters.

Metric: system.disk.io

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.disk.io

Counter

By

As defined on Metric Attributes List

Represents the disk I/O activity in bytes.

Metric Attributes List:

Attribute Name

Description

direction

The disk IO operation direction. Possible values are:

  • read

  • write

device

The device identifier (for example “C:” or “sda”)

Metric: system.disk.operations

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.disk.operations

Counter

{operation}

As defined on Metric Attributes List

Represents the number of disk I/O operations (read and write counts).

Metric Attributes List:

Attribute Name

Description

direction

The disk IO operation direction. Possible values are:

  • read

  • write

device

The device identifier (for example “C:” or “sda”)
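system.disk.io and system.disk.operations are cumulative Counters, so throughput (for example, bytes per second) is obtained from the difference between consecutive samples. The Python sketch below shows this calculation with a simple counter-reset rule, similar in spirit to what Prometheus `rate()` does; `counter_rates` and the sample values are hypothetical.

```python
def counter_rates(samples: list[tuple[float, float]]) -> list[float]:
    """Per-second rates from consecutive (timestamp_s, cumulative_value) samples of one series."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta < 0:
            # The counter went backwards (e.g. after a reboot): assume it restarted from zero.
            delta = v1
        rates.append(delta / (t1 - t0))
    return rates


# Hypothetical system.disk.io samples (bytes read from "sda"), collected every 15 seconds.
read_bytes = [(0, 1_000), (15, 4_000), (30, 7_000), (45, 1_500)]
print(counter_rates(read_bytes))  # [200.0, 200.0, 100.0]
```

Each series should be processed separately, i.e. one per combination of device and direction attributes.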

System Filesystem Metrics


Metric: system.filesystem.usage

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.filesystem.usage

UpDownCounter

By

As defined on Metric Attributes List

Reports a filesystem’s space usage.

Metric Attributes List:

Attribute Name

Description

device

The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”)

mode

The primary filesystem mode. Possible values are:

  • ro

  • rw

mountpoint

The filesystem mount path (for example “C:\” or “/”)

state

The filesystem state. Possible values are:

  • free

  • reserved

  • used

  • total

type

The filesystem type (for example “NTFS”)

Metric: system.filesystem.utilization

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.filesystem.utilization

Gauge

1

As defined on Metric Attributes List

Reports a filesystem’s space utilization percentage.

Metric Attributes List:

Attribute Name

Description

device

The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”)

mode

The primary filesystem mode. Possible values are:

  • ro

  • rw

mountpoint

The filesystem mount path (for example “C:\” or “/”)

type

The filesystem type (for example “NTFS”)

Metric: system.filesystem.limit

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.filesystem.limit

UpDownCounter

By

As defined on Metric Attributes List

Reports the total storage capacity of the filesystem.

Metric Attributes List:

Attribute Name

Description

device

The device identifier for where the filesystem resides (for example “C:” or “/dev/sda”)

mode

The primary filesystem mode. Possible values are:

  • ro

  • rw

mountpoint

The filesystem mount path (for example “C:\” or “/”)

type

The filesystem type (for example “NTFS”)

System Network Metrics


Metric: system.network.dropped

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.network.dropped

Counter

{packet}

As defined on Metric Attributes List

Count of packets that are dropped or discarded even though there was no error.

Metric Attributes List:

Attribute Name

Description

direction

The network IO operation direction. Possible values are:

  • receive

  • transmit

device

The identifier of the network interface (for example “Ethernet” or “bridge0”)

Metric: system.network.packets

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.network.packets

Counter

{packet}

As defined on Metric Attributes List

Count of packets sent or received.

Metric Attributes List:

Attribute Name

Description

direction

The network IO operation direction. Possible values are:

  • receive

  • transmit

device

The identifier of the network interface (for example “Ethernet” or “bridge0”)

Metric: system.network.errors

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.network.errors

Counter

{error}

As defined on Metric Attributes List

Count of network errors detected.

Metric Attributes List:

Attribute Name

Description

direction

The network IO operation direction. Possible values are:

  • receive

  • transmit

device

The identifier of the network interface (for example “Ethernet” or “bridge0”)

Metric: system.network.io

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.network.io

Counter

By

As defined on Metric Attributes List

Reports the total number of bytes sent or received through each network interface.

Metric Attributes List:

Attribute Name

Description

direction

The network IO operation direction. Possible values are:

  • receive

  • transmit

device

The identifier of the network interface (for example “Ethernet” or “bridge0”)

Metric: system.network.connections

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.network.connections

UpDownCounter

{connection}

As defined on Metric Attributes List

Reports the total number of active connections, broken down by protocol and, for TCP connections, by connection state.

Metric Attributes List:

Attribute Name

Description

protocol

OSI transport layer or inter-process communication method. Possible values are:

  • tcp

  • udp

  • unix

state

The state of the connection. Applies only to connections using the “tcp” protocol.

System Aggregate Process Metrics


Metric: system.process.count

Name

Instrument Type

Unit (UCUM)

Attributes

Description

system.process.count

UpDownCounter

{process}

As defined on Metric Attributes List

Reports the total number of processes in each state

Metric Attributes List:

Attribute Name

Description

status

The current state of the process within its lifecycle. Possible values are:

  • running

  • sleeping

  • stopped

Platform-specific values for Linux:

  • zombie

  • idle

  • dead

  • wakekill

  • waking

Platform-specific values for Windows:

  • disk_sleep

  • waiting

  • transition


Grafana Dashboard

All data collected by System Monitor can be automatically exported to the appropriate database, such as Prometheus. Users can then import the provided Grafana dashboard into their Grafana instance to monitor and visualize those metrics in real time. Once imported, the dashboard allows users to monitor system performance, detect sudden changes in metrics, analyze historical trends or recent activity by adjusting the time frame, and filter data to focus on specific components.

Applying filters in the YAML configuration or disabling certain metric categories may lead to incomplete data or missing panels on the default dashboard. Users can fully customize visualizations and queries to suit their needs, but should ensure that their customizations align with the integration's configuration.

The provided dashboard is built using default Prometheus metric naming conventions, with suffixes enabled. If a different database is used, or if Prometheus suffixes are disabled, some query metric names may differ, resulting in discrepancies in the displayed data. Adjustments to the dashboard queries may be required to align with the new naming conventions.
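As a rough guide to how metric names can change, the Python sketch below approximates the OpenTelemetry-to-Prometheus name translation with suffixes enabled: dots become underscores, a unit word is appended (s becomes seconds, By becomes bytes, unit 1 on gauges becomes ratio, curly-brace units are dropped), and monotonic counters gain _total. This is a simplified assumption based on common OpenTelemetry Collector Prometheus exporter behavior, not the dashboard's definitive naming; `prometheus_name` is hypothetical, and actual names should be confirmed in your Prometheus instance.

```python
import re

# Partial map of UCUM units to Prometheus unit words (assumption: only units used on this page).
UNIT_WORDS = {"s": "seconds", "By": "bytes"}


def prometheus_name(otel_name: str, unit: str, instrument: str) -> str:
    """Approximate the Prometheus metric name for an OTel metric with suffixes enabled."""
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)  # dots become underscores
    if unit == "1":
        suffix = "ratio" if instrument == "Gauge" else ""  # dimensionless gauges become ratios
    elif unit.startswith("{"):
        suffix = ""  # annotation units such as {packet} are dropped
    else:
        suffix = re.sub(r"[^a-zA-Z0-9_]", "_", UNIT_WORDS.get(unit, unit))
    if suffix and not name.endswith(f"_{suffix}"):
        name = f"{name}_{suffix}"
    if instrument == "Counter":
        name += "_total"  # monotonic counters get the _total suffix
    return name


for metric in [("system.cpu.time", "s", "Counter"),
               ("system.memory.usage", "By", "UpDown Counter"),
               ("system.cpu.utilization", "1", "Gauge"),
               ("system.network.packets", "{packet}", "Counter")]:
    print(prometheus_name(*metric))
```

If suffixes are disabled in the exporter, the unit and _total parts are omitted, which is one reason dashboard queries may need adjusting.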

By default, the dashboard organizes metrics into three distinct categories:

  • At a Glance: Provides a high-level overview of critical metrics for quick assessment.

  • Basic Information Over Time: Displays fundamental trends and patterns for essential metrics.

  • Advanced Information Over Time: Offers in-depth analysis and detailed trends for advanced users.

Screenshots showcasing the layout and functionality of the dashboard are included below.

How To

Import Universal Template

To use the Universal Template, you must first perform the following steps.

  1. This Universal Task requires the Resolvable Credentials feature. Check that the Resolvable Credentials Permitted system property has been set to true.

  2. To import the Universal Template into your Controller, follow these instructions.

  3. When the files have been imported successfully, refresh the Universal Templates list; the Universal Template will appear on the list.

Modifications of this integration, applied by users or customers, before or after import, might affect the supportability of this integration. For more information refer to System Monitor

Configure Universal Task

For a new Universal Task, create a new task, and enter the required input fields.

Import Grafana Dashboard

Users can benefit from a ready-to-use sample dashboard that this downloadable integration offers. It is located in the /observability/grafana/ directory of the zip file downloaded from the Stonebranch Integration Hub. Administrators should refer to the official Grafana User Guide for instructions on importing a Grafana dashboard.

The dashboard's Prometheus data source is configured as a variable and therefore must be mapped to an existing data source configured on the target Grafana instance.

Integration Modifications

Modifications applied by users or customers, before or after import, might affect the supportability of this integration. To retain the support level of this integration, the following modifications are discouraged:

  • Python code modifications should not be done.

  • Template Modifications

    • General Section

      • "Name", "Extension", "Variable Prefix", and "Icon" should not be changed.

    • Universal Template Details Section

      • "Template Type", "Agent Type", "Send Extension Variables", and "Always Cancel on Force Finish" should not be changed.

    • Result Processing Defaults Section

      • Success and Failure Exit codes should not be changed.

      • Success and Failure Output processing should not be changed.

    • Fields Restriction Section
      The setup of the template does not impose any restrictions. However, the following applies to the "Exit Code Processing Fields" section:

      1. Success/Failure exit codes need to be respected.

      2. In principle, as STDERR and STDOUT outputs can change in follow-up releases of this integration, they should not be considered as a reliable source for determining the success or failure of a task.

Users and customers are encouraged to report defects or feature requests to the Stonebranch Support Desk.

Document References

This document references the following documents:

Document Link

Description

Universal Templates

User documentation for creating, working with and understanding Universal Templates and Integrations.

Universal Tasks

User documentation for creating Universal Tasks in the Universal Controller user interface.

Observability Start-up Guide

User documentation for configuring Universal Agent to send OpenTelemetry data.

Changelog

ue-system-monitor-1.0.0 (2024-12-19)

Initial Version