High Availability
Introduction
High Availability (HA) of Universal Automation Center means that it has been set up to be a redundant system; in addition to the components that are processing work, there are back-up components available to continue processing through hardware or software failure.
This page describes a High Availability environment, how High Availability components recover in the event of such a failure, and what actions, if any, the user must take.
High Availability System
The following illustration shows a typical, although simplified, Universal Automation Center system in a High Availability environment.
In this environment, there are:
Two Universal Controller instances (cluster nodes)
Two Universal Message Service (OMS) network communications providers in an OMS cluster
Four Universal Agent (Agent) machines
The components in blue are active and operating. The components in gray are available for operations but currently are inactive (passive).
See High Availability Components for a detailed description of how each component type functions in a High Availability environment.
High Availability Components
This section provides detailed information on the cluster nodes and Agents in a High Availability environment.
Cluster Nodes
Each Universal Automation Center installation consists of one or more instances of Universal Controller; each instance is a cluster node. Only one node is required in a Universal Automation Center system; however, in order to run a High Availability configuration, you must run at least two nodes.
At any given time under High Availability, one node operates in Active mode and the remaining nodes operate in Passive mode (see Determining Mode of a Cluster Node at Start-up).
An Active node performs all system processing functions; Passive nodes can perform limited processing functions.
Passive Cluster Node Restrictions
Passive cluster nodes cannot execute any automated or scheduled work.
Also, from a Passive node you cannot:
Perform a workflow instance insert task operation.
Perform a bulk import or list import.
Run the LDAP Refresh server operation.
Update a task instance.
Update or delete an enabled trigger.
Update an enabled Data Backup/Purge.
Update the Task Execution Limit field in Agent records.
Update the Task Execution Limit field and Distribution field in Agent Cluster records.
Update the user Time Zone.
List Composite Trigger component events.
However, Passive nodes do let you perform a limited number of processing functions, such as:
Launch tasks.
Monitor and display data.
Access the database.
Generate reports.
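The restrictions above amount to a mode-based permission check. The following is a minimal sketch of that idea only, not product code: the operation names are hypothetical labels paraphrasing the list above, and `is_permitted` is an illustrative helper.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "Active"
    PASSIVE = "Passive"


# Hypothetical operation names paraphrasing the restrictions listed above;
# they are not identifiers taken from the product.
PASSIVE_BLOCKED = frozenset({
    "run_automated_or_scheduled_work",
    "insert_task_into_workflow_instance",
    "bulk_import",
    "list_import",
    "ldap_refresh",
    "update_task_instance",
    "update_enabled_trigger",
    "delete_enabled_trigger",
    "update_enabled_data_backup_purge",
    "update_agent_task_execution_limit",
    "update_agent_cluster_limit_or_distribution",
    "update_user_time_zone",
    "list_composite_trigger_component_events",
})


def is_permitted(mode: Mode, operation: str) -> bool:
    """Return True if a node in the given mode may perform the operation."""
    if mode is Mode.ACTIVE:
        # An Active node performs all system processing functions.
        return True
    # A Passive node may do anything that is not explicitly restricted.
    return operation not in PASSIVE_BLOCKED
```

Under these assumptions, `is_permitted(Mode.PASSIVE, "bulk_import")` is false, while `is_permitted(Mode.PASSIVE, "generate_report")` is true.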
Agent
The Agent runs as a Windows service or Linux/Unix daemon. A cluster node sends a request to the Agent to perform a function. The Agent processes the request, gathers data about the operation of the client machine, and sends status and results back to the node. It performs these functions by exchanging messages with the node.
Once an Agent has registered with a node, you can view it by selecting that Agent type from the Agents & Connections navigation pane of the user interface. A list displays all registered Agents of that type. See Agents Overview for more information.
If an Agent fails, Universal Broker restarts it. The Agent then attempts to determine what tasks or functions were in process at the time of failure.
Warm Start Processing is the process that UAG goes through at start-up: it reviews all task instances that were active at the time of the last shutdown (intentional or otherwise) and takes the proper action based on state and platform.
Task instances running on Windows and z/OS platforms are resumed when a Warm Start is attempted.
Task instances running on Unix and Linux platforms are set to IN-DOUBT status when a Warm Start is attempted.
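The platform rule above can be expressed as a small lookup. This is an illustrative sketch only: the function name `warm_start_action` and the string `"RESUME"` are hypothetical labels, while `IN-DOUBT` is the status named above.

```python
# Platforms as described above, normalised to lower case for matching.
RESUME_PLATFORMS = {"windows", "z/os"}
IN_DOUBT_PLATFORMS = {"unix", "linux"}


def warm_start_action(platform: str) -> str:
    """Return the Warm Start outcome for a task instance on the given platform."""
    key = platform.strip().lower()
    if key in RESUME_PLATFORMS:
        return "RESUME"      # hypothetical label: the task instance is resumed
    if key in IN_DOUBT_PLATFORMS:
        return "IN-DOUBT"    # status set on Unix and Linux platforms
    raise ValueError(f"platform not covered by this sketch: {platform!r}")
```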
To support the determination of which tasks or functions were in process at the time of a failure, Agent task processing includes the following steps:
| Step | Description |
|---|---|
| Step 1 | Each time the Agent receives a task, it writes to cache a record called |
| Step 2 | As the task runs, the Agent updates the |
| Step 3 | When the task run completes, the Agent deletes the |
| Step 4 | If an Agent is restarted, it looks in the cache for |
As illustrated below, the Agent reads/writes a record to its agent/cache directory for each task instance that it manages.
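The cache steps described above (write on receipt, update while running, delete on completion, scan on restart) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Agent's implementation: the `TaskCache` class, the one-JSON-file-per-task layout, the file naming, and the field names are all hypothetical, because the actual record name and format are not given here.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class TaskCache:
    """Hypothetical per-task cache: one JSON file per task instance."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, task_id: str) -> Path:
        # Assumed naming scheme; the real record name is not documented here.
        return self.cache_dir / f"{task_id}.json"

    def task_received(self, task_id: str, platform: str) -> None:
        # Step 1: write a record when the task is received.
        record = {"task_id": task_id, "platform": platform, "state": "RECEIVED"}
        self._path(task_id).write_text(json.dumps(record))

    def task_progress(self, task_id: str, state: str) -> None:
        # Step 2: update the record as the task runs.
        path = self._path(task_id)
        record = json.loads(path.read_text())
        record["state"] = state
        path.write_text(json.dumps(record))

    def task_completed(self, task_id: str) -> None:
        # Step 3: delete the record when the task run completes.
        self._path(task_id).unlink(missing_ok=True)

    def pending_after_restart(self) -> List[Dict[str, str]]:
        # Step 4: on restart, any remaining record marks a task that
        # was still in process when the Agent stopped.
        return [json.loads(p.read_text()) for p in sorted(self.cache_dir.glob("*.json"))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = TaskCache(Path(tmp) / "cache")
        cache.task_received("task-1", "linux")
        cache.task_progress("task-1", "RUNNING")
        print(cache.pending_after_restart())
```

The design point is that a record exists exactly as long as its task is unfinished, so a restart needs no other source of truth to find interrupted work.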
Universal Message Service (OMS)
Universal Message Service (OMS) sends and receives messages between the cluster nodes and Agents.
OMS consists of an OMS Server and an OMS Administration Utility. The OMS clients - cluster nodes and Agents - establish persistent TCP/IP socket connections with the OMS Server.
OMS provides reliable message communication by writing all queued OMS messages to persistent storage. The OMS Server maintains its queues in an OMS message database that resides on persistent storage.
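The general idea of persisting queued messages so that they survive a server restart can be sketched as below. This is not how OMS is implemented: SQLite stands in for the OMS message database, and the `DurableQueue` class, its methods, and the table layout are assumptions made purely for illustration.

```python
import sqlite3
from typing import Optional, Tuple


class DurableQueue:
    """Hypothetical durable queue: a message is stored before put() returns."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "queue TEXT NOT NULL, "
            "body TEXT NOT NULL)"
        )
        self.conn.commit()

    def put(self, queue: str, body: str) -> int:
        # The 'with' block commits on success and rolls back on error.
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO messages (queue, body) VALUES (?, ?)", (queue, body)
            )
        return cur.lastrowid

    def get(self, queue: str) -> Optional[Tuple[int, str]]:
        # Return the oldest unacknowledged message without removing it.
        return self.conn.execute(
            "SELECT id, body FROM messages WHERE queue = ? ORDER BY id LIMIT 1",
            (queue,),
        ).fetchone()

    def ack(self, message_id: int) -> None:
        # Remove a message only after the receiver has confirmed it.
        with self.conn:
            self.conn.execute("DELETE FROM messages WHERE id = ?", (message_id,))

    def close(self) -> None:
        self.conn.close()
```

Because a message is deleted only on `ack`, a message that was queued but not yet delivered is still available after the queue is reopened.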
See Universal Message Service (OMS) for detailed information on OMS.
How High Availability Works
In a High Availability environment, Passive cluster nodes act as standby servers for the Active (primary) cluster node. All running cluster nodes issue heartbeats and check the mode (status) of the other running cluster nodes, both at start-up and continuously during operations. If the cluster node that is currently processing work can no longer do so, one of the other cluster nodes takes over and continues processing.
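A heartbeat-based takeover of this kind can be sketched as follows. This illustrates the general technique only, under several assumptions that are not taken from the product: the 30-second timeout, the `Offline` label for a node whose heartbeat is stale, and the rule that the alphabetically first live Passive node takes over.

```python
from dataclasses import dataclass
from typing import List, Optional

HEARTBEAT_TIMEOUT = 30.0  # seconds; assumed value for this sketch


@dataclass
class Node:
    name: str
    mode: str              # "Active", "Passive", or (assumed) "Offline"
    last_heartbeat: float  # timestamp of the most recent heartbeat, in seconds


def is_alive(node: Node, now: float) -> bool:
    """A node counts as alive if its last heartbeat is recent enough."""
    return now - node.last_heartbeat <= HEARTBEAT_TIMEOUT


def check_cluster(nodes: List[Node], now: float) -> Optional[Node]:
    """Return the node that is Active after the check, or None if none is available."""
    for node in nodes:
        if node.mode == "Active":
            if is_alive(node, now):
                return node        # the Active node is healthy; nothing to do
            node.mode = "Offline"  # stale heartbeat: stop treating it as Active

    # Promote one live Passive node (tie-break by name is an assumption).
    candidates = sorted(
        (n for n in nodes if n.mode == "Passive" and is_alive(n, now)),
        key=lambda n: n.name,
    )
    if not candidates:
        return None
    candidates[0].mode = "Active"
    return candidates[0]
```

In this sketch, a cluster whose Active node last reported 100 seconds ago and whose Passive node reported 5 seconds ago ends the check with the Passive node promoted to Active.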