OMS Server High Availability

Overview

Universal Message Service (OMS) is a communication middleware component that provides network communications between the distributed Universal Automation Center components. Most Universal Automation Center services depend on OMS being available. A Universal Automation Center deployment that requires near 100% reliability and availability must deploy OMS as a High Availability (HA) cluster.

An OMS HA cluster consists of one active OMS Server and one or more inactive (passive) OMS Servers sharing a common OMS message database. OMS clients are configured with a list of OMS Server addresses, one address for each OMS Server in the HA cluster. If the active OMS Server becomes unavailable, one of the inactive OMS Servers automatically assumes the role of the active OMS Server. OMS clients detect that the original active OMS Server is not available and automatically fail over to the new active OMS Server. The shared OMS message database ensures that no messages are lost when there is a change in the active OMS Server.

The following diagram illustrates a simple HA cluster configuration. It consists of two OMS Servers deployed on two different machines. OMS Server 1 is the active OMS Server and OMS Server 2 is the inactive OMS Server. Both OMS Servers share a common OMS message database. The OMS clients connect to the active OMS Server.
 

High Availability Configuration

Configuring an HA cluster consists of the following steps:
 

Step 1

Deploy the OMS message database on a shared file system that is available to all OMS Server cluster members.

Step 2

Configure the OMS Server cluster members to use the shared OMS message database.

Step 3

Configure the OMS clients with the list of OMS Server cluster members.

Shared OMS Message Database

The OMS message database must reside on a shared file system accessible by each of the OMS Server cluster members. The shared OMS message database is utilized for the following capabilities:

  1. The active OMS Server selection process.
  2. OMS message availability in fail-over scenarios.

The OMS Server cluster members determine the active OMS Server by obtaining an exclusive lock on a lock file in the OMS message database directory. The active OMS Server holds the file lock for the entire time it is executing. The inactive OMS Servers check every three seconds to see if the file lock is available. If the active OMS Server terminates, the exclusive file lock will be released, allowing one of the inactive OMS Servers to obtain exclusive access to the file lock. The OMS Server that obtains the file lock becomes the active OMS Server in the cluster.
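The selection process described above can be sketched in Python. This is only an illustration of the general technique (a non-blocking exclusive POSIX file lock, polled every three seconds), not the OMS Server's actual implementation. The function names and the lock file name used in the usage note are hypothetical, and the sketch assumes a POSIX system where the `fcntl` module is available.

```python
import fcntl
import time

# From the text: inactive servers check for the lock every three seconds.
POLL_INTERVAL_SECONDS = 3


def try_acquire_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if the lock is held.

    The lock lasts until the returned file is closed or the process exits,
    which mirrors "holds the file lock for the entire time it is executing".
    """
    handle = open(lock_path, "a+")  # create the lock file if it does not exist
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None  # another process holds the lock: stay inactive
    return handle


def wait_until_active(lock_path, sleep=time.sleep):
    """Poll until the lock is obtained, then return the handle that holds it."""
    while True:
        handle = try_acquire_lock(lock_path)
        if handle is not None:
            return handle  # this process is now the active member
        sleep(POLL_INTERVAL_SECONDS)
```

A process would call `wait_until_active("/shared/oms/oms.lock")` (a hypothetical path) at startup and keep the returned handle open for as long as it runs. POSIX locks of this kind belong to the process, so they are released automatically if the process terminates.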

The file system on which the OMS message database resides may be on a SAN or a network file system. The file system must support distributed file locks. On POSIX-based systems, such as UNIX and Linux, NFS version 4 or higher may be used. NFS version 3 does not support reliable file locks and must not be used. On Windows-based systems, SMB-accessible file systems support file locks. For other network file systems, check with the file system vendor to determine whether POSIX-compliant distributed file locks are supported.

The shared file system on which the OMS message database is located should be deployed as an HA configuration. If the shared file system becomes unavailable, the OMS HA cluster members will not have access to the OMS message database and will be rendered inoperable.

OMS Server Cluster

An OMS Server cluster consists of two or more OMS Servers sharing a common OMS message database. The OMS Servers should be installed on different machines to provide fault tolerance in case of machine failure. The OMS message database contains platform-specific data types. Consequently, all OMS Servers in the HA cluster must be installed on the same operating system and hardware architecture: the data size (32-bit or 64-bit) and encoding (little-endian or big-endian) must be the same for all OMS Server cluster members.
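One informal way to compare candidate machines against this requirement is to collect the relevant platform properties on each one. The following Python sketch is not part of OMS; the function names are hypothetical. It also assumes that the Python interpreter on each machine was built for the same data size as the installed OMS Server, which should be verified separately.

```python
import platform
import struct
import sys


def platform_fingerprint():
    """Collect the properties that must match across cluster members."""
    return {
        "operating_system": platform.system(),
        "architecture": platform.machine(),
        # Pointer width of this Python build, in bits (assumed to match OMS).
        "data_size_bits": struct.calcsize("P") * 8,
        # "little" or "big"
        "byte_order": sys.byteorder,
    }


def mismatched_fields(fingerprints):
    """Return the sorted names of fields whose values differ between members.

    `fingerprints` maps a member name to the dict from platform_fingerprint().
    """
    members = list(fingerprints.values())
    if not members:
        return []
    first = members[0]
    return sorted(
        field for field in first
        if any(member[field] != first[field] for member in members[1:])
    )
```

Running `platform_fingerprint()` on each machine and passing the collected results to `mismatched_fields` gives an empty list when the machines agree on every field.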

Each OMS Server in the HA cluster must be configured to use the same shared OMS message database. The SPOOL_DIRECTORY OMS Server configuration option specifies the location of the OMS message database. Its value must be the same for all cluster members.

OMS Clients

OMS clients are configured with the address of the OMS Server used for network communications. OMS clients can use either a single, non-HA OMS Server or an OMS Server HA cluster. In the case of an HA cluster, the OMS clients support automatic fail-over between the OMS Server cluster members. To use an OMS HA cluster, configure the OMS clients with an ordered, comma-separated list of OMS Server addresses. The OMS Servers specified in the address list must be members of the same HA cluster. If an OMS Server is specified that is not a member of the same OMS HA cluster, the results are unpredictable.

Below is an example OMS Server HA cluster address list.

oms1.acme.com,oms2.acme.com,oms3.acme.com

OMS clients connect to the first OMS Server in the OMS Server address list, oms1.acme.com. If that OMS Server is not available, the OMS client attempts to connect to the next OMS Server in the list, oms2.acme.com, and so on, until it has successfully connected to an OMS Server. If none of the OMS Servers are available, the OMS client waits for a period of time and then tries again to establish a connection with one of the OMS Servers. Once an OMS client establishes a connection, it uses that OMS Server for network communications. If the connection fails for any reason, the OMS client repeats the process, starting with the next OMS Server in the OMS Server list, until it is successful.
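The connection behavior described above can be sketched in Python. This is an illustration under stated assumptions, not the code of any OMS client: the function names are hypothetical, the `connect` callable is supplied by the caller, the 5-second retry delay is an arbitrary placeholder for the unspecified "period of time", and wrapping from the last address back to the first is an assumption.

```python
import time


def parse_address_list(value):
    """Split an ordered, comma-separated address list into a list of addresses."""
    return [item.strip() for item in value.split(",") if item.strip()]


def connect_to_cluster(addresses, connect, start=0, retry_delay=5.0,
                       sleep=time.sleep, max_rounds=None):
    """Try each address in list order, beginning at index `start`.

    `connect` takes one address and returns a connection object or raises
    OSError. Returns (index, connection) for the first address that succeeds.
    With max_rounds=None the function keeps retrying indefinitely.
    """
    if not addresses:
        raise ValueError("the OMS Server address list is empty")
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for offset in range(len(addresses)):
            index = (start + offset) % len(addresses)
            try:
                return index, connect(addresses[index])
            except OSError:
                continue  # this server is unavailable: try the next one
        rounds += 1
        sleep(retry_delay)  # no server answered: wait, then go through the list again
    raise ConnectionError("no OMS Server in the list accepted a connection")
```

For the initial connection a client would call the function with `start=0`. After an established connection to the server at position `i` fails, calling it again with `start=i + 1` resumes with the next server in the list.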

See the documentation for a specific OMS client for details on how the OMS Server address list is specified for that OMS client.