Manager Fault Tolerance - Universal Command - Component Management

Overview

To fully understand the fault tolerance features of Universal Command, it is necessary to understand how Universal Broker manages components.

Universal Broker manages component start-up, execution, and termination. The Broker and its components have the ability to communicate service requests and status information between each other.

The Broker maintains a database of components that are either active or have completed and are waiting for a restart or reconnection. The component information maintained by the Broker determines the current state of each component. The Broker uses this state information to determine whether a restart or reconnect request from a Manager is acceptable. The Broker's component information can be viewed with the Universal Query utility.

One piece of component information maintained by the Broker is the component's communication state. The communication state primarily describes the Server's status with respect to its network connection with a Manager, and with respect to the completion of the user process and its associated spooled data.

Communication State Values

The following table describes the communication state values.

  • The Reconnect column indicates whether a network reconnect request from a Manager is valid.
  • The Restart column indicates whether a restart request from a Manager is valid.

State         Reconnect  Restart  Description
------------  ---------  -------  -----------------------------------------------

STARTED       NO         NO       Server has started. If the Server is
                                  restartable, it is receiving the standard input
                                  file from the Manager and spooling it.

ESTABLISHED   NO         NO       Server and Manager are connected and processing
                                  normally. This is the most common state when
                                  all is well.

DISCONNECTED  YES        YES      Server is not connected to the Manager. This
                                  occurs when a network error occurs, the Manager
                                  halts, or the Manager host halts. The Server is
                                  executing with the Network Fault Tolerant
                                  protocol, is restartable, or both.
                                  Note: The Server cannot tell whether the Manager
                                  is still executing, since it cannot communicate
                                  with the Manager.

ORPHANED      NO         YES      The Manager sent a termination message to the
                                  Server and then terminated. This state occurs
                                  only if the Server is restartable.

RECONNECTING  NO         NO       Server has received a reconnect request from the
                                  Manager to recover a lost network connection.
                                  This state should be brief, lasting only as long
                                  as it takes to re-establish the network
                                  connections.

RESTARTING    NO         NO       Server has received a restart request from the
                                  Manager. This state should be brief, lasting
                                  only as long as it takes to re-establish the
                                  network connections.

PENDING       NO         YES      A restartable Server and its user process have
                                  completed. The user process's standard output
                                  and standard error files are in the spool. No
                                  Manager has yet been restarted to retrieve the
                                  spooled files and the user process exit status;
                                  the Server remains in this state until one is.

COMPLETED     NO         NO       Server and Manager have completed. All standard
                                  output and standard error files, along with the
                                  user process exit status, have been sent to the
                                  Manager.
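The Reconnect and Restart columns can be read as a lookup that answers the Broker's question of whether a Manager's request is acceptable for a component in a given communication state. The following Python sketch models that lookup. It is a hypothetical illustration, not Universal Broker's actual implementation or API: the names CommState, Request, VALIDITY, is_request_valid, and accept_request are assumptions, and the move into RECONNECTING or RESTARTING on acceptance is inferred from the descriptions of those two states.

```python
from enum import Enum


class CommState(Enum):
    """Component communication states, as listed in the table."""
    STARTED = "STARTED"
    ESTABLISHED = "ESTABLISHED"
    DISCONNECTED = "DISCONNECTED"
    ORPHANED = "ORPHANED"
    RECONNECTING = "RECONNECTING"
    RESTARTING = "RESTARTING"
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"


class Request(Enum):
    """Hypothetical request types a Manager can send to the Broker."""
    RECONNECT = "reconnect"
    RESTART = "restart"


# (reconnect valid, restart valid) per state, copied from the table.
VALIDITY = {
    CommState.STARTED: (False, False),
    CommState.ESTABLISHED: (False, False),
    CommState.DISCONNECTED: (True, True),
    CommState.ORPHANED: (False, True),
    CommState.RECONNECTING: (False, False),
    CommState.RESTARTING: (False, False),
    CommState.PENDING: (False, True),
    CommState.COMPLETED: (False, False),
}


def is_request_valid(state: CommState, request: Request) -> bool:
    """Return True if the table marks this request as valid in this state."""
    reconnect_ok, restart_ok = VALIDITY[state]
    return reconnect_ok if request is Request.RECONNECT else restart_ok


def accept_request(state: CommState, request: Request) -> CommState:
    """Return the state the component enters, or raise if the request is rejected."""
    if not is_request_valid(state, request):
        raise ValueError(f"{request.value} request not valid in state {state.value}")
    # Assumption: an accepted request moves the component into the matching
    # transitional state (RECONNECTING or RESTARTING) described in the table.
    if request is Request.RECONNECT:
        return CommState.RECONNECTING
    return CommState.RESTARTING


if __name__ == "__main__":
    # Print the validity table reconstructed from VALIDITY.
    for s in CommState:
        rc = is_request_valid(s, Request.RECONNECT)
        rs = is_request_valid(s, Request.RESTART)
        print(f"{s.value:<13} reconnect={rc!s:<5} restart={rs}")
```

In this model, only DISCONNECTED accepts a reconnect request, while DISCONNECTED, ORPHANED, and PENDING accept a restart request; every other state rejects both, which matches the table.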