Manager Fault Tolerance - Universal Command - Component Management
Overview
To understand the fault tolerant features of Universal Command, it helps to know how the Universal Broker manages components.
Universal Broker manages component start-up, execution, and termination. The Broker and its components can exchange service requests and status information.
The Broker maintains a database of components that are active or that have completed and are waiting for restart or reconnection. The component information maintained by the Broker determines the current state of the component. The Broker requires this state information to determine whether a restart or reconnect request from a Manager is acceptable. The Broker's component information can be viewed with the Universal Query utility.
One piece of component information maintained by the Broker is the component's communication state. The communication state primarily describes the state of the Server with respect to its network connection with a Manager and the completion of the user process and its associated spooled data.
Communication State Values
The following table describes the communication state values.
- The Reconnect column indicates whether a network reconnect request is valid.
- The Restart column indicates whether a restart request is valid.
State | Reconnect | Restart | Description |
---|---|---|---|
STARTED | NO | NO | Server has started. |
ESTABLISHED | NO | NO | Server and Manager are connected and processing normally. This is the most common state when all is well. |
DISCONNECTED | YES | YES | Server is not connected to the Manager. This occurs when a network error has occurred, the Manager halted, or the Manager host halted. Note: The Server cannot tell whether the Manager is still executing, since it cannot communicate with the Manager. |
ORPHANED | NO | YES | Manager has terminated after sending a termination message to the Server to notify it of its termination. |
RECONNECTING | NO | NO | Server has received a reconnect request from the Manager to recover a lost network connection. |
RESTARTING | NO | NO | Server has received a restart request from the Manager. |
PENDING | NO | YES | A restartable Server and its user process have completed. The user process standard output and error files are in the spool. |
COMPLETED | NO | NO | Server and Manager have completed. All standard output and standard error files, along with the user process's exit status, have been sent to the Manager. |
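
The table above can be read as a simple lookup: given a component's communication state, is a reconnect or restart request valid? The following Python sketch encodes that lookup for illustration only. It is not Broker code and not part of any Universal Command API; the names `CommState`, `REQUEST_VALIDITY`, and `is_request_acceptable` are hypothetical, and only the state names and YES/NO values are taken from the table.

```python
from enum import Enum


class CommState(Enum):
    """Communication state names as listed in the table (illustrative type)."""
    STARTED = "STARTED"
    ESTABLISHED = "ESTABLISHED"
    DISCONNECTED = "DISCONNECTED"
    ORPHANED = "ORPHANED"
    RECONNECTING = "RECONNECTING"
    RESTARTING = "RESTARTING"
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"


# (reconnect_valid, restart_valid) for each state, copied from the table.
REQUEST_VALIDITY = {
    CommState.STARTED: (False, False),
    CommState.ESTABLISHED: (False, False),
    CommState.DISCONNECTED: (True, True),
    CommState.ORPHANED: (False, True),
    CommState.RECONNECTING: (False, False),
    CommState.RESTARTING: (False, False),
    CommState.PENDING: (False, True),
    CommState.COMPLETED: (False, False),
}


def is_request_acceptable(state: CommState, request: str) -> bool:
    """Return True if `request` ("reconnect" or "restart") is valid in `state`.

    Hypothetical helper: it only mirrors the table and does not model
    any other checks the Broker may perform.
    """
    reconnect_ok, restart_ok = REQUEST_VALIDITY[state]
    if request == "reconnect":
        return reconnect_ok
    if request == "restart":
        return restart_ok
    raise ValueError(f"unknown request type: {request!r}")


if __name__ == "__main__":
    # List the states in which each request type is valid.
    for request in ("reconnect", "restart"):
        valid = [s.value for s in CommState if is_request_acceptable(s, request)]
        print(request, "->", ", ".join(valid))
```

Read this way, DISCONNECTED is the only state that accepts a reconnect request, while a restart request is valid in DISCONNECTED, ORPHANED, and PENDING.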