Manager Fault Tolerance - Universal Command - Functionality

Manager Fault Tolerance Functionality

The basic functionality of Manager fault tolerance is:

  1. Manager requests the execution of a command on a remote system.
  2. Command executes on the remote system, optionally reading and writing data.
  3. Manager redirects:
    • Its standard input data to the standard input of the remote command.
    • Standard output file and standard error file of the remote command to its own standard output and standard error.

If the Manager is terminated or the Manager's host system is shut down, the remote command cannot read the Manager's standard input or write its standard output and error files. Without Manager fault tolerance, the remote command must terminate, since its data source and destination are now gone. Otherwise, it would wait forever.

Manager fault tolerance provides an execution environment in which the Manager is not required in order for the user process to continue execution on the remote system. The user process can execute to completion with or without a Manager connected.

When the Manager starts a user process, the Manager executes as normal; standard output and standard error files are redirected back to the Manager as the user process produces the data. The difference is data spooling. In order for the user process to have real-time access to its input and output, the data is spooled in the Universal Spool Database. The spool provides complete independence from the Manager. The spool subsystem satisfies all data requirements for the user process via the Universal Command Server.

The Manager can terminate, and a new Manager can restart and reconnect to the user process. If the user process has completed, the new Manager receives the user process's standard files and its exit status. The restarted Manager behaves in all ways as if it were the originating Manager.
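The independence that the spool provides can be illustrated with a small model. This is only a sketch: the Spool class, the run_user_process function, and the toy "user process" that upper-cases its input are hypothetical names invented for illustration, and an in-memory list stands in for the Universal Spool Database.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Spool:
    """In-memory stand-in for the spool: input, output, and exit status."""
    stdin: List[str]
    stdout: List[str] = field(default_factory=list)
    exit_status: Optional[int] = None


def run_user_process(spool: Spool,
                     manager: Optional[Callable[[str], None]] = None) -> None:
    """Play the Server's role for a toy user process.

    Input is read from the spool, output is always written to the spool,
    and output is forwarded to the Manager only if one is connected.
    """
    for line in spool.stdin:
        out = line.upper()          # the toy "user process"
        spool.stdout.append(out)    # always spooled
        if manager is not None:     # forwarded only while connected
            manager(out)
    spool.exit_status = 0


# No Manager is connected, yet the process still runs to completion.
spool = Spool(stdin=["alpha", "beta"])
run_user_process(spool, manager=None)
print(spool.stdout, spool.exit_status)  # ['ALPHA', 'BETA'] 0
```

Because every read and write goes through the spool, the presence or absence of the Manager callback does not change what the process does.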

Command Identifier

A Manager requests Manager fault tolerance with the MANAGER_FAULT_TOLERANT configuration option and by providing a command identifier (command ID) using the COMMAND_ID configuration option. The command ID identifies the unit of work being executed. In this context, a unit of work includes the Manager, Server, and user process.

The Manager indicates to the Server that this request is restartable. The COMMAND_ID configuration option provides a command identifier that uniquely identifies the Server and user process on the remote host. When a Manager is restarted, it must provide the same command ID identifying the Server and user process with which it wants to reconnect.

Providing a unique command ID is not trivial. Many Managers may be executing on many different hosts, all executing work on the same Server host. It is also possible for a Manager to start a restartable command from one host, terminate, and restart on a completely different host.

The command ID value can be any text value of unrestricted length. In practical terms, the character set and limits on command line length of the Manager host impose restrictions on the value.
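One way to satisfy these requirements is to derive the command ID from values that are the same wherever the Manager runs. The sketch below is an assumption for illustration, not a product convention: the make_command_id function, the job name, and the run date are all hypothetical. The host name is deliberately left out so that a Manager restarted on a different host produces the same ID.

```python
import re


def make_command_id(job_name: str, run_date: str) -> str:
    """Build a command ID from values known to the scheduler (hypothetical scheme)."""
    raw = f"{job_name}-{run_date}"
    # Keep only characters that are safe on most command lines.
    return re.sub(r"[^A-Za-z0-9_.-]", "_", raw)


first_run = make_command_id("nightly backup", "2024-05-01")
restarted = make_command_id("nightly backup", "2024-05-01")
next_day = make_command_id("nightly backup", "2024-05-02")

print(first_run)               # nightly_backup-2024-05-01
print(first_run == restarted)  # True: a restart reuses the same ID
print(first_run == next_day)   # False: a new run gets a new ID
```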

Standard I/O Files

The Universal Spool system satisfies all user process data requests via the Universal Command Server. When the user process reads from its standard input file, the Server reads it from the spool and provides it to the user process. When the user process writes to standard output or error, the Server receives the data and writes it to the spool.

A Manager requesting restart capability (Manager fault tolerance) first transfers its entire standard input file to the Server, which in turn writes it to the spool. When all data has been received, the Server creates the user process. This provides complete Manager independence for the entire life of the user process.

As long as the Manager is connected, the standard output and standard error files are transferred to the Manager in real time as the user process produces the data. The data is also written to the spool. If the Manager terminates, the data is written to the spool only.

A restarted Manager is sent all of the standard output and standard error data that is currently spooled, from the beginning. If the user process is still executing, the Manager starts to receive data from the user process as it is written once it has caught up with the spooled data.
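The catch-up behavior can be modeled as reading an append-only record from an offset. The following is a minimal sketch under that assumption; the OutputSpool class and drain function are hypothetical names, and the real spool format is not described here.

```python
from typing import List, Tuple


class OutputSpool:
    """Append-only record of everything the user process has written."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def write(self, data: str) -> None:
        self.records.append(data)


def drain(spool: OutputSpool, offset: int) -> Tuple[List[str], int]:
    """Return the records not yet delivered, plus the new offset."""
    pending = spool.records[offset:]
    return pending, offset + len(pending)


spool = OutputSpool()
spool.write("line 1")  # written while the first Manager was connected
spool.write("line 2")  # written after that Manager terminated

# A restarted Manager starts at offset 0 and receives everything spooled so far.
backlog, offset = drain(spool, 0)

spool.write("line 3")  # the user process is still running
live, offset = drain(spool, offset)

print(backlog)  # ['line 1', 'line 2']
print(live)     # ['line 3']
```

Starting the restarted Manager at offset 0 is what makes it receive the output from the beginning, including data the original Manager had already seen.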

Requesting Restart

When a restartable Manager is initiated, it is either an initial instance or a restarted instance of a command ID. The command ID identifies a unit of work represented by the Manager, Server, and user process. See Command Identifier, above, for more information on the command ID.

The RESTART configuration option specifies whether or not the Manager instance is requesting a restart of a previous command ID. Possible RESTART values are yes, no, and auto.

The auto value specifies:

  • If there is no existing command ID executing on the remote host, consider this Manager execution the first instance.
  • If there is an existing command ID, and it is not connected to any Manager, consider this a restart of the command ID.

The auto value permits automatic restart by eliminating the need to modify the RESTART value for the initial instance and restarted instance.

Note

The auto value cannot be used with a COMMAND_ID configuration option value of *, which specifies that the UCMD Manager will generate a unique command ID for each run.
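The rules for the RESTART values can be summarized as a small decision function. This is a sketch of the rules as stated above, not the product's implementation; the resolve_restart function and its parameters are hypothetical. The text does not say what auto does when the command ID exists and is still connected to a Manager, so rejecting that case here is an assumption.

```python
def resolve_restart(restart: str, command_id: str,
                    exists: bool, connected: bool) -> str:
    """Return "initial" or "restart" for this Manager instance.

    exists:    a unit of work with this command ID exists on the remote host
    connected: that unit of work is currently connected to a Manager
    """
    if restart == "no":
        return "initial"
    if restart == "yes":
        return "restart"
    if restart == "auto":
        if command_id == "*":
            # Documented restriction: auto cannot be combined with "*".
            raise ValueError("RESTART auto cannot be used with COMMAND_ID *")
        if not exists:
            return "initial"
        if not connected:
            return "restart"
        # Assumption: the source does not define this case for auto.
        raise RuntimeError("command ID is still connected to a Manager")
    raise ValueError(f"unknown RESTART value: {restart!r}")


print(resolve_restart("auto", "payroll-2024-05-01", exists=False, connected=False))  # initial
print(resolve_restart("auto", "payroll-2024-05-01", exists=True, connected=False))   # restart
```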

Case Example 1 - Normal Execution

The following figure diagrams the sequence of events that occur when a restartable Manager requests the execution of a command on a remote host. In this case, the Manager and Server remain executing and connected until normal completion of the user process.

The Local Host is the host on which the Manager is being executed.

The Remote Host is the host on which the Manager is requesting command execution.

Components

The components involved are:

  • Universal Command Manager
    The Manager requests remote execution of a command or script. The Manager executes the remote command in a manner such that the command appears to be executed locally.
  • Universal Broker
    The Broker manages Universal Agent component execution.
  • Universal Command Server
    The Server executes the Manager requested command and processes the user process's standard I/O requests.
  • User Process
    The user process represents the Manager requested command.

Sequence of Events

The diagram demonstrates the sequence of events that occur when a restartable Manager requests command execution on a remote host. The numbers enclosed in circles represent the sequence of events and correspond to the listed descriptions below.

Step 1

The Manager connects to the Broker and sends a request to start a Server. The start request from the Manager requests Manager fault tolerance and includes the command ID to identify the unit of work.

Step 2

The Broker records the unit of work in the Broker Component Database as restartable for possible future restarts.

Step 3

The Broker starts an instance of the Server.

Step 4

The Manager and Server exchange messages that specify all options used to carry out the request.

Step 5

The Server records the unit of work in the Universal Command Server Database for possible future restarts.

Step 6

The Manager sends all standard input data to the Server, and the Server writes the standard input data to the Universal Spool database.

Step 7

Once all standard input is spooled, the Server starts the user process.

Step 8

As the user process writes standard output and standard error data, the Server writes the data to the Universal Spool database. If the Manager is connected to the Server, the data is written to the Manager as well.

Step 9

The user process executes until completion. Once the user process completes, the Server writes the exit status of the user process to the Universal Command Server Database.

Step 10

The Server sends the exit status to the Manager. This completes the unit of work.

Case Example 2 - Restart when User Process is Executing

The following figure diagrams the sequence of events that occur when a Manager requests a restart of a currently executing unit of work. In this case the initial instance of the Manager terminated. A restarted instance of the Manager is started and requests to be reconnected to the unit of work.

This example continues from Case Example 1 - Normal Execution; refer to that example for descriptions of the components included in the following diagram.


Sequence of Events

The diagram demonstrates the sequence of events that occur when a Manager requests to be restarted with a unit of work identified by a command ID. The numbers enclosed in circles represent the sequence of events and correspond to the listed descriptions below.

Step 1

The restarted instance of the Manager sends a restart request to the Broker. The restart request contains the command ID specified as part of the invocation of the Manager.

Step 2

The Broker verifies that the component is restartable and that the component's communication state is acceptable for a restart request. If the Server component were currently connected to a Manager, its communication state would not permit a restart request.

Step 3

The Broker sends the restart request to the Server corresponding to the command ID.

Step 4

The Server authenticates the request with the Manager-supplied user ID and password. The password must be the same as the one supplied by the initial Manager instance.

Step 5

The Manager and Server exchange options that are used to carry out the request.

Step 6

The Server records the restart in the Universal Command Server Database.

Step 7

The Server sends spooled standard output and error files to the Manager. This is performed while the user process may still be writing standard output and error to the spool. Once all spooled output is sent to the Manager, the Server will send standard output and error from the user process as it is being produced.

Step 8

The user process executes to completion. The Server records the user process exit status in the Universal Command Server Database.

Step 9

The Server sends the exit status to the Manager. This completes the unit of work.
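The Broker's check in Step 2 can be sketched as a lookup followed by two tests. The Component record and the accept_restart function below are hypothetical, and a dictionary stands in for the Broker Component Database; rejecting an unknown command ID is an assumption that the steps above do not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Component:
    """What the Broker is assumed to record about a unit of work."""
    command_id: str
    restartable: bool
    manager_connected: bool


def accept_restart(components: Dict[str, Component], command_id: str) -> bool:
    """Accept a restart only for a restartable, disconnected component."""
    comp = components.get(command_id)
    if comp is None:
        return False  # assumption: unknown command IDs are rejected
    return comp.restartable and not comp.manager_connected


database = {
    "job-a": Component("job-a", restartable=True, manager_connected=False),
    "job-b": Component("job-b", restartable=True, manager_connected=True),
    "job-c": Component("job-c", restartable=False, manager_connected=False),
}

print(accept_restart(database, "job-a"))  # True
print(accept_restart(database, "job-b"))  # False: a Manager is still connected
print(accept_restart(database, "job-c"))  # False: not started as restartable
```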

Case Example 3 - Restart when User Process has Ended

The following figure diagrams the sequence of events that occur when a Manager requests a restart of a unit of work that has completed. In this case, the initial instance of the Manager has terminated, the user process completed normally, and a restarted instance of the Manager is started and requests to be reconnected to the completed unit of work.

This example continues from Case Example 1 - Normal Execution; refer to that example for descriptions of the components included in the following diagram.
 

Sequence of Events

The diagram demonstrates the sequence of events that occur when a Manager requests to be restarted with a unit of work identified by a command ID. The user process in this case has completed execution. The numbers enclosed in circles represent the sequence of events and correspond to the following descriptions:

Step 1

The restarted instance of the Manager sends a restart request to the Broker. The restart request contains the command ID specified as part of the invocation of the Manager.

Step 2

The Broker verifies that the component is restartable and that the component's communication state is acceptable for a restart request. If the Server component were currently connected to a Manager, its communication state would not permit a restart request.

Step 3

Since the user process has completed, the Broker starts a new Server to process the restart request. The Server authenticates the request with the Manager-supplied user ID and password. The password must be the same as the one supplied by the initial Manager instance.

Step 4

The Manager and Server exchange options that are used to carry out the request.

Step 5

The Server records the restart in the Universal Command Server Database.

Step 6

The Server sends spooled standard output and error files to the Manager.

Step 7

The Server sends the user process exit status to the Manager. This completes the unit of work.
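Steps 6 and 7 amount to replaying stored results, with the exit status sent last. The sketch below models only that ordering; the replay_completed function and the message tuples are hypothetical, and sending all standard output before standard error is an arbitrary choice made for illustration.

```python
from typing import Iterator, List, Tuple

Message = Tuple[str, object]


def replay_completed(stdout: List[str], stderr: List[str],
                     exit_status: int) -> Iterator[Message]:
    """Yield the messages a new Server might send for a finished unit of work."""
    for line in stdout:
        yield ("stdout", line)
    for line in stderr:
        yield ("stderr", line)
    # The exit status is sent last and completes the unit of work.
    yield ("exit", exit_status)


messages = list(replay_completed(["done"], ["warning: slow disk"], 0))
print(messages[-1])  # ('exit', 0)
```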