Introduction

The Agent Cluster is designed to provide horizontal scalability and fail-over capabilities for Agents in high availability environments, see JS7 - Agent Cluster. It works without a single point of failure.

Use of a JS7 - Agent Cluster is subject to the JS7 - License.

We find separate tiers in the architecture of Agent Clusters, see JS7 - System Architecture:

  • Controller (Cluster) → Director Agent (Cluster)
  • Director Agent (Cluster) → Subagent Cluster

We find separate layers for operation and use of Agent Clusters:

  • Operational Layer: Subagents and Director Agent Instances
    • Subagents and Director Agent instances are similarly installed.
    • Director Agent instances orchestrate Subagents. They include a Subagent that can be used if users wish to execute jobs from a Director Agent.
  • Functional Layer: Subagent Cluster and Director Agent Cluster
    • Jobs are assigned to Subagent Clusters. This specifies that a job can be executed by any Subagent that is a member of the Subagent Cluster. The Subagent Cluster determines whether a different Subagent is chosen only in case of fail-over (fixed-priority scheduling, active-passive cluster) or for each next execution of a job (round-robin scheduling, active-active cluster).
    • The Director Agent Cluster is independent from Subagent Clusters. The purpose of clustering is to provide high availability for the role of orchestrating Subagents.
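
The difference between the two scheduling modes can be illustrated with a small model. The sketch below is not JS7 code: the names `pick_fixed_priority` and `RoundRobinPicker` and the Subagent names are hypothetical and serve only to demonstrate the selection behavior described above.

```python
def pick_fixed_priority(subagents, available):
    """Active-passive: always return the first available Subagent in priority order.

    A lower-priority Subagent is chosen only if all Subagents before it are unavailable.
    """
    for name in subagents:
        if name in available:
            return name
    return None


class RoundRobinPicker:
    """Active-active: rotate through the Subagents for each next job execution."""

    def __init__(self, subagents):
        self.subagents = list(subagents)
        self._next = 0  # index of the Subagent to try first for the next execution

    def pick(self, available):
        count = len(self.subagents)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self.subagents[index]
            if candidate in available:
                # continue with the following Subagent for the next execution
                self._next = (index + 1) % count
                return candidate
        return None


# hypothetical Subagent names, listed in priority order
members = ["subagent-1", "subagent-2", "subagent-3"]
picker = RoundRobinPicker(members)
```

With all members available, `pick_fixed_priority` keeps returning the first member, whereas successive calls to `picker.pick` walk through the members in turn.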

Consider the wording in this article:

  • Fail-over is an automated operation that occurs when a Director Agent instance is aborted or killed. Fail-over is applied in case of abnormal termination.
  • Switch-over is a manual operation performed by users on a Director Agent Cluster.

The article is focused on fail-over of a Director Agent. For fail-over scenarios with Subagent Clusters see JS7 - How to fail-over between Subagents in an Agent Cluster.

For command line references see the JS7 - Agent - Command Line Operation article.

Manage Director Agent Clusters

The JS7 - Agent Installation On Premises and JS7 - Agent Installation for Containers articles explain the installation procedure, which is largely the same for Director Agents and for Subagents. Director Agent instances require a license key to be assigned, see JS7 - How to apply a JS7 License Key.

The icon in the JOC Cockpit main menu is used to navigate to the Manage Controllers/Agents view:


This brings forward the following view:

  • The view is grouped in Controllers.
  • For each Controller separate lists of Standalone Agents and Cluster Agents are displayed.


Add Director Agent Cluster

The Agent Cluster is situated in the operational layer and includes specification of Director Agents.

To add a Director Agent Cluster users can start from the action menu of the Controller:

This brings forward the following popup window:


Explanation:

For an explanation of the input fields, see JS7 - Management of Agent Clusters.

Status of Agent Cluster

To check the Agent Cluster status users can navigate to the Resources->Agents view:

Operations on Director Agent Cluster

Fail-over

Fail-over occurs when an Active Director Agent instance is terminated abnormally. Fail-over includes that the task currently being executed by the Director Agent instance is considered to have failed and that the related order is set to a failed state. An Inactive Director Agent instance is no longer a member of the Director Agent Cluster:

  • The previous Standby Director Agent instance will take the active role.
  • Subagent Clusters will continue to execute jobs. They are not affected by a Director Agent's fail-over operation.
  • If the Agent Cluster is assigned to a File Order Source for JS7 - File Watching then the active Director Agent instance will pick up file watching. This happens regardless of whether the Subagent included with a Director Agent instance is enabled or disabled.
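
The effects of fail-over listed above can be sketched as a state transition. This is a simplified model, not the JS7 implementation: the class `DirectorCluster`, its fields and the instance and order names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DirectorCluster:
    active: str                      # instance currently holding the active role
    standby: Optional[str]           # instance waiting to take over, if any
    # orders with a task running in the active Director Agent instance: order id -> state
    director_orders: Dict[str, str] = field(default_factory=dict)

    def fail_over(self):
        """Model the abnormal termination of the active Director Agent instance."""
        if self.standby is None:
            raise RuntimeError("no standby Director Agent instance available")
        # tasks executed by the terminated instance are considered failed
        for order_id in list(self.director_orders):
            self.director_orders[order_id] = "FAILED"
        # the previous standby instance takes the active role;
        # the terminated instance is no longer a member of the cluster
        self.active, self.standby = self.standby, None


cluster = DirectorCluster("director-1", "director-2", {"order-1": "RUNNING"})
cluster.fail_over()
```

After `fail_over()` the model reports `director-2` as active, no standby instance, and `order-1` in a failed state. Jobs running in Subagent Clusters are deliberately not part of the model as they are not affected.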

Fail-over can be caused by the following actions:

  • The Active Director Agent instance is killed, for example:
    • for Unix with a SIGKILL signal corresponding to the command: kill -9
    • for Windows with the command: taskkill /F
  • From the command line the Agent's Instance Start Script can be used like this:
    • agent_<port>.sh | .cmd abort
    • agent_<port>.sh | .cmd kill

Fail-over will not occur when:

  • the Active Director Agent instance is stopped normally from the command line:
    • agent_<port>.sh | .cmd stop
  • the operating system is shut down and systemd / init.d or a Windows Service is in place to stop the Director Agent instance normally.

Fail-over happens within a short period of time, typically within 2-3 seconds.
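
The two lists above can be summarized as a lookup. This is only a sketch: the `abort`, `kill` and `stop` commands are those of the Instance Start Script mentioned above, while the names `FAILOVER_BY_COMMAND` and `causes_failover` are hypothetical.

```python
# Instance Start Script command -> does it cause fail-over of the Director Agent?
FAILOVER_BY_COMMAND = {
    "abort": True,   # abnormal termination
    "kill": True,    # abnormal termination
    "stop": False,   # normal termination, no fail-over
}


def causes_failover(command):
    """Return True if the given start script command triggers fail-over."""
    try:
        return FAILOVER_BY_COMMAND[command]
    except KeyError:
        raise ValueError(f"command not covered by this sketch: {command}")
```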

Switch-Over

Switch-over is an operation that is caused by user intervention in JOC Cockpit or by use of the JS7 - REST Web Service API. The switch-over procedure does not require termination of an Active Director Agent. Instead, it shifts the active role to the standby Director Agent.

In the Resources->Agents view users can perform the switch-over operation from the Agent Cluster's action menu:

  • The active and standby Director Agent instances will switch roles.
  • As a prerequisite for switch-over:
    • the Director Agent Cluster has to be coupled,
    • the Subagent in a Director Agent instance must not run jobs.
  • After switch-over the Standby Director Agent will become active and the previously active Director Agent instance will be restarted.
  • If the Agent Cluster is assigned to a File Order Source for JS7 - File Watching then the active Director Agent will pick up file watching. This happens regardless of whether the Subagent included with a Director Agent instance is enabled or disabled.
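
The prerequisites and the role change can be expressed as a short check. The sketch below is a model for illustration and does not call JOC Cockpit or the REST Web Service API. The function names and parameters are hypothetical.

```python
def can_switch_over(coupled, director_subagent_jobs):
    """Both prerequisites must hold: the cluster is coupled and the Subagent
    included with the Director Agent instance does not run any jobs."""
    return coupled and director_subagent_jobs == 0


def switch_over(active, standby, coupled, director_subagent_jobs):
    """Return the (active, standby) pair after switch-over."""
    if not can_switch_over(coupled, director_subagent_jobs):
        raise RuntimeError("switch-over prerequisites are not met")
    # the roles are exchanged; the previously active instance is restarted
    # and continues as the standby instance
    return standby, active


new_active, new_standby = switch_over("director-1", "director-2", True, 0)
```

In this example `director-2` becomes the active instance. Calling `switch_over` with `coupled=False` or with running jobs raises an error instead of switching roles.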

Confirm loss of a Director Agent instance

The operation to Confirm loss of a Director Agent instance is performed in the following situation:

  • Assume that fail-over between Director Agent instances has occurred and that, after fail-over, both the Controller (Standalone Controller or Controller Cluster) and the remaining Director Agent instance are shut down at the same point in time. After restart of the Controller and the Director Agent instances, the Controller cannot act as a witness to the previous Director Agent fail-over due to its own restart. As a result, the Controller holding the role of the Cluster Watch cannot determine which of the newly started Director Agent instances should receive the active role, as both Director Agent instances will claim the active role after restart.
  • In this situation the user is asked to decide which Director Agent instance should be considered lost. This includes verifying that the now standby Director Agent instance is shut down at the point in time when the user takes this decision. Users can start the now standby Director Agent instance later on to re-establish the Director Agent Cluster.
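
The decision problem can be illustrated with a small model. It is a sketch under simplifying assumptions and does not reflect how the Controller implements the Cluster Watch. The function `resolve_active` and the instance names are hypothetical.

```python
def resolve_active(claims, confirmed_lost=None):
    """Decide which Director Agent instance receives the active role.

    claims: names of the instances that claim the active role after restart.
    confirmed_lost: the instance that the user confirmed as lost, if any.
    Returns the active instance, or None if the user's decision is still required.
    """
    if len(claims) == 1:
        return claims[0]
    if confirmed_lost is None:
        # conflicting claims and no witness: the decision is left to the user
        return None
    remaining = [name for name in claims if name != confirmed_lost]
    if len(remaining) != 1:
        raise ValueError("confirmation does not resolve the conflicting claims")
    return remaining[0]


undecided = resolve_active(["director-1", "director-2"])
decided = resolve_active(["director-1", "director-2"], confirmed_lost="director-2")
```

Without confirmation the model returns no active instance. Once the user confirms `director-2` as lost, `director-1` receives the active role.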

Further Resources

