Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Introduction

The Agent Cluster is designed to provide horizontal scalability and fail-over capabilities for Agents in HA environments, see JS7 - Agent Cluster. It works without a single point of failure.

...

This article is focused on fail-over of Subagents. For fail-over scenarios with Director Agent Clusters see JS7 - How to test fail-over of Subagents in an Agent Cluster

For command line references see the JS7 - Agent - Command Line Operation article.

Fail-over Operation

Fail-over occurs when an Active Subagent is terminated abnormally. Fail-over includes that the task currently being executed by the Subagent is considered to have failed and that the related order is set to a failed state. An Inactive Subagent is no longer considered for execution of jobs by a Director Agent:

...

Fail-over happens within a short period of time, typically in 2-3s.

Round-robin Subagent Cluster

Anchor
round_robin_normal_operation
round_robin_normal_operation
Scenario for normal Cluster Operation

The JS7 - How to set up an Agent Cluster article explains how to set up a number of Subagents.

  1. Create a workflow from the Configuration view and assign the same Agent Cluster to all jobs. Once the configuration is completed deploy the workflow.



  2. The Agent Cluster is configured for round-robin scheduling and executes each subsequent job with the next Subagent.
  3. To test cluster behavior navigate to the Workflows view and select a workflow from the tree.



  4. Expand the workflow and add an order.



  5. Once the workflow has completed successfully open the log from the history panel.



  6. In the log, you can identify that all jobs use different Subagents as the Agent Cluster is set up for round-robin scheduling. Each next job is executed with the next Subagent.

Scenario for fail-over Cluster Operation

  1. Kill one of the Active Subagents from the command line to force fail-over with one of the below commands.
    • An Active Subagent is killed, for example:
      • on Unix with a SIGKILL signal corresponding to the command: kill -9
      • on Windows with the command: taskkill /F
    • From the command line, the Agent Instance Start Script can be used like this:
      • agent_<port>.sh | .cmd abort
      • agent_<porr>.sh | .cmd kill



  2. Check the order log to verify that jobs in the workflow are successfully executed with all the remaining Subagents.

Fixed-priority Subagent Cluster

Anchor
fixed_priority_normal_operation
fixed_priority_normal_operation
Scenario for normal Cluster Operation

This scenario is similar to the Scenario for normal Cluster Operation of a round-robin Subagent Cluster with the exception that jobs are assigned a Subagent Cluster which is set up for fixed-priority scheduling.

Fixed-priority means that all jobs will be executed with the first Subagent unless it becomes unavailable and only then jobs will be executed with the next Subagent.

Scenario for fail-over Cluster Operation

  1. Kill the Active Subagent from the command line to force fail-over with one of the commands listed below.
    • The Active Subagent is killed, for example:
      • on Unix with a SIGKILL signal corresponding to the command: kill -9
      • on Windows with the command: taskkill /F
    • From the command line the Agent Instance Start Script can be used like this:
      • agent_<port>.sh | .cmd abort
      • agent_<porr>.sh | .cmd kill



  2. Check the order log to verify that any jobs in the workflow are successfully executed with the next Subagent.

Further Resources