Introduction

A JS7 - Agent Cluster provides high availability. Use of this feature is subject to the JS7 - License.

  • Fail-over is an automated operation that is applied when a Subagent terminates abnormally, for example when it is aborted or killed.
  • Switch-over is a manual operation performed by users disabling/enabling Subagents.

For command line references see the JS7 - Agent - Command Line Operation article.

Test Case for the Agent Cluster

The test cases below use the Agent Cluster with multiple Subagents that is set up in the JS7 - How to set up an Agent Cluster article.

Fail-over Operation

Fail-over occurs when an Active Subagent is terminated abnormally. Fail-over means that the task currently being executed by the Subagent is considered to have failed and that the related order is set to a failed state. An Inactive Subagent is no longer considered for execution of jobs by a Director Agent:

  • Subagent Clusters configured for round-robin scheduling will execute jobs with the remaining Subagents. 
  • Subagent Clusters configured for fixed-priority scheduling will switch execution of jobs to the next Subagent.
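
The two scheduling behaviors described above can be illustrated with a small model. This is a simplified sketch and not the JS7 implementation: the function names `pick_round_robin` and `pick_fixed_priority` and the Subagent names are hypothetical, and the model assumes that a Subagent that failed is simply removed from the set of active Subagents.

```python
def pick_round_robin(subagents, active, job_index):
    """Pick the Subagent for the job_index-th job, cycling over active Subagents only."""
    candidates = [s for s in subagents if s in active]
    if not candidates:
        raise RuntimeError("no active Subagent available")
    return candidates[job_index % len(candidates)]


def pick_fixed_priority(subagents, active):
    """Pick the first active Subagent; 'subagents' is listed in priority order."""
    for s in subagents:
        if s in active:
            return s
    raise RuntimeError("no active Subagent available")


# Hypothetical Subagent names, listed in priority order.
subagents = ["subagent_1", "subagent_2", "subagent_3"]
active = set(subagents)

# Normal operation: round-robin rotates, fixed-priority stays on the first Subagent.
print([pick_round_robin(subagents, active, i) for i in range(4)])
print(pick_fixed_priority(subagents, active))

# Simulated fail-over: the first Subagent is no longer active.
active.discard("subagent_1")
print([pick_round_robin(subagents, active, i) for i in range(4)])
print(pick_fixed_priority(subagents, active))
```

In this model, round-robin scheduling continues to rotate over the remaining Subagents after fail-over, whereas fixed-priority scheduling moves all jobs to the next Subagent in the list.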

Fail-over can be invoked by the following actions:

  • The Active Subagent is killed, for example:
    • for Unix with a SIGKILL signal corresponding to the command: kill -9
    • for Windows with the command: taskkill /F
  • From the command line the Agent Instance Start Script can be used like this:
    • agent_<port>.sh | .cmd abort
    • agent_<port>.sh | .cmd kill

Fail-over will not occur when:

  • the Active Subagent is stopped normally from the command line:
    • agent_<port>.sh | .cmd stop
  • the operating system is shut down and systemd / init.d or a Windows Service are in place to stop the Subagent normally.
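
The difference between normal and abnormal termination can be illustrated on Unix with a generic sketch. It does not use the Agent and makes no claim about how the Agent's start script implements its stop, abort or kill operations. The assumption shown is only the general Unix behavior: a process can handle SIGTERM and shut down in an orderly way, whereas SIGKILL (kill -9) cannot be handled. The helper name `run_and_signal` is hypothetical.

```python
import signal
import subprocess
import sys

# Child process: handles SIGTERM by exiting cleanly, then reports that it is ready.
CHILD = r"""
import signal, sys, time
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
time.sleep(30)
"""


def run_and_signal(sig):
    """Start the child, send it the given signal and return its exit status (Unix only)."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    ) as proc:
        proc.stdout.readline()  # wait until the SIGTERM handler is installed
        proc.send_signal(sig)
        return proc.wait(timeout=10)


# Normal termination: the child handles the signal and exits with status 0.
print("SIGTERM:", run_and_signal(signal.SIGTERM))
# Abnormal termination: the child cannot react; Python reports the negative signal number.
print("SIGKILL:", run_and_signal(signal.SIGKILL))
```

A process killed this way has no opportunity to report that it is going down, which is why such a termination has to be detected from outside.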

Fail-over happens within a short period of time, typically within 2-3 seconds.

How to test fail-over in the Agent Cluster

Round-robin Subagent Cluster

Scenario for normal Cluster Operation

The JS7 - How to set up an Agent Cluster article explains how to set up a number of Subagents.

  1. Create a workflow from the Configuration view and assign the same Agent Cluster to all jobs. Once the configuration is complete, deploy the workflow.

  2. The Agent Cluster is configured for round-robin scheduling and executes each next job with the next Subagent.
  3. To test this cluster behavior navigate to the Workflows view and select a workflow from the folder tree.

  4. Expand the workflow and add an order.
  5. Once the workflow has completed successfully, open the log from the history panel.

  6. In the log, you can identify that all jobs use different Subagents as the Agent Cluster is set up for round-robin scheduling. Each next job is executed with the next Subagent.
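
The check in the last step can also be expressed as a small sketch. It assumes that the Subagent names have been copied by hand from the log in the order in which the jobs were executed. No particular log format is assumed, and the function name `follows_round_robin` and the Subagent names are hypothetical.

```python
def follows_round_robin(observed, subagents):
    """Return True if each observed Subagent is the cyclic successor of the previous one.

    observed:  Subagent names in job execution order, copied from the log
    subagents: Subagent names in the order used by the Subagent Cluster
    """
    if not observed:
        return True
    if any(name not in subagents for name in observed):
        return False
    start = subagents.index(observed[0])
    count = len(subagents)
    return all(
        name == subagents[(start + i) % count] for i, name in enumerate(observed)
    )


# Hypothetical example: four jobs executed with three Subagents.
subagents = ["subagent_1", "subagent_2", "subagent_3"]
print(follows_round_robin(["subagent_2", "subagent_3", "subagent_1", "subagent_2"], subagents))
print(follows_round_robin(["subagent_1", "subagent_1", "subagent_2"], subagents))
```

The first call describes a run in which each next job moved to the next Subagent. The second call describes a run in which two consecutive jobs used the same Subagent, which is not what round-robin scheduling is expected to produce.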

Scenario for fail-over Cluster Operation

  1. Kill one of the Active Subagents from the command line to force fail-over, using one of the following commands:
    • An Active Subagent is killed, for example:
      • for Unix with a SIGKILL signal corresponding to the command: kill -9
      • for Windows with the command: taskkill /F
    • From the command line the Agent Instance Start Script can be used like this:
      • agent_<port>.sh | .cmd abort
      • agent_<port>.sh | .cmd kill

  2. Check the order log to verify that jobs in the workflow are successfully executed with all remaining Subagents.

Fixed-priority Subagent Cluster

Scenario for normal Cluster Operation

The scenario is similar to the Scenario for normal Cluster Operation of a round-robin Subagent Cluster with the exception that jobs are assigned a Subagent Cluster that is set up for fixed-priority scheduling.

Fixed-priority means that all jobs are executed with the first Subagent. Only if this Subagent becomes unavailable are jobs executed with the next Subagent.

Scenario for fail-over Cluster Operation

  1. Kill the Active Subagent from the command line to force fail-over, using one of the following commands:
    • The Active Subagent is killed, for example:
      • for Unix with a SIGKILL signal corresponding to the command: kill -9
      • for Windows with the command: taskkill /F
    • From the command line the Agent Instance Start Script can be used like this:
      • agent_<port>.sh | .cmd abort
      • agent_<port>.sh | .cmd kill

  2. Check the order log to verify that any jobs in the workflow are successfully executed with the next Subagent.