
Introduction

A JS7 - Agent Cluster provides high availability for job execution. Its use is a feature that is subject to the JS7 - License.

  • Fail-over is an automated operation that occurs when a Subagent terminates abnormally, for example when it is aborted or killed.
  • Switch-over is an operation that is caused by user intervention in the JOC Cockpit or by use of the JS7 - REST Web Service API. Switch-over does not require termination of a Subagent; instead, it shifts the active role to the remaining Subagent(s).
  • Switch-back is similarly caused by user intervention and makes a Subagent available again for job execution.

Fail-over, switch-over and switch-back take place in a Subagent Cluster.

For command line references see the JS7 - Agent - Command Line Operation article.

Cluster Roles

Subagents can be members of any number of Subagent Clusters, which form a functional layer on top of the installed Subagents. Subagent Clusters are managed by a Director Agent that determines which Subagent will execute the next job.

  • Fixed-priority Scheduling
    • The first Subagent in an active-passive Subagent Cluster takes the active role for job execution; any further Subagents take a passive role.
    • If the Active Subagent becomes unavailable then the next Subagent in the cluster configuration is assigned the active role.
  • Round-robin Scheduling
    • All Subagents in an active-active Subagent Cluster are assigned the active role.
    • The Director Agent assigns each new task to the next Subagent in turn.
    • If a Subagent is disabled or becomes unavailable then the remaining Subagents will share execution of tasks.
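
The two scheduling modes above can be illustrated with a minimal Python sketch. It is not the Director Agent's implementation: the names Subagent, pick_fixed_priority and RoundRobin are hypothetical and only model the selection rules as described in this article.

```python
from dataclasses import dataclass


@dataclass
class Subagent:
    """Hypothetical stand-in for a Subagent in a Subagent Cluster."""
    name: str
    available: bool = True  # False when disabled or unavailable


def pick_fixed_priority(subagents):
    """Fixed-priority: return the first available Subagent in
    configuration order, or None if no Subagent is available."""
    for subagent in subagents:
        if subagent.available:
            return subagent
    return None


class RoundRobin:
    """Round-robin: hand each new task to the next available Subagent."""

    def __init__(self, subagents):
        self.subagents = subagents
        self._next = 0  # index of the Subagent to try first

    def pick(self):
        n = len(self.subagents)
        for offset in range(n):
            index = (self._next + offset) % n
            if self.subagents[index].available:
                # Remember where to continue for the following task.
                self._next = (index + 1) % n
                return self.subagents[index]
        return None  # no Subagent is available
```

In this sketch, setting available to False for the first Subagent makes pick_fixed_priority return the second one, while RoundRobin simply skips the unavailable Subagent and shares tasks among the remaining ones.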

Cluster Operations

Cluster operations include automated fail-over and a manual switch-over or switch-back of a Subagent.

The following explanations assume an Agent Cluster with a number of Subagents.

Fail-over

Fail-over occurs when an Active Subagent terminates abnormally. The task being executed by the Subagent at that time is considered to have failed, and the related order is set to a failed state. Once inactive, the Subagent is no longer considered for execution of jobs by the Director Agent.

Fail-over can be invoked by the following actions:

  • The Active Subagent is killed, for example:
    • for Unix with a SIGKILL signal corresponding to the command: kill -9
    • for Windows with the command: taskkill /F
  • The user performs one of the following operations from the command line:
    • agent.sh | .cmd abort
    • agent.sh | .cmd kill

Fail-over will not occur when:

  • the Active Subagent is stopped normally from the command line:
    • agent.sh | .cmd stop
  • the operating system is shut down and systemd / init.d or a Windows Service is in place to stop the Subagent normally.

Fail-over happens within a short period of time, typically within 2 to 3 seconds.
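
The distinction between abnormal and normal termination can be summarized in a small Python sketch. The enum Termination and the function triggers_failover are hypothetical names introduced for illustration; they restate the lists above and are not part of any JS7 API.

```python
from enum import Enum


class Termination(Enum):
    """Ways an Active Subagent can go down, as listed in this section."""
    SIGKILL = "kill -9"                # Unix
    TASKKILL = "taskkill /F"           # Windows
    ABORT = "agent.sh | .cmd abort"
    KILL = "agent.sh | .cmd kill"
    STOP = "agent.sh | .cmd stop"
    OS_SHUTDOWN = "OS shutdown with systemd / init.d or Windows Service"


# Abnormal terminations trigger fail-over; a normal stop does not.
ABNORMAL = frozenset({
    Termination.SIGKILL,
    Termination.TASKKILL,
    Termination.ABORT,
    Termination.KILL,
})


def triggers_failover(cause: Termination) -> bool:
    """Return True if this kind of termination leads to fail-over."""
    return cause in ABNORMAL
```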

Switch-over

Switch-over occurs exclusively when invoked by user intervention. Switch-over means that a task currently running in an Active Subagent will be completed and that the Subagent will not be considered for further task execution.

Switch-over can be invoked by the following actions:

  • In the Manage Controllers/Agents view the user performs the operation:
    • Controller -> Cluster Agents -> Subagent action menu: Disable
  • The Active Subagent is stopped normally from the command line:
    • agent.sh | .cmd stop
  • The operating system is shut down and systemd / init.d or a Windows Service is in place to stop the Subagent normally.

Switch-over happens within the period of time that is required to complete the currently running task. Further jobs will not be accepted for execution by the Subagent while waiting for completion of the current task.

Switch-back

Switch-back occurs exclusively when invoked by user intervention. Switch-back means that a previously inactive or unavailable Subagent becomes active and is considered for further execution of jobs in a Subagent Cluster.

Switch-back can be invoked by the following actions:

  • The user performs the following operation in the Manage Controllers/Agents view for a disabled Subagent:
    • Controller -> Cluster Agents -> Subagent action menu: Enable
  • An unavailable Subagent is started from the command line:
    • agent.sh | .cmd start
  • The operating system is started and systemd / init.d or a Windows Service is in place to start the Subagent.
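
The behavior of a Subagent during switch-over and switch-back can be modeled with a short Python sketch. The class SubagentModel and its methods are hypothetical and only mirror the rules described above: running tasks are completed, no further jobs are accepted after switch-over, and the Subagent is considered again after switch-back.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubagentModel:
    """Hypothetical view of one Subagent as seen by the Director Agent."""
    name: str
    enabled: bool = True  # False after Disable or a normal stop
    running: List[str] = field(default_factory=list)

    def assign(self, task: str) -> bool:
        """Accept a task only while the Subagent is enabled."""
        if not self.enabled:
            return False
        self.running.append(task)
        return True

    def switch_over(self) -> None:
        """Stop accepting jobs; running tasks are left to complete."""
        self.enabled = False

    def complete(self, task: str) -> None:
        """Mark a running task as completed."""
        self.running.remove(task)

    def switch_back(self) -> None:
        """Make the Subagent available for job execution again."""
        self.enabled = True
```

A Subagent that has been switched over in this model keeps its running tasks until complete is called for each of them, and assign returns False for any new task until switch_back is called.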

Further Resources