Scope

  • JobScheduler Master and Agents check availability of the communication partner by regularly sending heartbeats. 
  • Heartbeats are sent via the HTTP connection that is established by the Master to the Agent. Bi-directional heartbeats make use of this connection.
    • The Agent receives HTTP POST requests from the Master and responds within 5s, independently of the completion of the command that the Master has requested.
    • The Master keeps sending HTTP POST requests and accepting acknowledgements until the Agent sends the final response, i.e. after completion of a task.
  • This allows Master and Agent to check if a connection has been lost and if it can be re-established.
  • FEATURE AVAILABILITY STARTING FROM RELEASE 1.10.2
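
The request cycle described above can be sketched as a small simulation. This is a minimal sketch, not JobScheduler code: the names `run_command`, `fake_agent`, `ACK` and `FINAL` are hypothetical, and a plain function call stands in for one HTTP POST round trip.

```python
from typing import Callable

ACK = "ack"      # hypothetical marker: interim acknowledgement (acts as a heartbeat)
FINAL = "final"  # hypothetical marker: final response after task completion


def run_command(post: Callable[[], str], max_requests: int = 1000) -> int:
    """Repeat requests until the final response arrives.

    `post` stands in for one HTTP POST round trip to the Agent.
    Returns the number of requests that were sent.
    """
    for sent in range(1, max_requests + 1):
        if post() == FINAL:
            return sent
    raise TimeoutError("no final response received")


def fake_agent(task_seconds: int, respond_within: int = 5) -> Callable[[], str]:
    """Simulated Agent that answers every request within `respond_within` seconds.

    It acknowledges while the task is still running and sends the
    final response once the remaining run time fits into one interval.
    """
    remaining = task_seconds

    def post() -> str:
        nonlocal remaining
        if remaining <= respond_within:
            return FINAL
        remaining -= respond_within
        return ACK

    return post
```

With a simulated task of 12 seconds, `run_command(fake_agent(12))` sends three requests: two are acknowledged and the third receives the final response.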

Related Features

JS-1523

JS-1524

Use Case

Kill Tasks in case of Connection Loss

  • If the Agent receives no heartbeat from the Master within twice the heartbeat period that has been configured for this connection (default: 10s) then the Agent will
    • assume the connection to be lost and
    • kill any running tasks that have been requested by that Master.
    • This behavior is intended to prevent simultaneous duplicate execution of tasks by an Agent. 
  • If the Master receives no heartbeats from the Agent then it will 
    • consider the task to be lost, e.g. assume that its request for execution of the task has not been received by the Agent, and will assign the task an error state,
    • try to re-establish the connection to the Agent,
    • repeat the request for task execution if the connection to the Agent can be established.
  • In this situation the Agent will 
    • within a configurable grace period
      • continue any running tasks.
      • try to identify duplicate requests for task execution from the Master and drop duplicate requests if the task is running.
    • kill the running tasks if the grace period is exceeded.
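
The Agent's side of this use case can be sketched as two small decisions. This is an illustrative sketch only: the function names and the `limit_seconds` parameter are hypothetical, and treating the limit as twice the 10s heartbeat period is an assumption taken from the first bullet above. The source does not say whether the grace period equals that value.

```python
def agent_action(seconds_since_heartbeat: float, limit_seconds: float) -> str:
    """Decide what the Agent does with running tasks.

    Returns "continue" while the silence is within the limit and
    "kill" once the limit is exceeded.
    """
    if seconds_since_heartbeat > limit_seconds:
        return "kill"
    return "continue"


def accept_request(task_id: str, running: set) -> bool:
    """Accept a task request unless the same task is already running.

    A repeated request for a running task is treated as a duplicate
    and dropped (returns False).
    """
    if task_id in running:
        return False
    running.add(task_id)
    return True


# Assumption: limit = twice the default heartbeat period of 10s.
LIMIT = 2 * 10.0
```

For example, `agent_action(12.0, LIMIT)` keeps the tasks running, while `agent_action(25.0, LIMIT)` kills them. A second `accept_request` call with the same task ID and the same `running` set is rejected as a duplicate.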

Continue Tasks in case of Reconciliation

  • If the Master successfully re-connects to the Agent within the grace period then
    • running tasks will be continued and completed by the Agent.
    • the task status and execution result will be reported to the Master.
  • In case of reconciliation the task status, log information and execution result are available to the Master and are visible in JOC.

Configuration

  • The heartbeat settings can be configured with the Process Classes that specify the Agent connection. 
  • The configuration is located with the Master. No configuration items are stored with the Agent.

Settings

  • Heartbeat Period: http_heartbeat_period
    • The period after which the Agent sends a heartbeat to the Master if no other HTTP operation is executed on behalf of the Master.
    • Default: 10s
  • Heartbeat Timeout: http_heartbeat_timeout
    • The overall timeout that determines if a connection is considered to be lost permanently.
    • Includes the heartbeat period and the delay after which the Master will send its heartbeat.
    • Default: 15s

Example

Heartbeat parameters
<?xml version="1.0" encoding="utf-8"?>
<process_class>
    <remote_schedulers>
        <remote_scheduler remote_scheduler="http://127.0.0.2:4445" http_heartbeat_period="10" http_heartbeat_timeout="15"/>
    </remote_schedulers>
</process_class>
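
A Process Class configuration of this kind can be checked for plausible heartbeat values before it is deployed. The following is a minimal sketch that uses only the Python standard library. The function `check_heartbeat_settings` and the `SAMPLE` text are hypothetical, the sample uses the documented defaults (10s and 15s), and the rule that the timeout should exceed the period is one reading of the Settings description (the timeout includes the heartbeat period), not a documented validation rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample using the documented default values.
SAMPLE = """<process_class>
    <remote_schedulers>
        <remote_scheduler remote_scheduler="http://127.0.0.2:4445"
                          http_heartbeat_period="10"
                          http_heartbeat_timeout="15"/>
    </remote_schedulers>
</process_class>"""


def check_heartbeat_settings(xml_text: str) -> list:
    """Return a list of problems found in the heartbeat settings.

    Missing attributes fall back to the documented defaults
    (period 10s, timeout 15s).
    """
    problems = []
    root = ET.fromstring(xml_text)
    for node in root.iter("remote_scheduler"):
        url = node.get("remote_scheduler", "?")
        period = int(node.get("http_heartbeat_period", "10"))
        timeout = int(node.get("http_heartbeat_timeout", "15"))
        # Assumption: the timeout includes the period, so it should be larger.
        if timeout <= period:
            problems.append(
                f"{url}: timeout {timeout}s does not exceed period {period}s"
            )
    return problems
```

Calling `check_heartbeat_settings(SAMPLE)` returns an empty list. A configuration with a timeout smaller than the period yields one entry per affected `remote_scheduler` element.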

Delimitation

  • Connection heartbeats tend to render the use of keep-alive packets superfluous, see Connection Keep-Alive for Master and Agent.
  • Connection heartbeats are used to detect a connection loss and to re-establish a connection within a short time.
    • They are not intended to cover longer network outages.
    • They are not intended for recovery scenarios, i.e. both Master and Agent have to be up and running. If one of the components is restarted then this is considered a recovery scenario.
