...

Jira: JS-1524 (SOS JIRA)

Concepts

  • Heartbeat Period: 
    • The period after which the Agent sends a heartbeat to the Master if no other HTTP operation has been executed on behalf of the Master in the meantime.
    • Default: 10s
  • Heartbeat Timeout: 
    • The overall timeout that determines if a connection is considered to be lost permanently.
    • Includes the heartbeat period and the delay after which the Master will send its heartbeat.
    • Default: 60s
  • Heartbeat Delay:
    • The delay that the JobScheduler 

Use Case

Kill Tasks in case of Connection Loss

  • If the Agent receives no heartbeats from the Master within the double period that has been configured for this connection (default: 10s) 120 seconds then the Agent will 
    • assume the connection to be lost and
    • kill any running tasks that have been requested by that Master.
    • This behavior is intended to prevent simultaneous duplicate execution of tasks by an Agent. 
  • If the Master receives no heartbeats from the Agent within an interval of between 50 and 60 seconds, then it will 
    • consider the task to be lost, e.g. assume that its request for execution of the task has not been received by the Agent, and assign the task an error state,
    • try to re-establish the connection to the Agent,
    • repeat the request for task execution if the connection to the Agent can be established.
  • In this situation the Agent will 
    • within a configurable grace period
      • continue any running tasks.
      • try to identify duplicate requests for task execution from the Master and drop duplicate requests if the task is running.
    • kill the running tasks if the grace period is exceeded.
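
The Agent-side grace period handling described above can be sketched in a few lines of Python. This is an illustration only, not JobScheduler code: the class `AgentConnectionState`, its methods, and the 30 second grace period used in the usage example are hypothetical names and values chosen for the sketch. It assumes that the grace period starts once the heartbeat timeout has elapsed.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConnectionState:
    """Hypothetical model of how an Agent might track one Master connection."""

    heartbeat_timeout: float  # seconds of silence before the connection counts as lost
    grace_period: float       # seconds tasks keep running after the loss (assumed semantics)
    last_heartbeat: float = 0.0
    running_tasks: set = field(default_factory=set)

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the Master."""
        self.last_heartbeat = now

    def on_task_request(self, task_id: str) -> bool:
        """Start a task, or drop the request if the same task is already running."""
        if task_id in self.running_tasks:
            return False  # duplicate request, dropped
        self.running_tasks.add(task_id)
        return True

    def tick(self, now: float) -> set:
        """Return the tasks to kill at time `now` (empty while inside the grace period)."""
        silent_for = now - self.last_heartbeat
        if silent_for <= self.heartbeat_timeout + self.grace_period:
            return set()
        killed = set(self.running_tasks)
        self.running_tasks.clear()
        return killed


# Usage with illustrative values: 60s timeout, 30s grace period (both assumptions).
state = AgentConnectionState(heartbeat_timeout=60, grace_period=30)
started = state.on_task_request("job-1")    # task starts
still_running = state.tick(now=70)          # connection lost, but inside the grace period
duplicate = state.on_task_request("job-1")  # repeated request from the Master is dropped
killed = state.tick(now=91)                 # grace period exceeded, task is killed
```

Keeping the duplicate check and the kill decision in one state object makes it easier to see that a repeated request never starts a second instance of a running task.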

...

  • Heartbeat Period: http_heartbeat_period
    • The period after which the Agent sends a heartbeat to the Master if no other HTTP operation has been executed on behalf of the Master in the meantime.
    • Default: 10s
  • Heartbeat Timeout: http_heartbeat_timeout
    • The overall timeout that determines if a connection is considered to be lost permanently.
    • Includes the heartbeat period and the delay after which the Master will send its heartbeat.
    • Default: 60s
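
Since the timeout is described as including both the heartbeat period and the Master's delay, the time left for that delay can be estimated from the two parameters. The following Python sketch assumes the relation timeout >= period + delay and that the period has to be shorter than the timeout. Both are readings of the descriptions above, not documented rules, and the function name `max_heartbeat_delay` is hypothetical.

```python
def max_heartbeat_delay(period_s: int = 10, timeout_s: int = 60) -> int:
    """Return the seconds left for the Master's heartbeat delay.

    Assumption: timeout >= period + delay, derived from the statement that
    the timeout includes the heartbeat period and the delay.
    """
    if period_s <= 0 or timeout_s <= 0:
        raise ValueError("period and timeout must be positive")
    if period_s >= timeout_s:
        # Assumed constraint: a period that is not shorter than the timeout leaves no room for the delay.
        raise ValueError("http_heartbeat_period must be shorter than http_heartbeat_timeout")
    return timeout_s - period_s


print(max_heartbeat_delay())  # prints 50 for the defaults of 10s and 60s
```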

Example

keep-alive parameter
<?xml version="1.0" encoding="utf-8"?>
<process_class>
    <remote_schedulers>
        <remote_scheduler remote_scheduler="http://127.0.0.2:4445" http_heartbeat_period="10" http_heartbeat_timeout="60"/>
    </remote_schedulers>
</process_class>
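
To check which heartbeat values a process class actually carries, the attributes can be read with a short script. The sketch below uses Python's standard `xml.etree.ElementTree` module on the example configuration. The function name `read_heartbeat_settings` is hypothetical, and falling back to 10 and 60 for missing attributes is an assumption based on the defaults stated above.

```python
import xml.etree.ElementTree as ET

# The example process class from above, as bytes so that the encoding declaration is honored.
PROCESS_CLASS_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<process_class>
    <remote_schedulers>
        <remote_scheduler remote_scheduler="http://127.0.0.2:4445" http_heartbeat_period="10" http_heartbeat_timeout="60"/>
    </remote_schedulers>
</process_class>
"""


def read_heartbeat_settings(xml_bytes: bytes) -> list:
    """Collect the heartbeat settings of every remote_scheduler element."""
    root = ET.fromstring(xml_bytes)
    settings = []
    for element in root.iter("remote_scheduler"):
        settings.append({
            "url": element.get("remote_scheduler"),
            # Fall back to the documented defaults (seconds) if an attribute is missing.
            "period": int(element.get("http_heartbeat_period", "10")),
            "timeout": int(element.get("http_heartbeat_timeout", "60")),
        })
    return settings


settings = read_heartbeat_settings(PROCESS_CLASS_XML)
for entry in settings:
    print(entry["url"], entry["period"], entry["timeout"])
```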

...