Scope

  • JobScheduler includes a number of measures to improve Agent high availability.
  • Agent Bundles can be used to compensate for the outage of a server that runs an Agent.
  • Master/Agent Reconciliation handles a temporary connection loss between Master and Agent.

Agent Bundles

Feature

  • JobScheduler supports a number of Agents to be specified for a single Process Class.
    • JobScheduler will contact Agents in a round-robin mode: 
      • The first Agent that is configured to execute jobs for the process class is contacted.
      • Should the first Agent not be available, the next Agent from the process class configuration will be used.
    • All Agents are running on different server nodes.
  • Delimitation
    • This feature is not intended for load sharing as JobScheduler will always use the first available Agent.
    • This feature is not intended for scalability as it does not allow jobs to be executed in parallel on a number of Agents (clustering).
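
The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not JobScheduler code: the function `select_agent`, the `is_available` callback and the Agent URLs are hypothetical names chosen for the example.

```python
from typing import Callable, Optional, Sequence


def select_agent(agents: Sequence[str],
                 is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first available Agent in configured order, or None."""
    for agent in agents:
        if is_available(agent):
            return agent
    return None


# Hypothetical Agent URLs, listed in the order configured for the process class.
agents = ["http://agent-1:4445", "http://agent-2:4445", "http://agent-3:4445"]
down = {"http://agent-1:4445"}  # assume the server of the first Agent is down

chosen = select_agent(agents, lambda a: a not in down)
print(chosen)
```

Because the loop always starts at the first configured Agent, a healthy first Agent receives every job. This is why the feature provides failover rather than load sharing.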

Issues

JS-1188

Master / Agent Reconciliation

Scenario

  • Types of outages
    • Connection Loss
      • a recoverable, temporary connection loss for a configurable period of time, e.g. 20s.
    • Master Server Failure
      • an unrecoverable connection loss that lasts longer than the period specified for the Connection Loss scenario, or
      • a JobScheduler Master restart or server restart.
  • Supported scenarios
    • Master/Agent Reconciliation addresses the Connection Loss scenario, not the Master Server Failure scenario.
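
The distinction between the two outage types can be expressed as a small Python sketch. The names `Outage` and `classify_outage` and the parameter `master_restarted` are assumptions made for this example and are not part of JobScheduler. The 20s value is the example period mentioned above.

```python
from enum import Enum


class Outage(Enum):
    CONNECTION_LOSS = "Connection Loss"
    MASTER_SERVER_FAILURE = "Master Server Failure"


def classify_outage(elapsed_seconds: float,
                    tolerated_seconds: float,
                    master_restarted: bool = False) -> Outage:
    """Classify an outage by its duration and by whether the Master restarted."""
    if master_restarted or elapsed_seconds > tolerated_seconds:
        return Outage.MASTER_SERVER_FAILURE
    return Outage.CONNECTION_LOSS


# Example with a tolerated connection loss period of 20 seconds.
short_outage = classify_outage(5, 20)
long_outage = classify_outage(45, 20)
restart = classify_outage(5, 20, master_restarted=True)
print(short_outage.value, long_outage.value, restart.value, sep=" | ")
```

Only outages classified as Connection Loss are candidates for reconciliation.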

Feature

  • Reconciliation Scenario
    • applies after a Connection Loss between Master and Agent.
    • includes re-establishing the normal relationship between Master and Agent after an outage.
  • Agent Behavior
    • By default, an Agent will kill any running tasks immediately if the connection to the Master is lost, i.e. none of the above scenarios is supported. The reasons for this are:
      • If a Master were unavailable for a longer period, the Agent could not report back the execution history and log information for tasks. As a result, the Master would have no information about whether the job execution was successful.
      • The primary goal is to prevent duplicate execution of jobs. Without further information from a Master, the respective Agent instance cannot know whether it will later be contacted for re-execution of the same job (which would allow a currently running task to continue on the Agent) or whether the Master will choose a different Agent (see Agent Bundles).
    • If a Connection Loss setting is configured for the process class, the Agent will show the following behavior:
      • During the period specified for the tolerated connection loss duration the Agent will assume the Connection Loss scenario.
      • The Agent will continue any running tasks up to the end of the tolerated connection loss period.
        • If the connection between Master and Agent can be re-established during that period then reconciliation will take place.
        • Otherwise the Agent switches to the Master Server Failure scenario and kills any running tasks.
      • This behavior applies only to tasks that are executed for the specific Master to which the connection has been lost. Tasks for other JobScheduler Master instances will be continued.
  • Master/Agent Reconciliation
    • After connection loss the Master will regularly attempt to re-establish the HTTP connection to the Agent. This communication includes a "tunnel" that allows the Agent to report the execution status of running jobs to the Master.
    • After a successful re-connect within the Connection Loss scenario the Master will repeat its request for execution of the respective jobs. Each new request includes an identifier for the previous execution request that allows the Agent to identify repeated requests:
      • for a job that has been completed within the tolerated connection loss duration the Agent will report back the execution result to the Master.
      • for a job that is still running the Agent will report back the appropriate information and the Master will note the running tasks and update JOC accordingly.
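
The Agent behavior and the reconciliation step can be illustrated with a simplified Python model. It is a sketch under several assumptions, not the actual implementation: the class `AgentSketch`, its methods, the request identifiers and the returned status strings are all hypothetical. The model covers a single Master and uses plain numbers as timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Task:
    state: str = "running"        # "running", "completed" or "killed"
    result: Optional[int] = None  # exit code once the task has completed


@dataclass
class AgentSketch:
    tolerated_seconds: float      # tolerated connection loss duration
    tasks: Dict[str, Task] = field(default_factory=dict)
    lost_at: Optional[float] = None

    def start(self, request_id: str) -> None:
        self.tasks[request_id] = Task()

    def complete(self, request_id: str, exit_code: int) -> None:
        task = self.tasks[request_id]
        if task.state == "running":
            task.state, task.result = "completed", exit_code

    def connection_lost(self, now: float) -> None:
        self.lost_at = now

    def tick(self, now: float) -> None:
        """Kill running tasks once the tolerated period has expired."""
        if self.lost_at is not None and now - self.lost_at > self.tolerated_seconds:
            for task in self.tasks.values():
                if task.state == "running":
                    task.state = "killed"

    def reconnect(self, now: float, previous_request_id: str) -> str:
        """Answer a repeated request that carries the previous request identifier."""
        self.tick(now)
        self.lost_at = None
        task = self.tasks.get(previous_request_id)
        if task is None:
            return "unknown"
        if task.state == "completed":
            return f"completed with exit code {task.result}"
        return task.state


# Re-connect after 10s, within a tolerated period of 20s.
agent = AgentSketch(tolerated_seconds=20)
agent.start("req-1")
agent.start("req-2")
agent.connection_lost(now=0)
agent.complete("req-1", exit_code=0)  # finishes while the Master is unreachable
finished = agent.reconnect(now=10, previous_request_id="req-1")
still_running = agent.reconnect(now=10, previous_request_id="req-2")

# Re-connect after 30s, which exceeds the tolerated period.
late_agent = AgentSketch(tolerated_seconds=20)
late_agent.start("req-3")
late_agent.connection_lost(now=0)
too_late = late_agent.reconnect(now=30, previous_request_id="req-3")
print(finished, still_running, too_late, sep=" | ")
```

Keying tasks by the request identifier is what lets the Agent in this model recognize a repeated request and report the existing task instead of starting the job a second time.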
