
Functioning of the Watchdog (heart_beat_watchdog_thread)

A watchdog is started automatically each time a JobScheduler is started as part of a cluster. Each watchdog runs as a separate thread alongside its respective JobScheduler and monitors that JobScheduler's heartbeat. The watchdog stops its JobScheduler if the heartbeat is missing for a predefined length of time.

...

This behaviour cannot be configured as it is an "emergency" procedure to ensure the reliable functioning of the cluster.
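The watchdog principle described in this section can be illustrated with a minimal Python sketch. It is not JobScheduler's implementation: the class name `HeartbeatWatchdog`, its methods, the `on_timeout` callback and the polling interval are all hypothetical and chosen only for illustration. The sketch reads a monotonic clock, which is a design choice of this example and not a statement about how JobScheduler measures time.

```python
import threading
import time


class HeartbeatWatchdog:
    """Hypothetical watchdog: fires on_timeout when the heartbeat is overdue."""

    def __init__(self, max_delay_seconds, on_timeout, clock=time.monotonic):
        self.max_delay_seconds = max_delay_seconds
        self.on_timeout = on_timeout      # e.g. a function that stops the monitored process
        self.clock = clock                # injectable so the logic can be tested
        self._lock = threading.Lock()
        self._last_beat = clock()
        self._stopped = threading.Event()

    def beat(self):
        """Called by the monitored process each time it emits a heartbeat."""
        with self._lock:
            self._last_beat = self.clock()

    def seconds_since_last_beat(self):
        with self._lock:
            return self.clock() - self._last_beat

    def check_once(self):
        """Return True and fire on_timeout if the heartbeat is overdue."""
        if self.seconds_since_last_beat() > self.max_delay_seconds:
            self._stopped.set()
            self.on_timeout()
            return True
        return False

    def run(self, poll_interval_seconds=1.0):
        """Poll until the heartbeat is overdue, then return."""
        while not self._stopped.is_set():
            if self.check_once():
                break
            self._stopped.wait(poll_interval_seconds)

    def start(self):
        """Run the watchdog in its own thread alongside the monitored process."""
        thread = threading.Thread(target=self.run, daemon=True)
        thread.start()
        return thread
```

In this sketch the monitored process would call `beat()` on every heartbeat, while `start()` runs the check loop in a separate thread, mirroring the arrangement of one watchdog thread per JobScheduler.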

Possible reasons for a missing heartbeat

  • Database problems
  • Problems with the SMTP mail server
  • DNS problems
  • A heavily overloaded computer (e.g. lack of memory)
  • A change in system time

Output to the log file scheduler.log

JobScheduler determines that its own heartbeat is missing 31 seconds after it was due. The warning is issued after a further delay of 3 seconds. The maximum delay that is tolerated is 55 seconds.
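The three timing values above can be put side by side in a short Python sketch. The constants come from the text; the function name `classify_lateness`, the state labels and the exact handling of the boundary values (inclusive or exclusive) are assumptions made for this example only.

```python
DETECTION_SECONDS = 31      # heartbeat is recognised as missing (from the text)
WARNING_DELAY_SECONDS = 3   # further delay before the warning is issued (from the text)
MAX_TOLERATED_SECONDS = 55  # maximum tolerated delay (from the text)

WARNING_SECONDS = DETECTION_SECONDS + WARNING_DELAY_SECONDS  # 34 seconds after the due time


def classify_lateness(seconds_late):
    """Map a heartbeat delay in seconds to a hypothetical state label."""
    if seconds_late < DETECTION_SECONDS:
        return "ok"
    if seconds_late < WARNING_SECONDS:
        return "missing"    # detected, warning not yet issued
    if seconds_late <= MAX_TOLERATED_SECONDS:
        return "warning"    # warning issued, delay still tolerated
    return "exceeded"       # beyond the maximum tolerated delay


for delay in (10, 31, 34, 55, 56):
    print(delay, classify_lateness(delay))
```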

...

Code Block
 2013-09-12 12:28:20.546 [WARN]   (Cluster) 
 SCHEDULER-827  Own heart beat is late: next_heart_beat has been announced for 2013-09-12 12:27:03 
 (this is 77 seconds late)
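The lateness reported in this entry can be cross-checked against its two timestamps. The following Python sketch parses the entry shown above and recomputes the delay; the function name `heartbeat_lateness` and the regular expression are assumptions for illustration and are not part of JobScheduler.

```python
import re
from datetime import datetime

LOG_ENTRY = """\
2013-09-12 12:28:20.546 [WARN]   (Cluster)
SCHEDULER-827  Own heart beat is late: next_heart_beat has been announced for 2013-09-12 12:27:03
(this is 77 seconds late)
"""

# Captures the time the entry was logged, the announced heartbeat time,
# and the lateness stated in the message itself.
PATTERN = re.compile(
    r"(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*?"
    r"announced for (?P<announced>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"this is (?P<reported>\d+) seconds late",
    re.DOTALL,
)


def heartbeat_lateness(entry):
    """Return (computed_seconds, reported_seconds) for a SCHEDULER-827 entry."""
    match = PATTERN.search(entry)
    if match is None:
        raise ValueError("not a SCHEDULER-827 heartbeat warning")
    logged = datetime.strptime(match["logged"], "%Y-%m-%d %H:%M:%S.%f")
    announced = datetime.strptime(match["announced"], "%Y-%m-%d %H:%M:%S")
    computed = int((logged - announced).total_seconds())  # whole seconds, fraction dropped
    return computed, int(match["reported"])


computed, reported = heartbeat_lateness(LOG_ENTRY)
print(computed, reported)
```

For the entry above, the difference between 12:28:20.546 and 12:27:03 is 77 whole seconds, which agrees with the value stated in the message.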

See also

...