
Introduction

This article gives an overview of the JobScheduler Monitoring Interface. The Monitoring Interface provides an efficient means of monitoring JobScheduler objects such as Jobs, Job Chains and Orders, and of forwarding notifications to System Monitors such as Nagios®. The solution is available from JobScheduler General Availability Release 1.8 onwards.

 

The most important features of this solution are:

  • JobScheduler: carries out a two-step process around the interface:
    • Detecting errors and other events: A Job that runs at regular intervals (typically every 2 minutes) analyzes the History Log information recorded by the JobScheduler in the database. This Job is configured to filter and analyze the log information for the JobScheduler objects being monitored. It typically notes whether tasks have completed successfully or whether errors or warnings have been logged, and then writes this information to a separate Notifications database table.
      JITL-166 - JobScheduler Monitoring Interface should use batch insert for improved performance Released  
    • Sending alerts: A second Job is responsible for sending alerts to the relevant System Monitor. This Job also runs at regular intervals and analyzes the Notifications database tables. It then carries out a predefined action for each item it finds there. A typical action is to inform a particular monitor that a particular type of event has occurred, such as the successful completion of an Order, a Job ending in error or an attempted error recovery.
  • JobScheduler: The solution architecture allows the Log History of more than one JobScheduler to be analyzed using the database specified. It may also be configured to monitor more than one database.
  • System Monitors: The JobScheduler is able to connect to more than one System Monitor at the same time.

Monitoring Definitions

The following definitions apply for the monitoring systems:

Definition | Description
System Monitor | A System Monitor is an instrument for informing a Service Desk (e.g. 1st Level Support) about incidents in IT systems. It does not analyze the incidents themselves but only handles information about them, so that this information can be forwarded and escalated.
Passive Checks | Passive Checks are sent to the System Monitor from an external host (external from the point of view of the System Monitor). In contrast, checks that are carried out periodically by the System Monitor itself are called Active Checks.
Active Checks | Active Checks are initiated from the System Monitor server and are performed on a regular basis, e.g. every 5 minutes. They are intended for simple verification of the availability of a daemon or service; they do not provide information at application level, e.g. on the execution status of jobs. Use of Active Checks is explained in "How to perform active checks with a System Monitor such as Nagios/op5".
Alerting | An Alert is a message about an event. An Alert does not provide all the information about an event, but it informs about the existence of the event. An Alert can be either positive or negative.
Notification | A Notification is the notification of a specific Alert. Notifications are not provided for every Alert, only for those that are configured accordingly. Notifications are therefore a subset of the Alerts and can likewise be either positive or negative.
Acknowledgment | An Acknowledgment is the confirmation of an Alert. It means that the Alert has been seen and/or is known and that appropriate action is being taken. An Acknowledgment is always carried out manually: someone (usually at the Service Desk or 1st Level Support) has noticed that a service is in a Critical state and acknowledges it. It is never an automated step.
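
To make the idea of a Passive Check more concrete, the sketch below formats a result in the form of the Nagios external command `PROCESS_SERVICE_CHECK_RESULT`, which is one common way for Nagios-compatible monitors to accept results submitted from outside. This article does not state which transport the Monitoring Interface actually uses, so the example is an illustration only; the function name, host name and message text are assumptions.

```python
import time

# Service states as used by Nagios-compatible monitors.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_check_line(host, service, state, output, timestamp=None):
    """Build one passive service check result in external command form."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # The command must fit on one line, so line breaks are flattened.
    text = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{STATES[state]};{text}"


# Hypothetical host name and message, for illustration only.
print(passive_check_line(
    "jobscheduler-host",
    "JobScheduler Monitoring Errors",
    "CRITICAL",
    "Job Chain ended with error",
))
```

The external host pushes the result, and the System Monitor only records it. With an Active Check, by contrast, the monitor itself would poll the service.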

 

Benefits

The benefits of the new solution are:

  1. Flexible implementation:
    Changes to your existing JobScheduler configuration (Jobs, Job Chains, etc.) are not required to get this solution working. You add the Job Chains required for the monitoring but do not have to modify your current ones.
  2. Monitor independence:
    The whole architecture lies on the JobScheduler side and the solution is therefore independent of the monitor that the Alerts are sent to. The solution works for every monitor that can receive passive checks.
  3. Workload-independent:
    Processing of Jobs and Job Chains in JobScheduler is not affected or modified by the monitoring, neither from the point of view of performance nor that of stability.
  4. Clearly defined information flow:
    This solution allows the information made available to the System Monitors to be configured precisely. Detailed log information from monitored Job Chains can be sent as a Passive Check to the relevant Monitoring Service if required.
  5. Error prioritization:
    Critical errors are recognized immediately in the System Monitor. The JobScheduler initially has access to all the log information and can be configured to filter this information very precisely before forwarding it to the relevant System Monitor Service. This allows the Service Desk to set priorities immediately when, for example, recovering from errors: it is unlikely that a performance error would be given the same priority as an error in document processing. This feature is illustrated in the following diagram:

Functionality

Functionality | Description
Job Chain and Order Monitoring | This solution allows Job Chains to be monitored by way of the Orders that trigger these Job Chains.
History Notifications | Positive alerts can be monitored as well as critical ones. The history of a specific service can be monitored to see whether a specific workflow has been executed and what result it gave.
Performance measurement (Timer) | Timers can be used to measure the performance of Job Chains. They can be used to send a warning alert to a System Monitor if a Job Chain takes more than a predefined time to complete.
Acknowledgment | Acknowledgments sent in response to critical alerts sent out by a System Monitor can be used to add Orders to the JobScheduler, so that the JobScheduler stops sending further notifications about that service to the System Monitor.
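
The Timer functionality can be pictured as a comparison between the elapsed run time of a Job Chain and a configured limit. The following sketch assumes a hypothetical function name, message format and Job Chain path; it does not reflect how Timers are actually configured in the Monitoring Interface.

```python
from datetime import datetime, timedelta


def check_timer(job_chain, started, ended, limit):
    """Return a warning message if the run exceeded the limit, else None."""
    elapsed = ended - started
    if elapsed > limit:
        return (
            f"WARNING: {job_chain} took {elapsed.total_seconds():.0f}s "
            f"(limit {limit.total_seconds():.0f}s)"
        )
    return None


# Hypothetical run of 12 minutes against a 10 minute limit.
start = datetime(2018, 1, 30, 8, 0, 0)
print(check_timer("/sales/daily_import", start, start + timedelta(minutes=12),
                  timedelta(minutes=10)))
```

A run that stays within the limit yields no message, so only Timers that have expired result in a warning alert.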

Monitoring example - op5® Monitor

The following example illustrates the use of the JobScheduler Monitoring Interface with the op5® Monitor. In the example, three checks (called services in op5® Monitor) have been defined for the JobScheduler monitoring. Different Job Chains in JobScheduler can send notifications to the same check, so it is not necessary to create a check for each individual Job Chain, which could become extremely complex. Instead, results have been grouped into three categories:

  • JobScheduler Monitoring Errors: Job Chains that end with an error are sent to this service. The last error notification is shown in the column "STATUS INFORMATION".
  • JobScheduler Monitoring Success: Job Chains that end with success, that is with a positive notification, are sent to the monitoring system. To be exact, the history of a specific Job Chain is monitored to see whether a specific work-flow has been executed or not. The last success notification is shown in the column "STATUS INFORMATION".
  • JobScheduler Monitoring Performance: Here timers are used to measure the performance of a Job Chain. If a Job Chain takes too long to end, a warning alert will be sent to the System Monitor. The information about the expired timer is shown in the column "STATUS INFORMATION".
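
The grouping of many Job Chains into three services can be sketched as a simple routing table. The service names below are taken from the example above, while the event types, Job Chain paths and function name are hypothetical. The sketch also mirrors the fact that only the last notification per service appears under "STATUS INFORMATION".

```python
# Service names from the op5 example; the event type keys are assumptions.
SERVICES = {
    "error": "JobScheduler Monitoring Errors",
    "success": "JobScheduler Monitoring Success",
    "timer_expired": "JobScheduler Monitoring Performance",
}


def latest_status_by_service(events):
    """Map (job_chain, event_type, message) tuples to the last message per service.

    Events are expected in chronological order, so later events overwrite
    earlier ones for the same service.
    """
    latest = {}
    for job_chain, event_type, message in events:
        service = SERVICES[event_type]
        latest[service] = f"{job_chain}: {message}"
    return latest


events = [
    ("/sales/import", "success", "completed"),
    ("/sales/export", "error", "ended with error"),
    ("/hr/payroll", "error", "ended with error"),
    ("/sales/import", "timer_expired", "exceeded expected run time"),
]
for service, status in latest_status_by_service(events).items():
    print(f"{service} -> {status}")
```

Because the services are keyed by event type rather than by Job Chain, adding a new Job Chain to the monitoring does not require a new check in the System Monitor.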

op5 Monitor - Services for JobScheduler monitoring

Change Management References

T | Key | Linked Issues | Fix Version/s | Status | P | Summary | Updated
Feature | JS-1600 | - | 1.9.11, 1.10.4, 1.11 | Released | Minor | Monitoring Interface is capable of routing Return Codes | Apr 22, 2016
Feature | JS-1446 | JS-1445 | 2.0 | Clarify | Minor | Log entries that are not dependent from JobScheduler objects should be available for JobScheduler Monitoring Interface | Dec 08, 2017
Feature | JS-1388 | - | 1.12 | Released | Major | Add e-mail as a monitor service for the JobScheduler Monitoring Interface | Dec 21, 2017
Feature | JS-684 | JS-1291, JS-1589, JS-1426, JS-1410, JS-1480 | 1.10 | Released | Major | System Monitor (Nagios, op5) should notify if a JobScheduler Universal Agent is not available | Feb 10, 2016
Fix | JITL-435 | - | 1.11.6, 1.12 | Released | Minor | JobScheduler Monitoring Interface should handle next steps after sending the recovery message | Jan 30, 2018
Feature | JITL-427 | - | 1.11.6, 1.12 | Released | Minor | JobScheduler Montitorig Interface. Performance improvement for SystemNotifier job. | Feb 01, 2018
Feature | JITL-401 | - | 1.11.5, 1.12 | Released | Minor | JobScheduler Monitoring Interface should support the job timers | Nov 11, 2017
Feature | JITL-400 | - | 1.11.5, 1.12 | Released | Minor | JobScheduler Monitoring Interface should support standalone jobs | Nov 11, 2017
Feature | JITL-353 | - | 1.11.2, 1.12 | Released | Minor | Monitoring Interface should recognize the configured JobChain/Job/TimerJobChain with the leading "/" character | Dec 21, 2017
Feature | JITL-352 | - | 1.11.2, 1.12 | Released | Minor | Add Cluster member identification to Monitoring Interface | Dec 21, 2017
Feature | JITL-351 | - | 1.11.2, 1.12 | Released | Minor | Add Agent identification to Monitoring Interface | Dec 21, 2017
Feature | JITL-280 | - | 1.12 | Released | Minor | Add JMS support to the JobScheduler Monitoring Interface | Dec 21, 2017
Feature | JITL-264 | JS-1132, JS-1562 | 2.0 | Deferred | Minor | Add selected job and order parameters to the JobScheduler Reporting Interface | Nov 13, 2017
Feature | JITL-230 | - | 1.9.8, 1.10.2, 1.11 | Released | Minor | JobScheduler Monitoring Interface jobs should consider updated Timer elements | Dec 17, 2015
Feature | JITL-166 | - | 1.9 | Released | Minor | JobScheduler Monitoring Interface should use batch insert for improved performance | Dec 10, 2015

See also