...

This article gives an overview of the JobScheduler Monitoring Interface. This solution provides an efficient means of monitoring JobScheduler objects such as Jobs, Job Chains and Orders and of forwarding notifications to System Monitors such as op5, Nagios or Icinga. The solution is available from JobScheduler General Availability Release 1.8 onwards.

 

The most important features of this solution are:

  • JobScheduler: The architecture carries out a two-step process around the interface:
    • Detecting errors: A Job running at regular intervals, typically every 2 minutes, analyzes the History Log information recorded by the JobScheduler in the database and notes a predefined set of information about the JobScheduler objects being monitored. The information noted is typically whether tasks have been completed and whether errors or warnings have been logged. This Job then writes this information to a separate Notifications database table.
    • Sending alerts: A second Job is responsible for sending the alerts to the relevant System Monitor. This Job also runs at regular intervals and analyzes the Notifications database table. It then carries out a predefined action for each item it finds in the table. A typical action would be informing a particular monitor that a particular type of event has occurred, such as the successful completion of an Order or a Job ending in error.
  • JobScheduler: The solution architecture allows analysis of the Log History of more than one JobScheduler using the database specified. It may also be configured to monitor more than one database.
  • System Monitors: The JobScheduler is able to connect to more than one System Monitor at the same time.

Monitoring Definitions

The following definitions apply for the monitoring systems:

| Definition | Description |
|---|---|
| System Monitor | A System Monitor is an instrument to inform a Service Desk (e.g. 1st Level Support) about incidents in IT systems. It does not analyze incidents but merely provides information about them, so that this information can be forwarded and escalated. |
| Passive Checks | Passive checks are sent from an external host (from the point of view of the System Monitor) to the System Monitor. By contrast, checks that are carried out periodically by the System Monitor itself are called active checks. |
| Alerting | An Alert is an alarm, i.e. a message about an event. An Alert does not provide all the information about an event, but indicates that the event has occurred. An Alert can be either positive or negative. |
| Notification | The notification of a specific Alert. A Notification will not be provided for every Alert, just for the ones that are so configured. Notifications are therefore a subset of the Alerts and can also be either positive or negative. |
| Acknowledgment | The confirmation of an Alert, meaning that the Alert has been seen and that appropriate action is being taken. An acknowledgment is always executed manually. This means that there is always someone (usually in the Service Desk or 1st Level Support) who has noticed that a service is Critical and who acknowledges it. It is never an automated step. |
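
To make the idea of a passive check more concrete, the sketch below formats one check result as a Nagios external command line (`PROCESS_SERVICE_CHECK_RESULT`). This article does not state how the JobScheduler actually submits its results, so the use of this format, the helper name `passive_check_line` and the host and service names are all assumptions chosen for illustration.

```python
# Standard Nagios plugin return codes.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

def passive_check_line(timestamp, host, service, status, output):
    """Format one passive service check result as a Nagios external command.

    Hypothetical helper: only the command format itself comes from Nagios.
    """
    # A newline would end the command, so the free-text output is
    # flattened to a single line first.
    clean = output.replace("\n", " ").strip()
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}".format(
        int(timestamp), host, service, RETURN_CODES[status], clean)

# Example with made-up host and service names:
line = passive_check_line(1400000000, "jobscheduler_host",
                          "JobScheduler Errors", "CRITICAL",
                          "Order o2 failed\n")
```

The System Monitor only receives such a line; it never has to query the JobScheduler itself, which is what distinguishes a passive check from an active one.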

...

The benefits of the new solution are:

  1. No changes have to be made to your existing JobScheduler configuration (Jobs, Job Chains, etc.) in order to get this solution working. You have to add the Job Chains required for the monitoring but do not have to modify your current ones.
  2. The whole architecture lies on the JobScheduler side and the solution is therefore independent of the monitor that the Alerts are sent to. The solution works for every monitor that can receive passive checks.
  3. Processing of Jobs and Job Chains in JobScheduler is not affected or modified by the monitoring, neither from the point of view of performance nor from that of stability.
  4. This solution makes very detailed information available for the System Monitors. JobScheduler logs exactly what the error is about and this information can be sent as a Passive Check to the relevant Monitoring Service if required.
  5. The criticality of an error is immediately recognized in the System Monitor. The JobScheduler has access to all the log information about errors and can be configured to filter this information very precisely before forwarding it to the relevant System Monitor Service. Through this feature, the Service Desk is immediately able to set its priorities when, for example, recovering errors. It is unlikely that a performance error would be given the same priority as an error in document processing. This feature is illustrated in the following diagram:

Functionality

| Functionality | Description |
|---|---|
| Job Chain and Order Monitoring | This solution allows Job Chains in JobScheduler to be monitored by way of the Orders that trigger these Job Chains. |
| History Notifications | Not only critical alerts can be monitored, but also positive ones. The history of a specific service can be monitored to see exactly whether a specific workflow has been executed and what result it gave. |
| Performance measurement (Timer) | Timers can be used to measure the performance of Job Chains. These can be used to send a warning alert to a System Monitor if a Job Chain takes more than a predefined time to complete. |
| Acknowledgment | Acknowledgments sent in response to critical alerts sent out by a System Monitor can be used to add Orders to the JobScheduler, so that the JobScheduler does not send more notifications to the System Monitor for this service. |
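
The Timer functionality can be pictured as a simple threshold comparison. The following is a minimal sketch under stated assumptions: the function name `timer_alert`, its parameters and the message text are hypothetical and are not taken from the JobScheduler configuration.

```python
from datetime import datetime, timedelta

def timer_alert(job_chain, started, ended, max_duration):
    """Return a WARNING alert if the run exceeded max_duration, else None.

    Hypothetical helper for illustration only.
    """
    elapsed = ended - started
    if elapsed > max_duration:
        message = "{} took {}s, limit is {}s".format(
            job_chain,
            int(elapsed.total_seconds()),
            int(max_duration.total_seconds()))
        return ("WARNING", message)
    return None

# Example: a Job Chain allowed 10 minutes that needed 15.
start = datetime(2014, 1, 1, 12, 0, 0)
alert = timer_alert("chain_a", start, start + timedelta(minutes=15),
                    timedelta(minutes=10))
```

A run that finishes within the limit produces no alert, so only Job Chains that run too long would lead to a warning being sent to the System Monitor.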

Monitoring sample - op5 Monitor

Here is an example of JobScheduler monitoring in op5 Monitor. There are 3 checks (in op5 Monitor they are called services) defined for the JobScheduler monitoring. Different Job Chains in JobScheduler can send notifications to the same check, so it is not necessary to create one check for each Job Chain, which could make the monitoring chaotic. Instead, we group results in three categories:

...