Deprecation Announcement

This feature is deprecated and has been replaced by the JobScheduler Monitoring Interface (see JobScheduler Monitoring Interface - Overview). The JobScheduler Monitoring Interface provides better integration with Nagios without modifications to your jobs and job chains, e.g. by providing recovery messages, performance checks and individual routing of job error messages to specific Nagios services. For the use of active checks, see How to perform active checks with a System Monitor such as Nagios/op5.

FEATURE AVAILABILITY ENDING WITH RELEASE 1.10

JITL-143

JobScheduler and Monitoring with Nagios

Introduction

JobScheduler executes jobs and can, for example, notify the responsible persons by e-mail in the event of an error. In an environment where Nagios is used to monitor processes, it is recommended to use the notification method provided by Nagios itself.
This document describes how JobScheduler messages can be written directly to the Nagios console.
For example:

  • In case of error
  • In case of success
  • Monitoring of single jobs and job chains
  • Monitoring whether a particular job has run (successfully) at a specific time

System Environment

Nagios and JobScheduler can run on different servers.
Communication takes place via the Nagios NSCA add-on.
When monitoring Windows systems, a shell script is called via SSH and writes directly to the Nagios command pipe file.

Preconditions

  • The NSCA daemon has to be installed on the Nagios server.
  • The NSCA client has to be installed on the JobScheduler server.
  • A shell script for writing to the command pipe has to be installed on the Nagios server (Windows only).

JobScheduler Configuration

The following job chains have to be installed:

  • nsca_communication - for communication with NSCA
      • send_nsca_nagios - calls send_nsca
      • set_order_state - marks the end status for the NSCA order
  • CheckJobRun - checks whether a particular job has run
      • error/job_check_job_run - checks if a particular job has run successfully
      • error/add_nagios_alert - generates an order for nsca_communication
  • sample_job_chain_with_errorHandling_at_the_end_by_nsca - an example job chain
      • sample_order:_jobr - sets an exit code
      • error/prepare_error - sets the error message for the order to NSCA
      • error/add_nagios_alert - creates an order for nsca_communication
      • error/set_error - sets the order's error state

Installation:

The JobScheduler_nagios.tar.gz file has to be extracted into the scheduler/config/live folder.

For communication via NSCA, the following files are created in the scheduler/config/live/nagios directory:

  • nsca_communicationNSCA.job_chain.xml
  • set_order_state.job.xml
  • sendEvent2NagiosNSCA.job.xml

Job Chain for Communication with Nagios (Windows) SSH:

The following files are copied into the scheduler/config/live/nagios directory:

  • nagios_communicationSSH.job_chain.xml
  • set_order_state.job.xml
  • sendEvent2NagiosSSH.job.xml

Testing if a Job has run successfully

The following files are copied into the scheduler/config/live/nagios/error directory:

  • CheckJobRun.job_chain.xml
  • job_check_job_run.job.xml
  • add_nagios_alert.job.xml

Error Treatment for a Job

The handle_exit_code.js file is copied into the scheduler/config/live/nagios/error directory.

Error description for ..... in a Job Chain

The following files are copied into the scheduler/config/live/nagios/error directory:

  • prepare_error.job.xml
  • set_error.job.xml

Functionality:

Job Chain for Communication with Nagios (Linux) NSCA:

This job chain triggers the call of the NSCA client.

The sendEvent2NagiosNSCA job makes a parameterised call to the NSCA client.
The set_order_state job sets the status of the order according to the par_severity parameter.
The order history in the operations GUI then shows the type of job that has been executed.
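
The parameterised call can be illustrated with a minimal Python sketch. It is not the code of the sendEvent2NagiosNSCA job. It only shows the tab-delimited line that the send_nsca client reads from standard input by default (host, service description, return code, message). The function names, the send_nsca location on the PATH and the configuration file path are assumptions.

```python
import subprocess


def format_nsca_line(host, service, return_code, message):
    """Return one passive service check result in send_nsca's default input format."""
    # Tabs and newlines would break the tab-delimited format, so collapse
    # all whitespace runs in the message into single spaces.
    clean = " ".join(str(message).split())
    return f"{host}\t{service}\t{return_code}\t{clean}\n"


def send_to_nagios(line, nagios_address, config_file="/etc/send_nsca.cfg"):
    """Pipe the line to send_nsca (assumed to be on PATH; config path is an assumption)."""
    cmd = ["send_nsca", "-H", nagios_address, "-c", config_file]
    return subprocess.run(cmd, input=line, text=True, check=True)
```

The host and service description in the line have to match a host and a passive service defined in the Nagios configuration, otherwise Nagios discards the result.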

Job Chain for Communication with Nagios (Windows) SSH

This job chain triggers the Shell script call, which writes directly to the Nagios command pipe.
The sendEvent2NagiosSSH job is parameterised for the SSH connection.
The set_order_state job sets the status of the order according to the par_severity parameter.
The order history will then show the type of job that has been executed in the operations GUI.

Testing for a successful job run

This uses the CheckJobRun job chain. An order is created for each job run that has to be checked.
An order is included as an example. This example checks whether the Job Run SampeJobNscaNotification job has run successfully.

Error Treatment for a Job

Post-processing can be implemented to check individual jobs. In this case a message is sent to Nagios.

Error Treatments with Nodes in a Job Chain

  • The sample_job_chain_with_errorHandling_at_the_end_by_nsca job chain shows error handling in which the messages are sent to Nagios.
  • The sample_order_job2 job simulates an error.

prepare_error: sets the order parameter "par_message" to the value <job_chain>/<order_id> RC = <exit_code>.
add_nagios_alert: adds an order to a job chain for communication with Nagios.
par_severity is set according to the exit code. This mapping can be adapted individually for each customer.
If the par_service parameter is not set, then the Scheduler Errors, Scheduler Warnings or Scheduler Success service is used, depending on the value of par_severity.
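
The parameter handling described above can be sketched as follows. This is a minimal illustration in Python, not the code shipped with the jobs. The exit-code thresholds, the numeric severity values (borrowed from the standard Nagios return codes 0 = OK, 1 = WARNING, 2 = CRITICAL) and all function names are assumptions.

```python
# Nagios return codes, used here as par_severity values
# (assumption: the jobs may encode severity differently).
OK, WARNING, CRITICAL = 0, 1, 2

# Default services named in the text, keyed by severity.
DEFAULT_SERVICES = {
    OK: "Scheduler Success",
    WARNING: "Scheduler Warnings",
    CRITICAL: "Scheduler Errors",
}


def build_message(job_chain, order_id, exit_code):
    """Build par_message in the form <job_chain>/<order_id> RC = <exit_code>."""
    return f"{job_chain}/{order_id} RC = {exit_code}"


def severity_from_exit_code(exit_code):
    """Hypothetical mapping: 0 is success, 1 a warning, anything else an error."""
    if exit_code == 0:
        return OK
    if exit_code == 1:
        return WARNING
    return CRITICAL


def resolve_service(par_severity, par_service=None):
    """Use par_service if it is set, otherwise fall back to the default service."""
    if par_service:
        return par_service
    return DEFAULT_SERVICES[par_severity]
```

Whatever service name results from this step must exist as a service description in the Nagios configuration for the respective host.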

set_error

  • Sets the order's status to error. This step ensures that erroneous orders are marked correctly.
  • This job is necessary to mark orders as erroneous even though the error handling itself was successful.

Nagios Configuration

Setting up a Host for each JobScheduler:

Set up a host for each server running a JobScheduler. If several JobSchedulers are installed on a host, the same host configuration is used for all of them.

  define host {
        use                     generic-host            ; Name of host template to use
        host_name               yourHostName
        alias                   aliasHost
        address                 192.11.0.100
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups  admins
        } 

Services have to be set up within the Nagios configuration. First, a generic, re-usable service template is set up. It is important here to activate the passive checks and deactivate the active ones. The obligatory check_command entry is satisfied with a dummy command.

  define service {
        use generic-service
        name passive_service
        active_checks_enabled 0
        passive_checks_enabled 1  ; We want only passive checking
        flap_detection_enabled 0
        register 0                ; This is a template, not a real service
        is_volatile 0
        check_period 24x7
        max_check_attempts 1
        normal_check_interval 5
        retry_check_interval 1
        check_freshness 0
        contact_groups admins 
        check_command check_dummy!0
        notification_interval 120
        notification_period 24x7
        notification_options w,u,c,r
        stalking_options w,c,u
        }
  define command {
        command_name check_dummy
        command_line $USER1$/check_dummy $ARG1$
        }

Whilst it is sufficient to set up a single service to collect all the messages from JobScheduler,
we recommend that messages are distributed to several services so that, for example,
success messages can be overwritten in the event of error.
One possible criterion for dividing messages is the message type (error, warning, success).
It is also possible to set up individual services for particular jobs or job chains.

Messages for a particular Job or Job Chain

  define service {
        use                       passive_service          
        host_name                 yourHostName
        service_description       Job Run SampeJobNscaNotification
        }


Warnings, Errors and Success Messages

  define service {
        use                     passive_service       
        host_name               yourHostName
        service_description     Scheduler Warnings
        }
  define service {
        use                      passive_service                
        host_name                yourHostName
        service_description      Scheduler Errors
        }
  define service {
        use                      passive_service          
        host_name                yourHostName
        service_description      Scheduler Messages
        }


Script to write into the command pipe (only if Windows servers are monitored)

Note that a shell script has to be installed on the Nagios server if messages from JobScheduler (on Windows) are to be written to the Nagios console. This script writes to the Nagios command pipe and is executed via SSH.
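
The line that such a script writes to the command pipe can be illustrated as follows. The script itself is a shell script. The Python sketch below only demonstrates the format of the Nagios external command PROCESS_SERVICE_CHECK_RESULT and is not the script from the download. The function names and the command pipe path are assumptions; the actual path depends on the command_file setting of the Nagios installation.

```python
import time


def format_external_command(host, service, return_code, message, timestamp=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line for the Nagios command pipe."""
    if timestamp is None:
        timestamp = int(time.time())
    # Semicolons separate the fields and a newline ends the command,
    # so both are replaced defensively in the message text.
    clean = str(message).replace(";", ",").replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{clean}\n"
    )


def write_to_command_pipe(line, pipe_path="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the pipe; the default path is an assumption."""
    with open(pipe_path, "w") as pipe:
        pipe.write(line)
```

Nagios only processes such lines if external commands are enabled and the user that runs the script via SSH has write access to the command pipe.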
