Job Scheduler and Monitoring with Nagios

Introduction:
JobScheduler executes jobs and can notify the responsible persons by e-mail, e.g. in case of an error. In an environment where Nagios is used to monitor processes, it is recommended to use the notification mechanism that Nagios itself provides.
This document describes how JobScheduler messages can be written directly to the Nagios console, for example:

  • In case of error
  • In case of success
  • Monitoring of single jobs and job chains
  • Monitoring if a particular job has run (successfully) at a specific time

System Environment
Nagios and JobScheduler can run on different servers. Communication takes place via the Nagios add-on NSCA. For monitoring Windows systems, a Shell script is called via SSH and writes directly into the Nagios command pipe.

Preconditions

  • The NSCA daemon has to be installed on the Nagios server
  • The NSCA client has to be installed on the JobScheduler server
  • A Shell script that writes to the command pipe has to be installed on the Nagios server (Windows only)

JobScheduler Configuration
The following job chains and jobs have to be installed:

  • nsca_communication - for communication with NSCA
  • send_nsca_nagios - calls send_nsca
  • set_order_state - sets the end state of the NSCA order

CheckJobRun - checks if a particular job has run.

  • error/job_check_job_run - checks if a particular job has run successfully
  • error/add_nagios_alert - generates an order for nsca_communication
  • sample_job_chain_with_errorHandling_at_the_end_by_nsca - an example job chain
  • sample_order:_jobr - sets an exit code
  • error/prepare_error.job - sets the error message for the order to NSCA
  • error/add_nagios_alert - creates an order for nsca_communication
  • error/set_error - sets the order's error state

Installation:
To install, unpack the archive JobScheduler_nagios.tar.gz into the directory scheduler/config/live. The following files are created.

Job Chain for Communication with Nagios (Linux) NSCA
The files:

  • nsca_communicationNSCA.job_chain.xml
  • set_order_state.job.xml
  • sendEvent2NagiosNSCA.job.xml

are copied into the directory scheduler/config/live

Job Chain for Communication with Nagios (Windows) SSH
The files:

  • nagios_communicationSSH.job_chain.xml
  • set_order_state.job.xml
  • sendEvent2NagiosSSH.job.xml

are copied into the directory scheduler/config/live

Testing if a Job has run successfully
The files:

  • CheckJobRun.job_chain.xml
  • job_check_job_run.job.xml
  • add_nagios_alert.job.xml

are copied into the directory scheduler/config/live/error_handling

Error Handling for a Job
The file handle_exit_code.js is copied into the directory scheduler/config/live/error_handling

Error description for ..... in a Job Chain
The files:

  • prepare_error.job.xml
  • set_error.job.xml

are copied into the directory scheduler/config/live/error_handling

Functionality:
Job Chain for Communication with Nagios (Linux) NSCA
This job chain triggers the call of the NSCA client.

The job sendEvent2NagiosNSCA calls the NSCA client with the appropriate parameters. The job set_order_state sets the state of the order according to the parameter par_severity. As a result, the order history in the operations GUI immediately shows the type of job that has been executed.
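To illustrate what the NSCA client receives: send_nsca reads one check result per line from standard input, with the fields host, service description, return code and plugin output separated by tabs by default. The following is a minimal sketch in Python of how such a line could be built. The function name build_nsca_line and the example values are assumptions for illustration and are not taken from the sendEvent2NagiosNSCA job.

```python
def build_nsca_line(host, service, return_code, message, delimiter="\t"):
    """Build one passive service check result in the line format read by send_nsca.

    Hypothetical helper for illustration; not part of the JobScheduler package.
    """
    # Nagios return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    # Collapse tabs and newlines in the message so they cannot be
    # mistaken for the field delimiter or the end of the record.
    clean_message = " ".join(str(message).split())
    return delimiter.join([host, service, str(return_code), clean_message]) + "\n"


if __name__ == "__main__":
    # Example values only; host and service must exist in the Nagios configuration.
    print(repr(build_nsca_line("yourHostName", "Scheduler Errors", 2, "my_chain/4711 RC = 1")))
```

A line built this way could then be piped into the client, for example with send_nsca -H <nagios server> -c <path to send_nsca.cfg>; the concrete server name and configuration path depend on the installation.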

Job Chain for Communication with Nagios (Windows) SSH
This job chain triggers the call of the Shell script, which writes directly into the Nagios command pipe.
The job sendEvent2NagiosSSH is parameterized for the SSH connection. The job set_order_state sets the state of the order according to the parameter par_severity. As a result, the order history in the operations GUI immediately shows the type of job that has been executed.

Testing for the successful Run of a Job
The job chain CheckJobRun is used. An order is created for each job run that has to be checked.
An example order is included; it checks whether the job Job Run SampeJobNscaNotification has run successfully.

Error Handling for a Job
In order to check a single job, post-processing can be implemented. In this case a message is sent to Nagios.

Error Handling with Nodes in a Job Chain

  • The job chain sample_job_chain_with_errorHandling_at_the_end_by_nsca shows error handling in which the messages are sent to Nagios.
  • The job sample_order_job2 simulates an error.

prepare_error
sets the order parameter "par_message" to the value <job_chain>/<order_id> RC = <exit_code>
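The message format can be sketched in a few lines of Python. This is only an illustration of the resulting string; the function name format_par_message and the example values are assumptions, and the actual job is configured in prepare_error.job.xml.

```python
def format_par_message(job_chain, order_id, exit_code):
    """Return the text stored in the order parameter par_message.

    Hypothetical helper that mirrors the pattern <job chain>/<order id> RC = <exit code>.
    """
    return f"{job_chain}/{order_id} RC = {exit_code}"


if __name__ == "__main__":
    # Example values only
    print(format_par_message("sample_job_chain_with_errorHandling_at_the_end_by_nsca", "4711", 1))
```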

add_nagios_alert
adds an order to a job chain for communication with Nagios. The parameter par_severity is set according to the exit code. This part can be customized for each customer.
If the parameter par_service is not set, the service is set to Scheduler Errors, Scheduler Warnings or Scheduler Success according to the value of par_severity.
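The fallback logic can be sketched as follows. This is an illustration under stated assumptions, not the code of add_nagios_alert: it assumes that par_severity uses the Nagios return codes 0 (OK), 1 (WARNING) and 2 (CRITICAL), and the rule that maps exit codes to severities is a hypothetical default, since that part is customer-specific.

```python
# Assumption: par_severity uses the Nagios return codes 0 (OK), 1 (WARNING), 2 (CRITICAL).
DEFAULT_SERVICES = {
    0: "Scheduler Success",
    1: "Scheduler Warnings",
    2: "Scheduler Errors",
}


def severity_from_exit_code(exit_code):
    """Hypothetical rule: exit code 0 counts as success, anything else as an error."""
    return 0 if exit_code == 0 else 2


def resolve_service(par_severity, par_service=None):
    """Use par_service if it is set, otherwise fall back to the default service for the severity."""
    if par_service:
        return par_service
    return DEFAULT_SERVICES[par_severity]


if __name__ == "__main__":
    severity = severity_from_exit_code(1)
    print(severity, resolve_service(severity))
```

A customer-specific variant would typically replace only severity_from_exit_code, for example to treat selected exit codes as warnings.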

set_error

  • Sets the order state to error. This step ensures that orders with errors are marked as such, even when the error handling itself was successful.

Nagios Configuration
Setting up a Host for each Job Scheduler
A host is set up for each server that runs a JobScheduler. If several JobSchedulers are installed on one server, a single host configuration is used for all of them.

  define host {
        use                     generic-host            ; Name of host template to use
        host_name               yourHostName
        alias                   aliasHost
        address                 192.11.0.100
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups          admins
        }

Services have to be set up within the Nagios configuration.
Setting up a Service
A generic, re-usable service is set up. It is important to activate passive checks and deactivate active checks. Since a check command is obligatory in a service definition, a dummy command is inserted.

  define service {
        use generic-service
        name passive_service
        active_checks_enabled 0
        passive_checks_enabled 1 ; We want only passive checking
        flap_detection_enabled 0
        register 0 ; This is a template, not a real service
        is_volatile 0
        check_period 24x7
        max_check_attempts 1
        normal_check_interval 5
        retry_check_interval 1
        check_freshness 0
        contact_groups admins
        check_command check_dummy!0
        notification_interval 120
        notification_period 24x7
        notification_options w,u,c,r
        stalking_options w,c,u
        }
  define command {
        command_name check_dummy
        command_line $USER1$/check_dummy $ARG1$
        }

In principle it is sufficient to set up one more service that collects all messages from JobScheduler. However, it is recommended to distribute the messages across several services, because a service only holds its most recent result and messages would otherwise overwrite each other, e.g. an error could be replaced by a subsequent success message. A simple split would be one service per message type (error, warning, success). It is also possible to set up a service for a particular job or job chain.

Messages for a particular Job or Job Chain

  define service {
        use                       passive_service
        host_name                 yourHostName
        service_description       Job Run SampeJobNscaNotification
        }

Warnings, Errors and Success Messages

  define service {
        use                     passive_service
        host_name               yourHostName
        service_description     Scheduler Warnings
        }
  define service {
        use                     passive_service
        host_name               yourHostName
        service_description     Scheduler Errors
        }
  define service {
        use                     passive_service
        host_name               yourHostName
        service_description     Scheduler Messages
        }

Script to write into the command pipe (only if Windows servers are monitored)
If messages from a JobScheduler running on Windows are to be written to the Nagios console, a Shell script has to be installed on the Nagios server. This script writes into the Nagios command pipe and is executed via SSH.
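The Shell script itself is not reproduced here. As an illustration of what such a script has to write, the sketch below builds a Nagios external command of the type PROCESS_SERVICE_CHECK_RESULT and writes it to the command file. It is written in Python rather than Shell purely for illustration. The function names are assumptions, and the location of the command file is installation-specific (it is defined by the command_file setting in nagios.cfg).

```python
import time


def format_external_command(host, service, return_code, message, now=None):
    """Format a passive service check result as a Nagios external command line.

    Format: [<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<return code>;<output>
    """
    timestamp = int(time.time() if now is None else now)
    # The command is terminated by a newline, so line breaks in the message are replaced.
    clean_message = str(message).replace("\r", " ").replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{clean_message}\n"
    )


def write_to_command_file(line, command_file):
    """Write one command line to the Nagios command file (a named pipe on the Nagios server)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Example values only; printing instead of writing so the sketch runs anywhere.
    print(format_external_command("yourHostName", "Scheduler Errors", 2, "my_chain/4711 RC = 1"), end="")
```

The host name and service description have to match a host and a service defined in the Nagios configuration, otherwise Nagios discards the result.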