Introduction

This article describes individual configuration parameters and provides examples of their use with monitors such as op5 and Zabbix.

Configuration Editor

We recommend using the XML Editor to generate monitoring configuration objects. The editor uses an XSD Schema to suggest and validate configurations, which is intended to significantly reduce the time required to develop and test a configuration.

Configuration

JobScheduler

 - SystemMonitorNotification files

Location: <scheduler_install>/config/notification

File | Description
SystemMonitorNotification_v1.0.xsd

The XML Schema file defines which values are allowed in the XML files for JobScheduler monitoring.

To configure the JobScheduler objects to be monitored and the System Monitor, modify only the SystemMonitorNotification_<MonitorSystem>.xml files and leave the XML Schema file unchanged.

SystemMonitorNotification_<MonitorSystem>.xml

 Configuration file for each System Monitor.

  • Specifies how notifications are delivered to the System Monitor.
  • Specifies notifications for error or success conditions.
  • Specifies notifications to measure the performance of JobScheduler objects.
 

SystemMonitorNotificationTimers.xml

Configuration file for all System Monitors.

  • Specifies notifications to measure the performance of JobScheduler objects.

This file is optional and contains the definitions of the SystemMonitorNotification / Timer elements.
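
As a minimal sketch, a SystemMonitorNotificationTimers.xml file could look like the following. The timer name my_timer and the job chain test/my_jobchain are placeholders, the system_id value timers is arbitrary, and the sketch assumes that the schema accepts a file that contains only Timer elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: timer and job chain names are placeholders -->
<SystemMonitorNotification system_id="timers">
  <Timer name="my_timer">
    <TimerJobChain name="test/my_jobchain" />
    <!-- maximum allowed execution time in seconds -->
    <Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum>
  </Timer>
</SystemMonitorNotification>
```

A timer defined this way would then be referenced by name from a TimerRef element in a SystemMonitorNotification_<MonitorSystem>.xml file.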

 

SystemMonitorNotification Elements

The configuration element descriptions are organized into the following major categories:

Element | Element description | Description
SystemMonitorNotification | Top-level element | Configuration for notifications to be sent to a System Monitor.
Notification | Once or more inside a SystemMonitorNotification element | Specifies a System Monitor notification that includes a command line invocation and the JobScheduler objects.
Timer | Optional, once or more inside a SystemMonitorNotification element | Performance measurement definition.

SystemMonitorNotification

Related issue: JITL-230

SystemMonitorNotification supports the following attributes:

Note:

  • attribute system_id 
    • in the SystemMonitorNotificationTimers.xml file the value of this attribute is not important and can be any value, e.g. timers

Attribute | Usage | Description
system_id | Required | System Monitor identifier. See JobScheduler - Job Chains customization.

Example
<SystemMonitorNotification system_id="op5">
...


SystemMonitorNotification / Notification

The following elements may be nested inside a Notification element:

Element | Element description | Description
NotificationMonitor | Once inside a Notification element | Specifies the System Monitor interface that is used for messages: either a Plug-in Interface or a command line invocation.
NotificationObjects | Once inside a Notification element | Specifies the Job Chain and the Timer definitions.

SystemMonitorNotification / Notification / NotificationMonitor

NotificationMonitor supports the following attributes:

Note:

  • attributes service_name_on_error and service_name_on_success
    • at least one of these attributes must be configured
    • both attributes can be configured together
Attribute | Usage | Description
service_name_on_error | Optional | This setting specifies the service that is configured in the System Monitor for messages about job runs with errors and for job recovery messages. The service name must match the corresponding setting in the System Monitor.
service_name_on_success | Optional | This setting specifies the service that is configured in the System Monitor for informational messages about successful job runs. The service name must match the corresponding setting in the System Monitor.
service_status_on_error | Optional | This setting specifies the service status code for error messages. Default: CRITICAL
service_status_on_success | Optional | This setting specifies the service status code for success messages. Default: OK

Example
<!-- Example 
op5 NSCA Status: 
0 - OK 
1 - WARNING 
2 - CRITICAL 
3 - UNKNOWN --> 
... 
<!-- Sending occurred errors as CRITICAL (default) --> 
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors"> 
... 
<!-- Sending occurred errors as WARNING --> 
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors" service_status_on_error="1"> 
...
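
Both service attributes can also be configured together. The following sketch assumes a second service named "JobScheduler Monitoring Success" in the System Monitor (a placeholder name) and uses the numeric NSCA status codes listed above.

```xml
<!-- Sketch: service names are placeholders and must match the services configured in the System Monitor -->
<!-- Errors are sent as WARNING (1), successful runs as OK (0) -->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors"
                     service_status_on_error="1"
                     service_name_on_success="JobScheduler Monitoring Success"
                     service_status_on_success="0">
  ...
</NotificationMonitor>
```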

One of the following elements must be nested inside a NotificationMonitor element:

Element | Element description | Description
NotificationInterface | Optional, once inside a NotificationMonitor element | Plug-in Interface to be executed for System Monitor notification
NotificationCommand | Optional, once inside a NotificationMonitor element | Command line to be executed for System Monitor notification

 

SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface

NotificationInterface supports the following attributes:

Attribute | Usage | Description
monitor_host | Required | This setting specifies the host name or IP address of the System Monitor host.
monitor_port | Required | This setting specifies the TCP port that the System Monitor listens on.
monitor_password | Optional | This setting specifies the password.

  • for NSCA - the password configured in the nsca.cfg file.
monitor_connection_timeout | Optional | This setting specifies the connection timeout in ms. Default: 5000
monitor_response_timeout | Optional | This setting specifies the response timeout in ms.
monitor_encryption | Optional | This setting specifies the encryption used for communication with the System Monitor. By default no encryption is used.

  • NONE - no encryption
  • XOR - XOR encryption
  • TRIPLE_DES - Triple DES encryption
service_host | Required | This setting specifies the name of the host that executes the passive check. The name must match the corresponding setting in the System Monitor.
plugin | Optional | Default:

  • JobScheduler version 1.9.x, 1.10.x
    • com.sos.scheduler.notification.plugins.notifier.SystemNotifierSendNscaPlugin
  • JobScheduler version 1.11.x
    • com.sos.jitl.notification.plugins.notifier.SystemNotifierSendNscaPlugin
Example
...
<NotificationInterface monitor_host="monitor_host" monitor_port="5667" monitor_encryption="XOR" service_host="service_host"><![CDATA[
scheduler id=${MON_N_SCHEDULER_ID}, history id=${MON_N_ORDER_HISTORY_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), step =${MON_N_ORDER_STEP_STATE}, error=${MON_N_ERROR_TEXT}
]]></NotificationInterface>
...
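
The optional attributes can be added in the same way. In the following sketch the host names, the password and the timeout values are placeholders chosen for illustration only.

```xml
<!-- Sketch: host names, password and timeout values are placeholders -->
<NotificationInterface monitor_host="monitor_host"
                       monitor_port="5667"
                       monitor_password="secret"
                       monitor_connection_timeout="5000"
                       monitor_response_timeout="10000"
                       monitor_encryption="TRIPLE_DES"
                       service_host="service_host"><![CDATA[
scheduler id=${MON_N_SCHEDULER_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), error=${MON_N_ERROR_TEXT}
]]></NotificationInterface>
```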

System Monitor: Opsview

If you use Opsview as the monitoring tool, the plugin used by NotificationInterface will only work without encryption (type: NONE). Opsview supports 20 types of encryption and these are not supported by this plugin.

Instead, use the NotificationCommand element and specify the exact command that sends passive checks to Opsview from a remote machine (see the example for op5 or the example for Zabbix).

 

SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand

NotificationCommand supports the following attributes:

Attribute | Usage | Description
plugin | Optional | Default:

  • JobScheduler version 1.9.x, 1.10.x 
    • com.sos.scheduler.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin
  • JobScheduler version 1.11.x
    • com.sos.jitl.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin


Example
...
<NotificationCommand><![CDATA[
echo scheduler id=${MON_N_SCHEDULER_ID}, history id=${MON_N_ORDER_HISTORY_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), step =${MON_N_ORDER_STEP_STATE}, error=${MON_N_ERROR_TEXT} > D://errors.txt
]]></NotificationCommand>
...
SystemMonitorNotification / Notification / NotificationObjects

One of the following elements must be nested inside a NotificationObjects element:

Element | Element description | Description
Job | Optional, once or more inside a NotificationObjects element | Restricts notifications to order jobs
JobChain | Optional, once or more inside a NotificationObjects element | Restricts notifications to job chains
TimerRef | Optional, once or more inside a NotificationObjects element | Restricts notifications to performance checks (Timer)
Example
<SystemMonitorNotification system_id="op5"> 
  <Notification> 
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors"> 
      ... 
    </NotificationMonitor> 
    <NotificationObjects>
      <!-- Send the job error occurring in the "test/my_job" order job to the "JobScheduler Monitoring Errors" service. -->        
      <Job name="test/my_job" /> 
      <!-- Send the job chain error occurring in the "test/my_jobchain" job chain to the "JobScheduler Monitoring Errors" service. --> 
      <JobChain name="test/my_jobchain" /> 
    </NotificationObjects> 
 </Notification> 
</SystemMonitorNotification>  

 

SystemMonitorNotification / Notification / NotificationObjects / Job

Job supports the following attributes:

Attribute | Usage | Description
notifications | Optional, Integer | Specifies the number of notifications that are sent to a System Monitor. Default: 1
scheduler_id | Optional | Notifications are restricted to the JobScheduler instance with the given identifier. By default notifications are sent for all JobScheduler instances that log into the same database. A regular expression can be used.
name | Optional | Job name including possible folder names. A regular expression can be used.
return_code_from | Optional | Restricts notifications for jobs to a particular return code range (start of the range).
return_code_to | Optional | Restricts notifications for jobs to a particular return code range (end of the range).
Example
...
<Job notifications="2" name="test/my_job"/>
...
<Job scheduler_id="scheduler_4444" />
...
<Job scheduler_id="scheduler_4444" name="^(test/my)" />
... 
<Job name="test/my_job" return_code_from="5"/> 
...  
<Job name="test/my_job" return_code_to="10"/>
...  
<Job name="test/my_job" return_code_from="5" return_code_to="5"/>  
...

 

SystemMonitorNotification / Notification / NotificationObjects / JobChain

JobChain supports the following attributes:

Attribute | Usage | Description
notifications | Optional, Integer | Specifies the number of notifications that are sent to a System Monitor. Default: 1
scheduler_id | Optional | Notifications are restricted to the JobScheduler instance with the given identifier. By default notifications are sent for all JobScheduler instances that log into the same database. A regular expression can be used.
name | Optional | Job chain name including possible folder names. A regular expression can be used.
return_code_from | Optional | Restricts notifications for job chains to a particular return code range (start of the range).
return_code_to | Optional | Restricts notifications for job chains to a particular return code range (end of the range).
step_from | Optional | Restricts notifications for job chains to a sequence of job nodes that is specified with the step_from and step_to attributes.
step_to | Optional | Restricts notifications for job chains to a sequence of job nodes that is specified with the step_from and step_to attributes.
excluded_steps | Optional | Specifies the steps that are excluded from the analysis (separated by semicolons).
Example
...
<JobChain notifications="2" name="test/my_jobchain"/>
...
<JobChain scheduler_id="scheduler_4444" />
...
<JobChain scheduler_id="scheduler_4444" name="^(test/my)" />
... 
<JobChain name="test/my_jobchain" return_code_from="5"/> 
... 
<JobChain name="test/my_jobchain" return_code_to="10"/>
... 
<JobChain name="test/my_jobchain" return_code_from="5" return_code_to="5"/>  
...
<JobChain name="test/my_jobchain" step_from="200"/>
...
<JobChain name="test/my_jobchain" step_to="500"/>
...
<JobChain name="test/my_jobchain" step_from="300" step_to="300"/>
...
<JobChain name="test/my_jobchain" excluded_steps="200;300"/>
...
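
The attributes can presumably be combined in a single element. The following sketch assumes that all restrictions then apply together; the scheduler id, job chain name, step names and return codes are placeholders.

```xml
<!-- Sketch: up to 3 notifications for the job nodes 200 to 500 of "test/my_jobchain",
     excluding node 300, for return codes in the range 1 to 10 -->
<JobChain scheduler_id="scheduler_4444"
          name="test/my_jobchain"
          notifications="3"
          step_from="200"
          step_to="500"
          excluded_steps="300"
          return_code_from="1"
          return_code_to="10" />
```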

 

SystemMonitorNotification / Notification / NotificationObjects / TimerRef

TimerRef supports the following attributes:

Attribute | Usage | Description
notifications | Optional, Integer | Specifies the number of notifications that are sent to a System Monitor. Default: 1
ref | Optional | Corresponds to the name attribute of the SystemMonitorNotification / Timer element.
notify_on_error | Optional, Boolean | Specifies whether the timer check notification is sent when the configured job chain has error notifications. Default: false

Example
<SystemMonitorNotification system_id="op5"> 
  <Notification> 
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors"> 
      ... 
    </NotificationMonitor> 
    <NotificationObjects> 
     <!-- 
     Send the job chain error, occurring in the "test/my_jobchain" job chain, to the "JobScheduler Monitoring Errors" service. 
     --> 
     <JobChain name="test/my_jobchain" /> 
    </NotificationObjects> 
  </Notification>   
 
  <Notification> 
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Performance"> 
      ... 
    </NotificationMonitor> 
    <NotificationObjects> 
      <!-- 
      Sends the performance check error occurring in the "test/my_jobchain" job chain to the "JobScheduler Monitoring Performance" service. 
      The performance check error is not sent to the "JobScheduler Monitoring Performance" service when "test/my_jobchain" has a job chain error (default notify_on_error = false). 
      --> 
      <TimerRef ref="my_timer" /> 
    </NotificationObjects> 
 </Notification>   
 
 <Timer name="my_timer"> 
    <TimerJobChain name="test/my_jobchain" /> 
 </Timer> 
</SystemMonitorNotification> 
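
To send the performance check notification even when the job chain has an error, the notify_on_error attribute can be set explicitly. This sketch reuses the my_timer name from the example above; the notifications value is a placeholder.

```xml
<NotificationObjects>
  <!-- Sketch: the timer check notification is also sent when "test/my_jobchain" has a job chain error -->
  <TimerRef ref="my_timer" notify_on_error="true" notifications="2" />
</NotificationObjects>
```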

 

SystemMonitorNotification / Timer 

The following elements must be nested inside a Timer element:

Element | Element description | Description
TimerJobChain | Once or more inside a Timer element | Restricts notifications to job chains
Minimum | Optional, once inside a Timer element | Minimum required execution time for job chains or selected job nodes. Allows script code to be executed that returns the minimum execution time in seconds.
Maximum | Optional, once inside a Timer element | Maximum allowed execution time for job chains or selected job nodes. Allows script code to be executed that returns the maximum execution time in seconds.
Example
<SystemMonitorNotification system_id="op5"> 
  ... 
  <Timer name="my_timer_1"> 
    <TimerJobChain name="test/my_jobchain_1" /> 
    <Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum> 
  </Timer> 
 
  <Timer name="my_timer_2"> 
    <TimerJobChain name="test/my_jobchain_2" /> 
    <TimerJobChain name="test/my_jobchain_3" /> 
    <Minimum><Script language="javascript"><![CDATA[500]]></Script></Minimum> 
    <Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum> 
  </Timer> 
</SystemMonitorNotification> 

Timer supports the following attributes:

Attribute | Usage | Description
name | Required | Timer name that is referenced by the ref attribute of the SystemMonitorNotification / Notification / NotificationObjects / TimerRef element. The name must be unique across all timer definitions.

Example
...
<Timer name="my_timer">
... 

 

SystemMonitorNotification / Timer / TimerJobChain

TimerJobChain supports the following attributes:

Attribute | Usage | Description
scheduler_id | Optional | Notifications are restricted to the JobScheduler instance with the given identifier. By default notifications are sent for all JobScheduler instances that log into the same database. A regular expression can be used.
name | Optional | Job chain name including possible folder names. A regular expression can be used.
step_from | Optional | Restricts checks for job chains to a sequence of job nodes that is specified with the step_from and step_to attributes.
step_to | Optional | Restricts checks for job chains to a sequence of job nodes that is specified with the step_from and step_to attributes.
Example
...
<TimerJobChain scheduler_id="scheduler_4444" /> 
... 
<TimerJobChain scheduler_id="scheduler_4444" name="^(test/my)" /> 
... 
<TimerJobChain name="test/my_jobchain" step_from="200"/> 
... 
<TimerJobChain name="test/my_jobchain" step_to="500"/> 
... 
<TimerJobChain name="test/my_jobchain" step_from="300" step_to="300"/>
...
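
Put together, a timer that checks only part of a job chain might be sketched as follows. The timer name, the step names and the limits of 60 and 600 seconds are placeholders.

```xml
<!-- Sketch: checks that the job nodes 200 to 500 of "test/my_jobchain" take between 60 and 600 seconds -->
<Timer name="my_timer_steps">
  <TimerJobChain name="test/my_jobchain" step_from="200" step_to="500" />
  <Minimum><Script language="javascript"><![CDATA[60]]></Script></Minimum>
  <Maximum><Script language="javascript"><![CDATA[600]]></Script></Maximum>
</Timer>
```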

 

SystemMonitorNotification / Timer / Minimum

The following elements must be nested inside a Minimum element:

Element | Element description | Description
Script | Once inside a Minimum element | Script code in one of the supported languages
Example
...
<Timer name="my_timer">
  ...
  <Minimum><Script language="javascript"><![CDATA[1000]]></Script></Minimum>
</Timer>
... 

 

SystemMonitorNotification / Timer / Maximum

The following elements must be nested inside a Maximum element:

Element | Element description | Description
Script | Once inside a Maximum element | Script code in one of the supported languages
Example
...
<Timer name="my_timer">
  ...
  <Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum>
</Timer>
... 

 

SystemMonitorNotification / Timer / Minimum|Maximum / Script

Script supports the following attributes:

Attribute | Usage | Description
language | Required | Script language name. Supported languages:

  • javascript
  • ECMAScript 

 The Script element can contain:

  • a fixed value
  • a calculation based on the job/order parameters
Fixed value

A fixed value is the time allowed in seconds for the specific Minimum or Maximum definition

Example (fixed value)
...
  <Script language="javascript"><![CDATA[1000]]></Script>
...
Calculation

The calculation must return the time in seconds for the specific Minimum or Maximum definition.

This example calculates the execution time depending on the %file_size% parameter that was set by a specific job (see the example job below).

Example (calculation)
...
  <Script language="javascript"><![CDATA[                     
    function my_calculate(){                         
      var fileSize              = new java.lang.Double(%file_size%);                         
      var timerExpiryFactor     = 0.0025;                         
      var timerExpiryTolerance  = timerExpiryFactor*0.1;                         
      var timerExpiry           = new java.lang.Double(timerExpiryFactor+timerExpiryTolerance);                         
      timerExpiry               = timerExpiry*fileSize;                     
      return timerExpiry;                     
    }                         
    my_calculate();
  ]]></Script>
...
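
The arithmetic of this calculation can be followed with a worked value. The sketch below writes the same formula as a single expression and assumes that, as with a fixed value, the value of the last expression is used as the result. With a file_size of 2048 the result is roughly 5.632 seconds.

```xml
<Script language="javascript"><![CDATA[
  // Sketch: same formula as above, written as a single expression.
  // factor = 0.0025, tolerance = 0.0025 * 0.1 = 0.00025
  // For file_size = 2048: (0.0025 + 0.00025) * 2048, roughly 5.632 seconds
  (0.0025 + 0.0025 * 0.1) * %file_size%;
]]></Script>
```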

 

This example job calculates and creates a new order parameter file_size.

To store the parameters in the database (table SCHEDULER_MON_RESULTS):

  • set the scheduler_notification_result_parameters parameter (see the job documentation jobs/JobSchedulerNotificationStoreResultsJob.xml)
  • set the StoreResultsJobJSAdapterClass as a monitor
    • JobScheduler version 1.9.x, 1.10.x
      • com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass
    • JobScheduler version 1.11.x

      • com.sos.jitl.notification.jobs.result.StoreResultsJobJSAdapterClass
Example (job)
<?xml version="1.0" encoding="ISO-8859-1"?> 
<job  title="Sample Job with Store Result Monitor" order="yes" stop_on_error="no" tasks="1">     
  <params>
     <!--
     set the scheduler_notification_result_parameters parameter
     -->         
    <param name="scheduler_notification_result_parameters" value="file_size"/>     
  </params>     
  
  <!--
  calculate and create the new order parameter if necessary
  -->
  <script language="javascript"><![CDATA[             
      function spooler_process(){                                  
        var order    = spooler_task.order;                 
        var params   = spooler.create_variable_set();                 
        params.merge(spooler_task.params);                 
        params.merge(order.params);                      
        
        // parameter scheduler_file_path was set in the previous job chain step
        var file     = new java.io.File(params.value("scheduler_file_path"));                 
        var fileSize = file.length()/1024;                 
        order.params.set_var("file_size",fileSize.toString());                          
      return true;             
      }]]>     
   </script>          
 
   <!-- 
   set the StoreResultsJobJSAdapterClass as a monitor
   -->     
   <monitor  name="notification_monitor" ordering="1">         
     	<!-- JobScheduler version 1.9.x, 1.10.x -->
	<script java_class="com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass" language="java"/>   
        <!-- JobScheduler version 1.11.x --> 
        <!--
        <script java_class="com.sos.jitl.notification.jobs.result.StoreResultsJobJSAdapterClass" language="java"/> 
        --> 
   </monitor>

   <run_time /> 
</job> 

Message

Usage

The Message can be configured as a CDATA element on the following parent nodes:

  • SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand
  • SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface

The Message can contain: 

  • fixed values
  • variables

Example: <![CDATA[ scheduler id = ${MON_N_SCHEDULER_ID}  ]]>

Variables

All variables (except OS environment variables) must be specified using the following syntax:

${<variable name>}

Note:

  • The ${<variable name>} syntax applies to JobScheduler version 1.10.6 and higher. The syntax for JobScheduler versions 1.10.4 and 1.10.5 (see below) is still supported.
  • Syntax for JobScheduler versions 1.10.4 and 1.10.5: {<variable name>}
  • Syntax for earlier JobScheduler versions: %<variable name>%
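
As an illustration, the same message could be written in each of the three syntaxes as follows. The echo command is only a stand-in for a real notification command.

```xml
<!-- JobScheduler 1.10.6 and higher -->
<NotificationCommand><![CDATA[echo scheduler id=${MON_N_SCHEDULER_ID}]]></NotificationCommand>

<!-- JobScheduler 1.10.4, 1.10.5 -->
<NotificationCommand><![CDATA[echo scheduler id={MON_N_SCHEDULER_ID}]]></NotificationCommand>

<!-- earlier JobScheduler versions -->
<NotificationCommand><![CDATA[echo scheduler id=%MON_N_SCHEDULER_ID%]]></NotificationCommand>
```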

Variable values are substituted in the following order:

  1. Table variables.
  2. Service variables.
  3. OS environment variables. 
Table variables

 Table of the history of steps of processed orders.

Name | Description
${MON_N_ID} | Unique notification id
${MON_N_SCHEDULER_ID} | Id of the JobScheduler
${MON_N_TASK_ID} | Id of the JobScheduler task
${MON_N_STEP} | Consecutive number of the order step
${MON_N_ORDER_HISTORY_ID} | Id of the JobScheduler order
${MON_N_JOB_CHAIN_NAME} | Name of the job chain of the order
${MON_N_JOB_CHAIN_TITLE} | Title of the job chain of the order
${MON_N_ORDER_ID} | Unique (within the job chain) id of the order
${MON_N_ORDER_TITLE} | Title of the order
${MON_N_ORDER_START_TIME} | Timestamp of the start of the order
${MON_N_ORDER_END_TIME} | Timestamp of the end of the order
${MON_N_ORDER_TIME_ELAPSED} | Elapsed time in seconds between the start and the end of the order
${MON_N_ORDER_STEP_STATE} | State of the order inside the job chain
${MON_N_ORDER_STEP_START_TIME} | Timestamp of the start of the order step
${MON_N_ORDER_STEP_END_TIME} | Timestamp of the end of the order step
${MON_N_ORDER_STEP_TIME_ELAPSED} | Elapsed time in seconds between the start and the end of the order step
${MON_N_JOB_NAME} | Name of the job
${MON_N_JOB_TITLE} | Title of the job
${MON_N_TASK_START_TIME} | Timestamp of the job task start
${MON_N_TASK_END_TIME} | Timestamp of the job task end
${MON_N_TASK_TIME_ELAPSED} | Elapsed time in seconds between the start and the end of the job task
${MON_N_RECOVERED} | 0 = ok or error was not recovered (depending on ${MON_N_ERROR}), 1 = error was recovered
${MON_N_RETURN_CODE} | Return code number
${MON_N_ERROR} | 0 = ok, 1 = error
${MON_N_ERROR_CODE} | Exception code of the job error
${MON_N_ERROR_TEXT} | Exception message of the job (that processed the order)
${MON_N_CREATED} | Timestamp of the initial notification record
${MON_N_MODIFIED} | Timestamp of the latest change to this notification record

 

Example
 scheduler id = ${MON_N_SCHEDULER_ID}, history id = ${MON_N_ORDER_HISTORY_ID}, job_chain = ${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), error = ${MON_N_ERROR_TEXT}

 Table of the history of notifications sent to a system monitor.

Name | Description
${MON_SN_ID} | Unique system notification id
${MON_SN_NOTIFICATION_ID} | Reference to the ID column of the SCHEDULER_MON_NOTIFICATIONS table
${MON_SN_CHECK_ID} | Reference to the ID column of the SCHEDULER_MON_CHECKS table

${MON_SN_SYSTEM_ID} 

Reference to the element attribute

SystemMonitorNotification / @system_id

defined in the XML configuration file

${MON_SN_SERVICE_NAME}

Reference to one of the following element attributes

  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error
  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success

defined in the XML configuration file 

${MON_SN_OBJECT_TYPE}

NotificationObject type

  0 = JobChain

  1 = Job

100 = dummy code for internal use

${MON_SN_RETURN_CODE_FROM} 

Reference to the element attribute

SystemMonitorNotification / Notification / NotificationObjects / JobChain / @return_code_from

defined in the XML configuration file

${MON_SN_RETURN_CODE_TO}

Reference to the element attribute

SystemMonitorNotification / Notification / NotificationObjects / JobChain / @return_code_to

defined in the XML configuration file 

${MON_SN_STEP_FROM}

Reference to the element attribute

SystemMonitorNotification / Notification / NotificationObjects / JobChain / @step_from

defined in the XML configuration file 

${MON_SN_STEP_TO}

Reference to the element attribute

SystemMonitorNotification / Notification / NotificationObjects / JobChain / @step_to

defined in the XML configuration file 

${MON_SN_STEP_FROM_START_TIME} | Timestamp for the start of the order step
${MON_SN_STEP_TO_END_TIME} | Timestamp for the end of the order step
${MON_SN_STEP_TIME_ELAPSED} | The elapsed time in seconds between the start and end times of the order step
${MON_SN_NOTIFICATIONS}

Reference to element attribute

SystemMonitorNotification / Notification / NotificationObjects / JobChain / @notifications

defined in the XML configuration file  

${MON_SN_CURRENT_NOTIFICATION} | Number of notifications that have already been sent to a System Monitor
${MON_SN_MAX_NOTIFICATIONS}

0 = notifications counter was not reached

1 = notifications counter was reached, all configured notifications were sent

${MON_SN_ACKNOWLEDGED}

0 = not acknowledged

1 = acknowledged 

${MON_SN_RECOVERED}

0 = recovery not sent

1 = recovery sent

${MON_SN_SUCCESS}

0 = Notification onError, SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error

1 = Notification onSuccess, SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success

${MON_SN_CREATED} | Timestamp of the initial system notification record
${MON_SN_MODIFIED} | Timestamp of the latest change to this system notification record
Example
 step from = ${MON_SN_STEP_FROM}, step to = ${MON_SN_STEP_TO}, notification = ${MON_SN_CURRENT_NOTIFICATION} (of ${MON_SN_NOTIFICATIONS})

 Table of the history of executed checks (Timer)

Name | Description
${MON_C_ID} | Unique check id
${MON_C_NOTIFICATION_ID} | Reference to the ID column of the SCHEDULER_MON_NOTIFICATIONS table

${MON_C_NAME}

Reference to element attribute

SystemMonitorNotification / Timer / @name

defined in the XML configuration file 

${MON_C_STEP_FROM} 

Reference to element attribute

SystemMonitorNotification / Timer / TimerJobChain / @step_from

defined in the XML configuration file

 
${MON_C_STEP_TO}

Reference to element attribute

SystemMonitorNotification / Timer / TimerJobChain / @step_to

defined in the XML configuration file

${MON_C_STEP_FROM_START_TIME} | Timestamp of the start of the order step
${MON_C_STEP_TO_END_TIME} | Timestamp of the end of the order step
${MON_C_STEP_TIME_ELAPSED} | Elapsed time in seconds between the start and the end of the order step
${MON_C_CHECK_TEXT} | Message of the check
${MON_C_CREATED} | Timestamp of the initial check record
${MON_C_MODIFIED} | Timestamp of the latest change to this check record
Example
 timer name = ${MON_C_NAME}, text = ${MON_C_CHECK_TEXT}
Service variables
Name | Description
${SERVICE_NAME}

Current service name. One of the following element attributes:

  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error
  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success
${SERVICE_STATUS}

Current service status. One of the following element attributes or the default:

  • SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_error
  • SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_success
  • default for errors: CRITICAL
  • default for success: OK
${SERVICE_MESSAGE_PREFIX}

Message prefix

  • ERROR - error
  • RECOVERED - error recovery
  • TIMER - performance check
Example
 service name = ${SERVICE_NAME}
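
Service variables and table variables can be mixed in one message. The following sketch assumes a Unix host and a hypothetical output file /tmp/notification.txt; the echo command only stands in for a real notification client. The message is quoted because it contains parentheses and the service name may contain spaces.

```xml
<NotificationCommand><![CDATA[
echo "${SERVICE_MESSAGE_PREFIX} ${SERVICE_NAME} status=${SERVICE_STATUS}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID})" > /tmp/notification.txt
]]></NotificationCommand>
```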

 

OS environment variables 

 

All existing OS environment variables can be used in the message with the syntax %<variable name>% (Windows) or $<variable name> (Unix).

Example Windows
 %TEMP%/test.exe
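
A corresponding Unix sketch could use the HOME environment variable. The file name notification.txt is a placeholder and the echo command only stands in for a real notification client.

```xml
<NotificationCommand><![CDATA[
echo "scheduler id=${MON_N_SCHEDULER_ID}" > $HOME/notification.txt
]]></NotificationCommand>
```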

 

Examples 
Message on error
scheduler id=${MON_N_SCHEDULER_ID}, history id=${MON_N_ORDER_HISTORY_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), step=${MON_N_ORDER_STEP_STATE}, error=${MON_N_ERROR_TEXT}            
Message on success
scheduler id=${MON_N_SCHEDULER_ID}, history id=${MON_N_ORDER_HISTORY_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), steps(${MON_SN_STEP_FROM} to ${MON_SN_STEP_TO}), order time elapsed = ${MON_N_ORDER_TIME_ELAPSED}s            
Message on timer
name = ${MON_C_NAME}, scheduler id=${MON_N_SCHEDULER_ID}, history id=${MON_N_ORDER_HISTORY_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), steps(${MON_C_STEP_FROM} to ${MON_C_STEP_TO}), check = ${MON_C_CHECK_TEXT}            

Notification environment variables

The default SystemNotifierProcessBuilderPlugin plugin used by the SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand element sets the following variables as environment variables:

  1. Table variables
  2. Service variables

These variables are useful when the NotificationCommand does not call the notification client directly but calls a shell script that implements the logic for sending the notification messages.

Table variables

All table variables (see Table variables explanation) are set as environment variables with the prefix:

  • SCHEDULER_MON_TABLE_

e.g.:

  • SCHEDULER_MON_TABLE_MON_N_ID
  • SCHEDULER_MON_TABLE_MON_N_SCHEDULER_ID
  • ...
Service variables
Name | Description

SCHEDULER_MON_SERVICE_NAME

Current service name. One of the following element attributes:

  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error
  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success

SCHEDULER_MON_SERVICE_STATUS

Current service status. One of the following element attributes or the default:

  • SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_error
  • SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_success
  • default for errors: CRITICAL
  • default for success: OK

SCHEDULER_MON_SERVICE_MESSAGE_PREFIX

  • ERROR - error
  • RECOVERED - error recovery
  • TIMER - performance check

SCHEDULER_MON_SERVICE_COMMAND

Content of the SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand element after substitution

  

Sample NotificationCommand for Unix with a script file (/tmp/command.sh).
1) configured command in the SystemMonitorNotification_<MonitorSystem>.xml file
<NotificationCommand><![CDATA[/tmp/command.sh]]></NotificationCommand>
 
2) content of the /tmp/command.sh file
#! /bin/sh 
# Note: "> /tmp/command_output.txt" is used to simulate the starting of the notification client
#
echo "$SCHEDULER_MON_SERVICE_NAME:$SCHEDULER_MON_SERVICE_STATUS:$SCHEDULER_MON_SERVICE_MESSAGE_PREFIX history id = $SCHEDULER_MON_TABLE_MON_N_ORDER_HISTORY_ID" > /tmp/command_output.txt
 
Sample NotificationCommand for Windows with a script file (C:/Temp/command.cmd).
1) configured command in the SystemMonitorNotification_<MonitorSystem>.xml file
<NotificationCommand><![CDATA[C:/Temp/command.cmd]]></NotificationCommand>
 
2) content of the C:/Temp/command.cmd file
rem Note: "> C:/Temp/command_output.txt" is used to simulate the starting of the notification client
rem
echo %SCHEDULER_MON_SERVICE_NAME%:%SCHEDULER_MON_SERVICE_STATUS%:%SCHEDULER_MON_SERVICE_MESSAGE_PREFIX% history id = %SCHEDULER_MON_TABLE_MON_N_ORDER_HISTORY_ID% > C:/Temp/command_output.txt
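
The same environment variables can be read from a script in any language. The following is a minimal Python sketch under the assumption that the NotificationCommand starts a script with the variables described above already set. It is not part of the JobScheduler distribution, and the function names collect_table_variables and build_message are illustrative only.

```python
import os

TABLE_PREFIX = "SCHEDULER_MON_TABLE_"


def collect_table_variables(environ):
    """Return the table variables with the SCHEDULER_MON_TABLE_ prefix removed."""
    return {
        name[len(TABLE_PREFIX):]: value
        for name, value in environ.items()
        if name.startswith(TABLE_PREFIX)
    }


def build_message(environ):
    """Format service name, status and prefix in the style of the sample scripts."""
    table = collect_table_variables(environ)
    return "{}:{}:{} history id = {}".format(
        environ.get("SCHEDULER_MON_SERVICE_NAME", ""),
        environ.get("SCHEDULER_MON_SERVICE_STATUS", ""),
        environ.get("SCHEDULER_MON_SERVICE_MESSAGE_PREFIX", ""),
        table.get("MON_N_ORDER_HISTORY_ID", ""),
    )


if __name__ == "__main__":
    # Print the message instead of starting a real notification client.
    print(build_message(os.environ))
```

As in the shell samples, the output only simulates the call of the notification client.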
 

Examples

Examples op5
NotificationInterface 

The following is an excerpt from an XML file used to notify a specific System Monitor (op5 Monitor) via the NotificationInterface:

SystemMonitorNotification_op5.xml
 ...
<!--
monitor_host            The hostname or IP address of the System Monitor host
monitor_port            The TCP port that the System Monitor listens on
monitor_encryption      Encryption algorithm
service_host            The host that executes the passive check. The name must match the corresponding setting in the System Monitor
${MON_N_SCHEDULER_ID}   See explanation "Table variables"
...
-->
<NotificationInterface monitor_host="monitor_host" monitor_port="5667" monitor_encryption="XOR" service_host="service_host"><![CDATA[
scheduler id=${MON_N_SCHEDULER_ID}, history id=${MON_N_ORDER_HISTORY_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), step =${MON_N_ORDER_STEP_STATE}, error=${MON_N_ERROR_TEXT}
]]></NotificationInterface>
...
NotificationCommand

The following is an excerpt from an XML file used to notify a specific System Monitor (op5 Monitor) via the NotificationCommand on Windows:

SystemMonitorNotification_op5.xml
... 
<!--
service_host               The host that executes the passive check. The name must match the corresponding setting in the System Monitor.
monitor_host               The hostname or IP address of the System Monitor host.
${SERVICE_NAME}            See explanation "Service variables"
${SERVICE_STATUS}          See explanation "Service variables"
${SERVICE_MESSAGE_PREFIX}  See explanation "Service variables"
${MON_N_SCHEDULER_ID}      See explanation "Table variables"
...
NotificationCommand after substitution (error case):
<![CDATA[echo service_host:JobScheduler Monitoring Errors:2:ERROR scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error=error occurred | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
 
NotificationCommand after substitution (recovery case): 
<![CDATA[echo service_host:JobScheduler Monitoring Errors:0:RECOVERED scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error=error occurred | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]> 
 
NotificationCommand after substitution (success case):  
<![CDATA[echo service_host:JobScheduler Monitoring Success:0:scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error= | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>  
 
-->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors" service_name_on_success="JobScheduler Monitoring Success">
  <NotificationCommand><![CDATA[echo service_host:${SERVICE_NAME}:${SERVICE_STATUS}:${SERVICE_MESSAGE_PREFIX}scheduler id=${MON_N_SCHEDULER_ID}, history id=${MON_N_ORDER_HISTORY_ID}, job_chain=${MON_N_JOB_CHAIN_NAME}(${MON_N_ORDER_ID}), step=${MON_N_ORDER_STEP_STATE}, error=${MON_N_ERROR_TEXT} | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
  </NotificationCommand>  
</NotificationMonitor>

...
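
To try out a command line before deploying it, the ${...} substitution can be approximated outside JobScheduler. The following Python sketch uses the standard library class string.Template, whose placeholder syntax happens to match. It is not the JobScheduler implementation, the template is shortened, and the trailing space in the prefix value is an assumption made to reproduce the error case shown in the comment above.

```python
from string import Template

# Shortened version of the NotificationCommand from the excerpt above.
COMMAND_TEMPLATE = Template(
    "echo service_host:${SERVICE_NAME}:${SERVICE_STATUS}:"
    "${SERVICE_MESSAGE_PREFIX}scheduler id=${MON_N_SCHEDULER_ID}, "
    "history id=${MON_N_ORDER_HISTORY_ID}"
)


def substitute(template, service_vars, table_vars):
    """Replace ${NAME} placeholders; unknown placeholders are left untouched."""
    values = dict(table_vars)
    values.update(service_vars)
    return template.safe_substitute(values)


error_case = substitute(
    COMMAND_TEMPLATE,
    {
        "SERVICE_NAME": "JobScheduler Monitoring Errors",
        "SERVICE_STATUS": "2",
        "SERVICE_MESSAGE_PREFIX": "ERROR ",  # trailing space is an assumption
    },
    {"MON_N_SCHEDULER_ID": "scheduler_4444", "MON_N_ORDER_HISTORY_ID": "123"},
)
print(error_case)
```

safe_substitute is used so that a misspelled variable name stays visible in the output instead of raising an error.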
Examples Zabbix
NotificationCommand

The following is an excerpt from an XML file used to notify a specific System Monitor (Zabbix Monitor) via the NotificationCommand:

SystemMonitorNotification_zabbix.xml
... 
<!--
zabbix_sender            Zabbix sender installed on the JobScheduler host
localhost                Hostname of the Zabbix server
zabbix_server            JobScheduler Agent name (host name) as registered in Zabbix
samples.job1             Item key in Zabbix (JOB_NAME with "/" replaced by ".")
${MON_N_ERROR_TEXT}      See explanation "Table variables"
-->
<NotificationCommand>
<![CDATA[zabbix_sender -z localhost -s zabbix_server -k samples.job1 -o "${MON_N_ERROR_TEXT}"]]>
</NotificationCommand>
...
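
The item key rule from the comment above ("/" replaced by ".") and the quoting of the error text can be sketched as follows. This is an illustrative Python helper, not part of JobScheduler or Zabbix. The function names are hypothetical, and only the zabbix_sender options already used in the excerpt (-z, -s, -k, -o) appear.

```python
import shlex


def zabbix_item_key(job_name):
    """Derive the item key by replacing "/" with "." in the job name."""
    return job_name.replace("/", ".")


def zabbix_sender_command(server, host, job_name, value):
    """Build the zabbix_sender call as a shell-quoted command line."""
    args = [
        "zabbix_sender",
        "-z", server,                      # Zabbix server
        "-s", host,                        # host name as registered in Zabbix
        "-k", zabbix_item_key(job_name),   # item key
        "-o", value,                       # value to send, e.g. the error text
    ]
    return shlex.join(args)


print(zabbix_sender_command("localhost", "zabbix_server", "samples/job1", "connection refused"))
```

Quoting matters because an error text usually contains spaces, and without quotes only its first word would be passed as the value.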

JobScheduler - Store parameters to database

The Monitoring Interface provides functionality to store the job/order parameters of specific jobs in the database (table SCHEDULER_MON_RESULTS).

See explanation: Calculation

JobScheduler - Job Chains

The following job chains are provided and should be configured accordingly:

sos / notification / CheckHistory (JobScheduler version 1.9.x, 1.10.x)

See <scheduler_install>/jobs/JobSchedulerNotificationCheckHistoryJob.xml

  • This is the main job. It analyzes the JobScheduler history tables and writes the results into the notification tables.
    • The job reads all history entries for the job chains configured in the SystemMonitorNotification XML files.
    • The job executes the performance checks for the defined Timers.
  • Order Check
    • Configure a repeat interval for the order run time, e.g. every two minutes.

sos / notification / CheckHistory (JobScheduler version 1.11.x)

  • The job chain has been removed.
  • Set the parameter sos.use_notification to true in config/scheduler.xml.

JobScheduler version 1.11.x config/scheduler.xml
...
<spooler>
	<config ...>
		<params>
			...
			<param name="sos.use_notification" value="true"/>
		</params>
		...
	</config>
</spooler>
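
Whether the parameter is set can be checked with a few lines of Python. This is a minimal sketch, assuming the file is well-formed XML without namespaces. The function name notification_enabled is illustrative and not part of JobScheduler.

```python
import xml.etree.ElementTree as ET


def notification_enabled(scheduler_xml):
    """Return True if sos.use_notification is set to "true" in the XML text."""
    root = ET.fromstring(scheduler_xml)
    for param in root.iter("param"):
        if param.get("name") == "sos.use_notification":
            return param.get("value") == "true"
    return False


sample = (
    "<spooler><config><params>"
    '<param name="sos.use_notification" value="true"/>'
    "</params></config></spooler>"
)
print(notification_enabled(sample))
```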

 

sos / notification / SystemNotifier

See <scheduler_install>/jobs/JobSchedulerNotificationSystemNotifierJob.xml

  • Sends notifications to a specific System Monitor.
  • Order MonitorSystem
    • JobScheduler version 1.9.x, 1.10.x
      • Configure a repeat interval for the order run time that is not less than the interval chosen for triggering the job chain sos/notification/CheckHistory.

sos / notification / CleanupNotifications

See <scheduler_install>/jobs/JobSchedulerNotificationCleanupNotificationsJob.xml

  • Removes notifications that have expired. 
  • Order Cleanup
    • Configure a start time for the order run time, e.g. 24:00.

sos / notification / ResetNotifications

See <scheduler_install>/jobs/JobSchedulerNotificationResetNotificationsJob.xml

  • Some System Monitors provide an "acknowledge" operation that signals that a problem is known.
  • If an "acknowledge" operation has been performed for a specific service in the System Monitor, the job chain ResetNotifications stops JobScheduler from sending notifications for that service for errors that have already occurred.
  • Do not configure the order run time for this job chain, as the job chain is triggered by the System Monitor's "acknowledge" operation via an add_order XML command.
Examples
Example ResetNotifications <add_order> XML command

The following example shows the XML command sent from a monitoring system to the JobScheduler to call the sos/notification/ResetNotifications job chain and set the relevant service name as acknowledged.

ResetNotifications OP5 add_order
<add_order  job_chain   ="sos/notification/ResetNotifications"
            id          ="op5 JobScheduler Monitoring Error acknowledgement"
            title       ="op5 JobScheduler Monitoring Error acknowledgement">
    <params>
        <param name="service_name"  value="JobScheduler Monitoring Error" />
        <param name="system_id"     value="op5"/>
        <param name="operation"     value="acknowledge" />
     </params>
</add_order>

Key to the above code:

Element | Attribute | Value | Description
add_order | | | XML command to add a new order to the specified job chain on the JobScheduler.
 | job_chain | sos/notification/ResetNotifications | The job chain path must correspond to the path of the ResetNotifications job chain installed on the JobScheduler.
 | id | | Order identifier.
 | title | | Order title.
param | | | The following three parameters must be set:
 | name | service_name | Parameter value: JobScheduler Monitoring Error. The service name for which all service errors that have already occurred are set as acknowledged in the JobScheduler Monitoring Interface.
 | name | system_id | Parameter value: op5. System identification. Corresponds to the SystemMonitorNotification/@system_id setting in the SystemMonitorNotification_<MonitorSystem>.xml configuration file.
 | name | operation | Parameter value: acknowledge (fixed value). Operation name to execute the acknowledgement in the JobScheduler Monitoring Interface.
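
When the command is generated by a program rather than written by hand, building it with an XML library avoids escaping mistakes in service names. The following Python sketch produces a command with the same elements and attributes as the example above. The function name build_add_order and the way the id and title are composed are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_order(system_id, service_name):
    """Build the add_order command for the ResetNotifications job chain."""
    label = "{} {} acknowledgement".format(system_id, service_name)
    order = ET.Element("add_order", {
        "job_chain": "sos/notification/ResetNotifications",
        "id": label,
        "title": label,
    })
    params = ET.SubElement(order, "params")
    for name, value in (
        ("service_name", service_name),
        ("system_id", system_id),
        ("operation", "acknowledge"),  # fixed value
    ):
        ET.SubElement(params, "param", {"name": name, "value": value})
    return ET.tostring(order, encoding="unicode")


print(build_add_order("op5", "JobScheduler Monitoring Error"))
```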
Example ResetNotifications <add_order> XML command via Perl script for op5 monitor system

This example shows the integration of a Perl script into the op5 monitor system. The script automatically sends the above XML command to the JobScheduler sos/notification/ResetNotifications job chain.

The "Acknowledgment" on the op5 Monitor side works as follows:

  1. Contact "acknowledgment" + Event Handler:
    • First of all, a contact is required that receives the notifications in the same way as the other contacts. However, an event notification for this contact is not delivered by mail but by an Event Handler, i.e. an XML command is executed instead of a mail being sent. (Please see the next point, Notification Command.)
  2. The "svc_notify_ack_handle" Notification Command:
    • This command is always executed for the services that are specified for the contact. It is executed when the service status changes (for example, on a change from OK to Critical or on acknowledgment of an error).
    • The command executes the check_acknowledge.pl script.
  3. The check_acknowledge.pl script (see the example below): this script is executed by the command and first of all checks whether the command is a response to an acknowledgment:
    • If the command is not a response to an acknowledgment, nothing happens.
    • If the command is a response to an acknowledgment, the script contacts the JobScheduler and sends an XML command that instructs the JobScheduler to start a specific job chain (the sos/notification/ResetNotifications chain).

check_acknowledge.pl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTTP::Request::Common;
use Getopt::Long;
use vars qw($opt_H $opt_f $opt_s $opt_p $opt_t $opt_h);
use vars qw(%ERRORS &support);
my $host;
my $type;
my $service;
my $port;
my $timeout = 30;
# Exit codes reported to the monitor (%ERRORS is declared above via "use vars")
%ERRORS =   (
            'OK'       => 0,
            'CRITICAL' => 2,
            'ERROR'    => 2,
            'UNKNOWN'  => 9,
            'WARNING'  => 1,
            );


sub print_help ();
sub print_usage ();

Getopt::Long::Configure('bundling');
GetOptions
   ("h"   => \$opt_h, "help"        => \$opt_h,
    "H=s" => \$opt_H, "hostname=s"  => \$opt_H,
    "f=s" => \$opt_f, 
    "s=s" => \$opt_s, "service=s"   => \$opt_s,
    "t=i" => \$opt_t, "timeout=i"   => \$opt_t,
    "p=i" => \$opt_p, "port=i"      => \$opt_p);

if($opt_h) {print_help(); exit 0;}

if($opt_H ) {
    # Accept only host names or addresses made of letters, digits, dots and hyphens
    if ( $opt_H =~ /^([-.A-Za-z0-9]+)$/ ) { $host = $1; }
    ($host) || print("Invalid host: $opt_H\n");
}
else{ print("Host name/address not specified\n");} 

if($opt_p ) {
    # Accept only numeric ports in the valid range
    if ($opt_p =~ /^([0-9]+)$/ && $1 <= 65535) { $port = $1; }
    ($port) || print("Invalid Port: $opt_p\n");
}
else{ print("Port not specified\n");}

if ($opt_t) { $timeout = $opt_t; }

if( !$host || !$port ) { print_usage(); exit 1;}

#<add_order  job_chain   ="/sos/notification/ResetNotifications"
#            id          ="op5 JobScheduler Monitoring Error acknowledgement"
#            title       ="op5 JobScheduler Monitoring Error acknowledgement">
#    <params>
#        <param name="service_name"  value="JobScheduler Monitoring Error" />
#        <param name="system_id"     value="op5"/>
#        <param name="operation"     value="acknowledge" />
#     </params>
#</add_order>
my $message = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><add_order job_chain=\"/sos/notification/ResetNotifications\" id=\"op5 ".$opt_s." Acknowledgement\" title=\"op5 ".$opt_s." Acknowledgement\"><params><param name=\"system_id\" value=\"MonitorSystem\"/><param name=\"service_name\" value=\"".$opt_s."\"/><param name=\"operation\" value=\"acknowledge\"/></params></add_order>";

if(defined($opt_f) && $opt_f=~m/ACKNOWLEDGEMENT/){
    send_request($message);
}
else{ print("Please set notification type to ACKNOWLEDGEMENT\n");}

sub send_request {
    my $message = shift;
  
    my $userAgent = LWP::UserAgent->new(agent => 'perl post');
    $userAgent->timeout($timeout);
  
    my $response = $userAgent->request(POST 'http://'.$host.':'.$port,Content_Type => 'text/xml',Content => $message);
    if ($response->is_success) {
        _report('OK', "OK: Service name: ".$opt_s."\nNotification type: ".$opt_f."\nRequest: ". $message."\n\nAnswer:\n".$response->as_string."\n"); 
    } 
    else {
        _report('ERROR',"ERROR: Service name: ".$opt_s."\nNotification type: ".$opt_f."\nRequest: ". $message."\n\nAnswer:\n".$response->error_as_HTML."\n");
    }
}

sub get_attribute_value {
    my ($attr_name, $elem_xml) = @_;
    $elem_xml =~ s/.*$attr_name\s*=\s*\"(.*?)\".*/$1/s;
    return $elem_xml;
}

sub get_state_elem {
    my $xml = shift;
    $xml =~ s/.*<spooler.*?>\s*<answer.*?>\s*(<state.*?>).*/$1/s;
    return $xml;
}       

sub print_help () {
   print $0. "\n";
   print "Copyright (c) 2015 SOS GmbH, info\@sos-berlin.com

This script tries to connect to given Job Scheduler

";
   print_usage();
   print "
-H, --hostname=HOST
   Name or IP address of host to check
-p, --port=INTEGER
   Port at host to check
-t, --timeout=INTEGER
   Timeout for HTTP connection
-f STRING
   Notification type, e.g. ACKNOWLEDGEMENT
-s, --service=STRING
   Service name, e.g. JobScheduler Errors
-h, --help
   This help
";
}

sub print_usage () {
   print  "Usage: $0 -H <host> -p <port> -f ACKNOWLEDGEMENT -s <service name> [-t <timeout>]\n";
}

sub _report { 
    print $_[1]; 
  if (defined($ERRORS{$_[0]})) { exit $ERRORS{$_[0]}; }
  else { exit 0; }
}
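
The core of the Perl script, the check for the notification type and the HTTP POST of the XML command, can be sketched in Python with the standard library. This is an illustrative outline and not a replacement for the script. The function names are hypothetical, the ISO-8859-1 encoding follows the XML declaration used in the Perl script, and the request is only prepared, not sent.

```python
import urllib.request


def is_acknowledgement(notification_type):
    """Mirror the script's check: act only on ACKNOWLEDGEMENT notifications."""
    return notification_type is not None and "ACKNOWLEDGEMENT" in notification_type


def build_request(host, port, xml_command):
    """Prepare (but do not send) the HTTP POST that carries the XML command."""
    return urllib.request.Request(
        "http://{}:{}".format(host, port),
        data=xml_command.encode("iso-8859-1"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Host name and port are placeholders for the JobScheduler host and port.
request = build_request("jobscheduler.example.com", 4444, "<add_order/>")
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen with a timeout argument would send it, which corresponds to the send_request function of the Perl script.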

JobScheduler - Job Chains customization

The default name of the monitor system used in the configuration files and stored in the JobScheduler database is "MonitorSystem".

The default configuration can be changed to allow better customization of the monitoring systems used.

Example customization for the op5 system monitor:

  • <scheduler_install>/config/notification/SystemMonitorNotification_MonitorSystem.xml
    • rename this file to  SystemMonitorNotification_op5.xml
    • set system_id attribute to op5
      • e.g. <SystemMonitorNotification system_id="op5">
  • <scheduler_install>/config/live/sos/notification/SystemNotifier,MonitorSystem.order.xml
    • rename this file to SystemNotifier,op5.order.xml
    • set the system_configuration_file parameter to SystemMonitorNotification_op5.xml
      • e.g. <param name="system_configuration_file" value="config/notification/SystemMonitorNotification_op5.xml"/>
  •  <scheduler_install>/config/live/sos/notification/ResetNotifications,AcknowledgeMonitorSystem.order.xml
    • rename this file to ResetNotifications,Acknowledgeop5.order.xml
    • set the system_id parameter to op5
      • e.g. <param name="system_id" value="op5"/>
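
The renaming steps above follow one pattern: the default name "MonitorSystem" in each file name is replaced by the system id. A small Python sketch, with paths relative to <scheduler_install> and a hypothetical function name, lists the resulting names for a given system id. It only computes the names; it neither renames files nor edits their contents.

```python
def customization_plan(system_id, default_id="MonitorSystem"):
    """Map the default file names to the names used for a specific monitor."""
    templates = [
        "config/notification/SystemMonitorNotification_{}.xml",
        "config/live/sos/notification/SystemNotifier,{}.order.xml",
        "config/live/sos/notification/ResetNotifications,Acknowledge{}.order.xml",
    ]
    return {t.format(default_id): t.format(system_id) for t in templates}


for old_name, new_name in customization_plan("op5").items():
    print(old_name, "->", new_name)
```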

JobScheduler - Cluster

See Cluster Operation

For Cluster Operation, modify the job_chain element definition in all notification job chain files:

  • add the distributed="yes" attribute.
    • e.g: <job_chain distributed="yes" ...
  • remove the orders_recoverable="no" attribute if it exists

The following job chain files must be modified in the notification directory <scheduler_install>/config/live/sos/notification/:

  • CheckHistory.job_chain.xml
  • CleanupNotifications.job_chain.xml
  • ResetNotifications.job_chain.xml
  • SystemNotifier.job_chain.xml
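
The two attribute changes can be applied to each of the four files by hand or with a short script. The following Python sketch shows the transformation on the XML text of one job chain file. The function name make_distributed is hypothetical, and the sketch assumes the files contain a job_chain element without XML namespaces.

```python
import xml.etree.ElementTree as ET


def make_distributed(job_chain_xml):
    """Add distributed="yes" and drop orders_recoverable="no" on job_chain."""
    root = ET.fromstring(job_chain_xml)
    chain = root if root.tag == "job_chain" else root.find(".//job_chain")
    if chain is None:
        raise ValueError("no job_chain element found")
    chain.set("distributed", "yes")
    if chain.get("orders_recoverable") == "no":
        del chain.attrib["orders_recoverable"]
    return ET.tostring(root, encoding="unicode")


example = '<job_chain orders_recoverable="no" title="x"><job_chain_node state="1"/></job_chain>'
print(make_distributed(example))
```

Note that ElementTree drops XML comments and the XML declaration when parsing, so keep a backup of the original files if these need to be preserved.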

Use Cases

Workflow Execution takes too long

Initial Situation

A Job Chain is triggered but does not end: it hangs in a step and takes longer than expected.

Problem

The execution time is too long.

Handling

A Timer is set for this Job Chain and the System Monitor is notified when it expires. The expiration times for the Job Chains are configured to leave enough time for processing. This is typically used for cases where a Job Chain could hang in a specific step.

Configuration

SFTP connection refused

Initial Situation

Consider a Job Chain that uses SFTP for transferring files. You have a setback configured in this step of the Job Chain, so that if the connection to the SFTP server fails, this step is retried after a specified time.

Problem

The SFTP server is not available anymore.

Handling

The System Monitor is notified with the error message on the service related to the Job Chain. However, you do not want repeated notifications for a Job Chain when an external factor, here the connection to the SFTP server, is producing the error.

Configuration

Thresholds

Initial Situation

Consider the situation where a workflow has to be executed successfully a specific number of times before a specific point in time. This means that a specific value has to be monitored in order to determine whether this quota has been reached.

Handling

A new History service is configured, so that the workflow executions (Job Chains in the JobScheduler vocabulary) send the information that they have been successfully executed to the System Monitor.

Configuration

Acknowledgment

Initial Situation

An alert for a Service has been sent to the System Monitor, which has sent a Mail to the Service Desk (Support Team) notifying them about the alert.

Handling

The problem is known to the Service Desk and they "acknowledge" the problem. The acknowledgment will cause the JobScheduler to be notified not to send any more notifications for this Service to the System Monitor until the Service has been recovered.

Configuration

 

Recoverable Errors

Initial Situation

You have a setback configured in one of the steps of the Job Chain, so that if the step execution fails, this step is retried after a specified time.

Problem

The step has ended with an error, but recovered after the setback.

Handling

If the error message has been sent to the System Monitor, then in case of error recovery JobScheduler automatically sends the recovery message on the same service with the same error message and the prefix RECOVERED.

Configuration
