Introduction

This article gives an overview of the JobScheduler Monitoring Interface. The solution provides an efficient means of monitoring JobScheduler objects such as Jobs, Job Chains and Orders, and of forwarding notifications to System Monitors such as op5, Nagios or Icinga. It is available from JobScheduler General Availability Release 1.8 onwards.

 

The most important features of this solution are:

  • JobScheduler: carries out a two-step process around the interface:
    • Detecting errors: A Job running at regular intervals (typically every 2 minutes) analyzes the History Log information recorded by the JobScheduler in the database and notes a predefined set of information about the JobScheduler objects being monitored. The information noted is typically whether tasks have been completed and whether errors or warnings have been logged. This Job then writes the information to a separate Notifications database table.
    • Sending alerts: A second Job is responsible for sending the alerts to the relevant System Monitor. This Job also runs at regular intervals and analyzes the Notifications database table. It then carries out a predefined action for each item it finds in the table. A typical action is to inform a particular monitor that a particular type of event has occurred, such as the successful completion of an Order or a Job ending in error.
  • JobScheduler: The solution architecture allows the Log History of more than one JobScheduler to be analyzed using the database specified. It may also be configured to monitor more than one database.
  • System Monitors: the JobScheduler is able to connect to more than one System Monitor at the same time.
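The two-step process described above can be pictured with a small in-memory model. This is only a sketch: the function names, the record fields and the `send` callback are hypothetical and are not part of JobScheduler, and the real Jobs work on database tables rather than arrays.

```javascript
// Toy in-memory model of the two-step process; all names are hypothetical.

// Step 1: scan history entries and record one notification per entry.
function detectNotifications(historyEntries, notifications) {
  for (const entry of historyEntries) {
    notifications.push({
      schedulerId: entry.schedulerId,
      jobChain: entry.jobChain,
      error: entry.error === true,
      sent: false,
    });
  }
  return notifications;
}

// Step 2: forward every unsent notification to each configured monitor.
function sendAlerts(notifications, monitors) {
  let sentCount = 0;
  for (const notification of notifications) {
    if (notification.sent) continue;
    for (const monitor of monitors) {
      monitor.send(notification);
      sentCount += 1;
    }
    notification.sent = true;
  }
  return sentCount;
}

// Usage: one history entry, two monitors that simply collect what they receive.
const received = [];
const monitors = [
  { send: (n) => received.push("op5:" + n.jobChain) },
  { send: (n) => received.push("nagios:" + n.jobChain) },
];
const table = detectNotifications(
  [{ schedulerId: "scheduler_4444", jobChain: "test/my_jobchain", error: true }],
  []
);
const firstRun = sendAlerts(table, monitors);
const secondRun = sendAlerts(table, monitors); // nothing is left to send
```

Keeping detection and sending separate, with a table in between, is what lets several JobSchedulers feed the same table and several System Monitors read from it.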

Monitoring Definitions

The following definitions apply for the monitoring systems:

| Definition | Description |
| --- | --- |
| System Monitor | A System Monitor is an instrument that informs a Service Desk (e.g. 1st Level Support) about incidents in IT systems. It does not analyze incidents itself; it only handles information about incidents so that this information can be forwarded and escalated. |
| Passive Checks | Passive checks are sent to the System Monitor by an external host (external from the point of view of the System Monitor). In contrast, checks that are carried out periodically by the System Monitor itself are called active checks. |
| Alerting | An Alert is a message about an event. An Alert does not provide all the information about an event, but it reports that the event exists. An Alert can be either positive or negative. |
| Notification | The notification of a specific Alert. Notifications are not sent for every Alert, only for those Alerts that are configured accordingly. Notifications are therefore a subset of the Alerts and can also be either positive or negative. |
| Acknowledgment | The confirmation of an Alert, meaning that the Alert has been seen or is known and that appropriate action is being taken. An acknowledgment is always carried out manually: a person (usually at the Service Desk or 1st Level Support) has noticed a Critical service and acknowledges it. It is never an automated step. |

 

Benefits

The benefits of the new solution are:

  1. No changes have to be made to your existing JobScheduler configuration (Jobs, Job Chains, etc.) in order to get this solution working. You add the Job Chains required for the monitoring but do not have to modify your current ones.
  2. The whole architecture lies on the JobScheduler side and the solution is therefore independent of the monitor that the Alerts are sent to. The solution works for every monitor that can receive passive checks.
  3. Processing of Jobs and Job Chains in JobScheduler is not affected or modified by the monitoring, neither from the point of view of performance nor that of stability.
  4. This solution makes very detailed information available to the System Monitors. JobScheduler logs in great detail, and this information can be sent as a Passive Check to the relevant Monitoring Service if required.
  5. Errors of a critical nature are immediately recognized in the System Monitor. The JobScheduler initially has access to all the log information and can be configured to filter this information precisely before forwarding it to the relevant System Monitor Service. This enables the Service Desk to set priorities immediately, for example when recovering from errors: it is unlikely that a performance error would be given the same priority as an error in document processing. This feature is illustrated in the following diagram:

Functionality

| Functionality | Description |
| --- | --- |
| Job Chain and Order Monitoring | This solution allows Job Chains in JobScheduler to be monitored by way of the Orders that trigger these Job Chains. |
| History Notifications | Not only critical alerts can be monitored, but also positive ones. The history of a specific service can be monitored to see whether a specific workflow has been executed and what result it gave. |
| Performance measurement (Timer) | Timers can be used to measure the performance of Job Chains. They can be used to send a warning alert to a System Monitor if a Job Chain takes more than a predefined time to complete. |
| Acknowledgment | Acknowledgments sent in response to critical alerts sent out by a System Monitor can be used to add Orders to the JobScheduler, so that the JobScheduler does not send more notifications about a service to the System Monitor. |

Monitoring sample - op5 Monitor

The following example shows JobScheduler monitoring in op5 Monitor. Three checks (called services in op5 Monitor) are defined for the JobScheduler monitoring. Different Job Chains in JobScheduler can send notifications to the same check, so it is not necessary to create one check per Job Chain, which would quickly make the monitoring confusing. Instead, results are grouped into three categories:

  • JobScheduler Monitoring Errors: Job Chains that end with an error send their notifications to this service. The last error notification is shown in the column "STATUS INFORMATION".
  • JobScheduler Monitoring Success: Job Chains that end successfully send their notifications to this service, i.e. positive notifications are also sent to the monitoring system. In this way the history of a specific Job Chain is monitored to see whether a specific workflow was executed or not. The last success notification is shown in the column "STATUS INFORMATION".
  • JobScheduler Monitoring Performance: Timers measure the performance of a Job Chain. If a Job Chain takes too long to complete, a warning alert is sent to the System Monitor. The information about the expired timer is shown in the column "STATUS INFORMATION".

op5 Monitor - Services for JobScheduler monitoring

Installation

See JobScheduler Monitoring Interface - Prerequisites and Installation

Configuration

JobScheduler - SystemMonitorNotification files

Location: <scheduler_install>/config/notification

| File | Description |
| --- | --- |
| SystemMonitorNotification_v1.0.xsd | XML Schema file that defines which values are allowed in the XML files for the JobScheduler monitoring. The XML schema does not have to be modified: to configure which JobScheduler objects are monitored and which System Monitor is used, only the SystemMonitorNotification_<MonitorSystem>.xml files have to be edited. |
| SystemMonitorNotification_<MonitorSystem>.xml | Configuration file for each System Monitor. Specifies how notifications are delivered to the System Monitor, the notifications for error or success conditions, and the notifications used to measure the performance of JobScheduler objects. |
| SystemMonitorNotificationTimers.xml | Configuration file for all System Monitors. Specifies the notifications used to measure the performance of JobScheduler objects. This file is optional and only has to contain the definitions of the SystemMonitorNotification / Timer elements. |

 

SystemMonitorNotification Elements

The configuration element descriptions are organized into the following major categories:

| Element | Occurrence | Description |
| --- | --- | --- |
| SystemMonitorNotification | Top level element | Configuration for notifications to a System Monitor |
| Notification | One or more inside a SystemMonitorNotification element | Specifies a System Monitor notification that includes a command line invocation and the JobScheduler objects |
| Timer | Optional, one or more inside a SystemMonitorNotification element | Performance measurement definition |

SystemMonitorNotification

SystemMonitorNotification supports the following attributes:

Note:

  • attribute system_id
    • in the SystemMonitorNotificationTimers.xml file the value of this attribute is not relevant and can be set to any value, e.g. timers

| Attribute | Usage | Description |
| --- | --- | --- |
| system_id | Required | System Monitor identifier. See System Monitor personalization. |

Example
<SystemMonitorNotification system_id="OP5">
...


SystemMonitorNotification / Notification

The following elements may be nested inside a Notification element:

| Element | Occurrence | Description |
| --- | --- | --- |
| NotificationMonitor | Once inside a Notification element | Specifies the System Monitor interface that is used for messages: either a Plugin Interface or a command line invocation |
| NotificationObjects | Once inside a Notification element | Specifies the JobChain and Timer definitions |

SystemMonitorNotification / Notification / NotificationMonitor

NotificationMonitor supports the following attributes:

Note:

  • attributes service_name_on_error and service_name_on_success
    • at least one of these attributes must be configured
    • both attributes can be configured together

| Attribute | Usage | Description |
| --- | --- | --- |
| service_name_on_error | Optional | Specifies the service that is configured in the System Monitor for messages about job runs with errors and for job recovery messages. The service name must match the corresponding setting in the System Monitor. |
| service_name_on_success | Optional | Specifies the service that is configured in the System Monitor for receiving informational messages about successful job runs. The service name must match the corresponding setting in the System Monitor. |
| service_status_on_error | Optional | Specifies the service status code for error messages. Default: CRITICAL |
| service_status_on_success | Optional | Specifies the service status code for success messages. Default: OK |

Example
<!-- Example 
OP5 NSCA Status: 
0 - OK 
1 - WARNING 
2 - CRITICAL 
3 - UNKNOWN --> 
... 
<!-- Sending occurred errors as CRITICAL (default) --> 
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors"> 
... 
<!-- Sending occurred errors as WARNING --> 
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors" service_status_on_error="1"> 
...
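How the four NotificationMonitor attributes interact can be sketched as follows. The helper `resolveService` and its argument shape are hypothetical and only restate the rules given above: the service name is taken from the attribute that matches the kind of message, and the status falls back to CRITICAL or OK when no status attribute is set.

```javascript
// Numeric NSCA status codes as listed in the example comment above.
const NSCA_STATUS = { OK: 0, WARNING: 1, CRITICAL: 2, UNKNOWN: 3 };

// Hypothetical helper: picks the service name and status for one message.
// `attrs` mirrors the NotificationMonitor attributes; `isError` says whether
// the message reports an error or a success.
function resolveService(attrs, isError) {
  const name = isError ? attrs.service_name_on_error : attrs.service_name_on_success;
  if (name === undefined) {
    return null; // no service is configured for this kind of message
  }
  const configured = isError ? attrs.service_status_on_error : attrs.service_status_on_success;
  const fallback = isError ? "CRITICAL" : "OK"; // documented defaults
  return { name: name, status: configured !== undefined ? configured : fallback };
}

// Usage, mirroring the two XML examples above.
const withDefault = resolveService(
  { service_name_on_error: "JobScheduler Monitoring Errors" }, true);
const asWarning = resolveService(
  { service_name_on_error: "JobScheduler Monitoring Errors",
    service_status_on_error: String(NSCA_STATUS.WARNING) }, true);
const noSuccessService = resolveService(
  { service_name_on_error: "JobScheduler Monitoring Errors" }, false);
```

Returning `null` when no matching service name is configured is an assumption made for this sketch; the source only states that at least one of the two name attributes must be present.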

One of the following elements must be nested inside a NotificationMonitor element:

| Element | Occurrence | Description |
| --- | --- | --- |
| NotificationInterface | Optional, once inside a NotificationMonitor element | Plugin Interface to be executed for System Monitor notification |
| NotificationCommand | Optional, once inside a NotificationMonitor element | Command line to be executed for System Monitor notification |

 

SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface

NotificationInterface supports the following attributes:

| Attribute | Usage | Description |
| --- | --- | --- |
| monitor_host | Required | Specifies the hostname or IP address of the System Monitor host. |
| monitor_port | Required | Specifies the TCP port that the System Monitor listens on. |
| monitor_password | Optional | Specifies the password configured in the nsca.cfg file used by NSCA. |
| monitor_connection_timeout | Optional | Specifies the connection timeout in ms. Default: 5000 |
| monitor_response_timeout | Optional | Specifies the NSCA response timeout in ms. |
| monitor_encryption | Optional | Specifies whether the communication with the System Monitor is encrypted. By default no encryption is used. Possible values: NONE (no encryption), XOR (XOR encryption), TRIPLE_DES (Triple DES encryption). |
| service_host | Required | Specifies the name of the host that executes the passive check. The name must match the corresponding setting in the System Monitor. |
| plugin | Optional | Default: com.sos.scheduler.notification.plugins.notifier.SystemNotifierSendNscaPlugin |

Example
...
<NotificationInterface monitor_host="monitor_host" monitor_port="5667" monitor_encryption="XOR" service_host="service_host"><![CDATA[
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step =%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT%
]]></NotificationInterface>
...
SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand

NotificationCommand supports the following attributes:

| Attribute | Usage | Description |
| --- | --- | --- |
| plugin | Optional | Default: com.sos.scheduler.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin |

Example
...
<NotificationCommand><![CDATA[
echo scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step =%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT% > D://errors.txt
]]></NotificationCommand>
...
SystemMonitorNotification / Notification / NotificationObjects

One of the following elements must be nested inside a NotificationObjects element:

| Element | Occurrence | Description |
| --- | --- | --- |
| JobChain | Optional, one or more inside a NotificationObjects element | Restricts notifications to the given job chains |
| Timer | Optional, one or more inside a NotificationObjects element | Restricts notifications to the given performance checks (Timer) |

Example
<SystemMonitorNotification system_id="OP5"> 
  <Notification> 
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors"> 
      ... 
    </NotificationMonitor> 
    <NotificationObjects> 
      <!-- Send job chain errors that occur in the "test/my_jobchain" job chain to the "JobScheduler Monitoring Errors" service. -->
      <JobChain name="test/my_jobchain" />
    </NotificationObjects>
  </Notification>
</SystemMonitorNotification>  

 

SystemMonitorNotification / Notification / NotificationObjects / JobChain

JobChain supports the following attributes:

| Attribute | Usage | Description |
| --- | --- | --- |
| notifications | Optional, Integer | Specifies the number of notifications that are sent to a System Monitor. Default: 1 |
| scheduler_id | Optional | Restricts notifications to the JobScheduler instance with the given identifier. By default, notifications are sent for all JobScheduler instances that log to the same database. Regular expressions can be used. |
| name | Optional | Job chain name including any folder names. Regular expressions can be used. |
| step_from | Optional | Restricts notifications for job chains to the sequence of job nodes specified with the step_from and step_to attributes. |
| step_to | Optional | Restricts notifications for job chains to the sequence of job nodes specified with the step_from and step_to attributes. |
| excluded_steps | Optional | Specifies the steps that are excluded from the analysis (separated by semicolons). |

Example
...
<JobChain notifications="2" name="test/my_jobchain"/>
...
<JobChain scheduler_id="scheduler_4444" />
...
<JobChain scheduler_id="scheduler_4444" name="^(test/my)" />
...
<JobChain name="test/my_jobchain" step_from="200"/>
...
<JobChain name="test/my_jobchain" step_to="500"/>
...
<JobChain name="test/my_jobchain" step_from="300" step_to="300"/>
...
<JobChain name="test/my_jobchain" excluded_steps="200;300"/>
...
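The effect of the scheduler_id, name and excluded_steps attributes can be illustrated with a small matcher. This is a sketch under assumptions: the function `matchesJobChain` and the shape of the `event` object are hypothetical, and patterns are applied as partial ("find") matches, which is inferred from the example pattern `^(test/my)` rather than stated in the text. The step_from and step_to attributes are left out because they depend on the node order of the job chain.

```javascript
// Hypothetical matcher for one <JobChain> rule. An omitted attribute matches
// everything. Treating patterns as partial matches is an assumption.
function matchesJobChain(rule, event) {
  const idOk = rule.scheduler_id === undefined ||
    new RegExp(rule.scheduler_id).test(event.schedulerId);
  const nameOk = rule.name === undefined ||
    new RegExp(rule.name).test(event.jobChain);
  // excluded_steps is a semicolon-separated list of step names.
  const excluded = rule.excluded_steps === undefined
    ? []
    : rule.excluded_steps.split(";").map((s) => s.trim());
  const stepOk = !excluded.includes(String(event.step));
  return idOk && nameOk && stepOk;
}

// Usage with the values from the examples above.
const event = { schedulerId: "scheduler_4444", jobChain: "test/my_jobchain", step: "300" };
const byPrefix = matchesJobChain({ scheduler_id: "scheduler_4444", name: "^(test/my)" }, event);
const excludedStep = matchesJobChain({ name: "test/my_jobchain", excluded_steps: "200;300" }, event);
const noRestriction = matchesJobChain({}, event);
```

Under the partial-match assumption, a plain name such as `test/my_jobchain` would also match `test/my_jobchain_2`; anchoring the pattern with `^` and `$` avoids that.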

 

SystemMonitorNotification / Notification / NotificationObjects / Timer

Timer supports the following attributes:

| Attribute | Usage | Description |
| --- | --- | --- |
| notifications | Optional, Integer | Specifies the number of notifications that are sent to a System Monitor. Default: 1 |
| name | Optional | Refers to the Timer name defined in the SystemMonitorNotification / Timer element. |
| notify_on_error | Optional, Boolean | Sends the timer check notification even when the configured job chain has error notifications. Default: false |


Example
<SystemMonitorNotification system_id="OP5">
  <Notification>
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors">
      ...
    </NotificationMonitor>
    <NotificationObjects>
      <!--
      Send job chain errors that occur in the "test/my_jobchain" job chain to the "JobScheduler Monitoring Errors" service.
      -->
      <JobChain name="test/my_jobchain" />
    </NotificationObjects>
  </Notification>

  <Notification>
    <NotificationMonitor service_name_on_error="JobScheduler Monitoring Performance">
      ...
    </NotificationMonitor>
    <NotificationObjects>
      <!--
      Send performance check errors that occur in the "test/my_jobchain" job chain to the "JobScheduler Monitoring Performance" service.
      With the default notify_on_error="false", the performance check error is not sent
      when "test/my_jobchain" itself ended with a job chain error.
      -->
      <Timer name="my_timer" />
    </NotificationObjects>
  </Notification>

  <Timer name="my_timer">
    <JobChain name="test/my_jobchain" />
  </Timer>
</SystemMonitorNotification>

 

SystemMonitorNotification / Timer 

The following elements may be nested inside a Timer element:

| Element | Occurrence | Description |
| --- | --- | --- |
| JobChain | One or more inside a Timer element | Restricts notifications to the given job chains |
| Minimum | Optional, once inside a Timer element | Minimum required time consumption for job or job chain execution. Allows script code to be executed that returns the minimum required execution time in seconds. |
| Maximum | Optional, once inside a Timer element | Maximum allowed time consumption for job or job chain execution. Allows script code to be executed that returns the maximum allowed execution time in seconds. |

Example
<SystemMonitorNotification system_id="OP5"> 
  ... 
  <Timer name="my_timer_1"> 
    <JobChain name="test/my_jobchain_1" /> 
    <Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum> 
  </Timer> 
 
  <Timer name="my_timer_2"> 
    <JobChain name="test/my_jobchain_2" /> 
    <JobChain name="test/my_jobchain_3" /> 
    <Minimum><Script language="javascript"><![CDATA[500]]></Script></Minimum> 
    <Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum> 
  </Timer> 
</SystemMonitorNotification> 
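The meaning of the Minimum and Maximum limits can be sketched as a simple comparison against the elapsed time. The function `checkTimer` is hypothetical and only illustrates the idea; in particular, treating an elapsed time that is exactly equal to a limit as acceptable is an assumption.

```javascript
// Hypothetical check: compares an elapsed time in seconds with the optional
// Minimum and Maximum limits of a Timer. Returns a message when a limit is
// violated and null otherwise.
function checkTimer(elapsedSeconds, limits) {
  if (limits.minimum !== undefined && elapsedSeconds < limits.minimum) {
    return "elapsed " + elapsedSeconds + "s is below the minimum of " + limits.minimum + "s";
  }
  if (limits.maximum !== undefined && elapsedSeconds > limits.maximum) {
    return "elapsed " + elapsedSeconds + "s exceeds the maximum of " + limits.maximum + "s";
  }
  return null;
}

// Usage with the limits of my_timer_2 (Minimum 500, Maximum 1000).
const withinLimits = checkTimer(750, { minimum: 500, maximum: 1000 });
const tooSlow = checkTimer(1200, { minimum: 500, maximum: 1000 });
const tooFast = checkTimer(100, { minimum: 500, maximum: 1000 });
```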

Timer supports the following attributes:

| Attribute | Usage | Description |
| --- | --- | --- |
| name | Required | Refers to the Timer name used in the SystemMonitorNotification / Notification / NotificationObjects / Timer element. The name must be unique across all Timer definitions. |

Example
...
<Timer name="my_timer">
... 

 

SystemMonitorNotification / Timer / JobChain

JobChain supports the following attributes:

| Attribute | Usage | Description |
| --- | --- | --- |
| scheduler_id | Optional | Restricts notifications to the JobScheduler instance with the given identifier. By default, notifications are sent for all JobScheduler instances that log to the same database. Regular expressions can be used. |
| name | Optional | Job chain name including any folder names. Regular expressions can be used. |
| step_from | Optional | Restricts checks for job chains to the sequence of job nodes specified with the step_from and step_to attributes. |
| step_to | Optional | Restricts checks for job chains to the sequence of job nodes specified with the step_from and step_to attributes. |

Example
...
<JobChain scheduler_id="scheduler_4444" /> 
... 
<JobChain scheduler_id="scheduler_4444" name="^(test/my)" /> 
... 
<JobChain name="test/my_jobchain" step_from="200"/> 
... 
<JobChain name="test/my_jobchain" step_to="500"/> 
... 
<JobChain name="test/my_jobchain" step_from="300" step_to="300"/>
...

 

SystemMonitorNotification / Timer / Minimum

The following elements must be nested inside a Minimum element:

| Element | Occurrence | Description |
| --- | --- | --- |
| Script | Once inside a Minimum element | Script code in one of the supported languages |

Example
...
<Timer name="my_timer">
  ...
  <Minimum><Script language="javascript"><![CDATA[1000]]></Script></Minimum>
</Timer>
... 

 

SystemMonitorNotification / Timer / Maximum

The following elements must be nested inside a Maximum element:

| Element | Occurrence | Description |
| --- | --- | --- |
| Script | Once inside a Maximum element | Script code in one of the supported languages |

Example
...
<Timer name="my_timer">
  ...
  <Maximum><Script language="javascript"><![CDATA[1000]]></Script></Maximum>
</Timer>
... 

 

SystemMonitorNotification / Timer / Minimum|Maximum / Script

Script supports the following attributes:

| Attribute | Usage | Description |
| --- | --- | --- |
| language | Required | Script language name. Supported languages: javascript, ECMAScript |

The Script element can contain:

  • a fixed value
  • a calculation based on the job/order parameters
Fixed value

A fixed value is the duration in seconds for the specific Minimum or Maximum definition.

Example (fixed value)
...
  <Script language="javascript"><![CDATA[1000]]></Script>
...
Calculation

The result of the calculation is the time in seconds for the specific Minimum or Maximum definition.

This example calculates the execution time depending on the %file_size% parameter, which was set by a specific job (see the job example below).

Example (calculation)
...
  <Script language="javascript"><![CDATA[                     
    function my_calculate(){                         
      var fileSize              = new java.lang.Double(%file_size%);                         
      var timerExpiryFactor     = 0.0025;                         
      var timerExpiryTolerance  = timerExpiryFactor*0.1;                         
      var timerExpiry           = new java.lang.Double(timerExpiryFactor+timerExpiryTolerance);                         
      timerExpiry               = timerExpiry*fileSize;                     
      return timerExpiry;                     
    }                         
    my_calculate();
  ]]></Script>
...
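The arithmetic of the calculation above can be checked with a plain JavaScript version that runs outside JobScheduler. In this sketch the %file_size% placeholder is replaced by a function argument, the `java.lang.Double` wrappers are dropped, and the function name `timerExpirySeconds` is hypothetical. The unit of the file size is assumed to be KB, because the sample job in this section divides the file length by 1024.

```javascript
// Plain JavaScript version of the calculation above, with the %file_size%
// placeholder replaced by a function argument (file size assumed to be in KB).
function timerExpirySeconds(fileSizeKb) {
  const timerExpiryFactor = 0.0025;                      // base time per KB
  const timerExpiryTolerance = timerExpiryFactor * 0.1;  // 10 % tolerance on top
  return (timerExpiryFactor + timerExpiryTolerance) * fileSizeKb;
}

// Usage: a file of 2000 KB gives (0.0025 + 0.00025) * 2000, i.e. about 5.5 seconds.
const expiry = timerExpirySeconds(2000);
```

The limit therefore scales linearly with the file size, so large files are given proportionally more time before the timer expires.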

 

This example job calculates and creates a new order parameter file_size.

To store the parameters in the database (table SCHEDULER_MON_RESULTS):

  • set the scheduler_notification_result_parameters parameter (see the job documentation jobs/JobSchedulerNotificationStoreResultsJob.xml)
  • set com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass as the monitor
Example (job)
<?xml version="1.0" encoding="ISO-8859-1"?>
<job title="Sample Job with Store Result Monitor" order="yes" stop_on_error="no" tasks="1">
  <params>
    <!--
    set the scheduler_notification_result_parameters parameter
    -->
    <param name="scheduler_notification_result_parameters" value="file_size"/>
  </params>

  <!--
  calculate and create the new order parameter if necessary
  -->
  <script language="javascript"><![CDATA[
    function spooler_process(){
      var order    = spooler_task.order;
      var params   = spooler.create_variable_set();
      params.merge(spooler_task.params);
      params.merge(order.params);

      // parameter scheduler_file_path was set in the previous job chain step
      var file     = new java.io.File(params.value("scheduler_file_path"));
      var fileSize = file.length()/1024;
      order.params.set_var("file_size", fileSize.toString());
      return true;
    }]]>
  </script>

  <!--
  set the com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass as monitor
  -->
  <monitor name="notification_monitor" ordering="1">
    <script java_class="com.sos.scheduler.notification.jobs.result.StoreResultsJobJSAdapterClass" language="java"/>
  </monitor>

  <run_time />
</job>

Message

Usage

The Message can be configured as a CDATA section in the following parent elements:

  • SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand
  • SystemMonitorNotification / Notification / NotificationMonitor / NotificationInterface

The Message can contain:

  • fixed values
  • variables

Example: <![CDATA[ scheduler id = %MON_N_SCHEDULER_ID%  ]]>

Variables

All variables must be written using the %<variable name>% syntax.

Variable values are substituted in the following order:

  1. Table variables.
  2. Service variables.
  3. OS environment variables. 
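The substitution order listed above can be sketched as three passes over the message. The function `substitute` is hypothetical and is not the plugin's actual implementation; leaving a placeholder untouched when no value is found for it is an assumption.

```javascript
// Hypothetical substitution routine: replaces %NAME% placeholders in three
// passes, in the documented order (table, service, OS environment).
// Placeholders without a value in a pass are left for the next pass.
function substitute(message, tableVars, serviceVars, envVars) {
  let result = message;
  for (const vars of [tableVars, serviceVars, envVars]) {
    result = result.replace(/%([A-Za-z0-9_]+)%/g, (placeholder, name) =>
      Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : placeholder
    );
  }
  return result;
}

// Usage: SERVICE_NAME is defined both as a service variable and in the
// environment; the service variable is applied first and therefore wins.
const text = substitute(
  "scheduler id = %MON_N_SCHEDULER_ID%, service = %SERVICE_NAME%, tmp = %TEMP%, other = %UNDEFINED_VAR%",
  { MON_N_SCHEDULER_ID: "scheduler_4444" },
  { SERVICE_NAME: "JobScheduler Monitoring Errors" },
  { TEMP: "/tmp", SERVICE_NAME: "ignored" }
);
```

When run under Node.js, `process.env` could be passed as the last argument to resolve real OS environment variables.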
Table variables 

Table of the history of steps of processed orders:

| Name | Description |
| --- | --- |
| %MON_N_ID% | Unique notification id |
| %MON_N_SCHEDULER_ID% | Id of the JobScheduler |
| %MON_N_TASK_ID% | Id of the JobScheduler task |
| %MON_N_STEP% | Consecutive number of the order step |
| %MON_N_ORDER_HISTORY_ID% | Id of the JobScheduler order |
| %MON_N_JOB_CHAIN_NAME% | Name of the job chain of the order |
| %MON_N_JOB_CHAIN_TITLE% | Title of the job chain of the order |
| %MON_N_ORDER_ID% | Unique (within the job chain) id of the order |
| %MON_N_ORDER_TITLE% | Title of the order |
| %MON_N_ORDER_START_TIME% | Timestamp of the start of the order |
| %MON_N_ORDER_END_TIME% | Timestamp of the end of the order |
| %MON_N_ORDER_TIME_ELAPSED% | Time in seconds between the start and the end of the order |
| %MON_N_ORDER_STEP_STATE% | State of the order inside the job chain |
| %MON_N_ORDER_STEP_START_TIME% | Timestamp of the start of the order step |
| %MON_N_ORDER_STEP_END_TIME% | Timestamp of the end of the order step |
| %MON_N_ORDER_STEP_TIME_ELAPSED% | Time in seconds between the start and the end of the order step |
| %MON_N_JOB_NAME% | Name of the job |
| %MON_N_JOB_TITLE% | Title of the job |
| %MON_N_TASK_START_TIME% | Timestamp of the job task start |
| %MON_N_TASK_END_TIME% | Timestamp of the job task end |
| %MON_N_TASK_TIME_ELAPSED% | Time in seconds between the start and the end of the job task |
| %MON_N_RECOVERED% | 0 = ok or error not recovered (depending on %MON_N_ERROR%), 1 = error was recovered |
| %MON_N_ERROR% | 0 = ok, 1 = error |
| %MON_N_ERROR_CODE% | Exception code of the job error |
| %MON_N_ERROR_TEXT% | Exception message of the job (that processed the order) |
| %MON_N_CREATED% | Timestamp of the initial notification record |
| %MON_N_MODIFIED% | Timestamp of the latest change to this notification record |

 

Example
 scheduler id = %MON_N_SCHEDULER_ID%, history id = %MON_N_ORDER_HISTORY_ID%, job_chain = %MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), error = %MON_N_ERROR_TEXT%

Table of the history of notifications sent to the System Monitor:

| Name | Description |
| --- | --- |
| %MON_SN_ID% | Unique system notification id |
| %MON_SN_NOTIFICATION_ID% | Reference to table SCHEDULER_MON_NOTIFICATIONS.ID |
| %MON_SN_CHECK_ID% | Reference to table SCHEDULER_MON_CHECKS.ID |
| %MON_SN_SYSTEM_ID% | Reference to the attribute SystemMonitorNotification / @system_id defined in the XML configuration file |
| %MON_SN_SERVICE_NAME% | Reference to one of the attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success defined in the XML configuration file |
| %MON_SN_STEP_FROM% | Reference to the attribute SystemMonitorNotification / Notification / NotificationObjects / JobChain / @step_from defined in the XML configuration file |
| %MON_SN_STEP_TO% | Reference to the attribute SystemMonitorNotification / Notification / NotificationObjects / JobChain / @step_to defined in the XML configuration file |
| %MON_SN_STEP_FROM_START_TIME% | Timestamp of the start of the order step |
| %MON_SN_STEP_TO_END_TIME% | Timestamp of the end of the order step |
| %MON_SN_STEP_TIME_ELAPSED% | Time in seconds between the start and the end of the order step |
| %MON_SN_NOTIFICATIONS% | Number of notifications already sent to a System Monitor |
| %MON_SN_MAX_NOTIFICATIONS% | Reference to the attribute SystemMonitorNotification / Notification / NotificationObjects / JobChain / @notifications defined in the XML configuration file |
| %MON_SN_ACKNOWLEDGED% | 0 = not acknowledged, 1 = acknowledged |
| %MON_SN_RECOVERED% | 0 = recovery not sent, 1 = recovery sent |
| %MON_SN_SUCCESS% | 0 = success not sent, 1 = success sent |
| %MON_SN_CREATED% | Timestamp of the initial system notification record |
| %MON_SN_MODIFIED% | Timestamp of the latest change to this system notification record |

Example
 step from = %MON_SN_STEP_FROM%, step to = %MON_SN_STEP_TO%, notification = %MON_SN_NOTIFICATIONS% (of %MON_SN_MAX_NOTIFICATIONS%)

Table of the history of executed checks (Timer):

| Name | Description |
| --- | --- |
| %MON_C_ID% | Unique check id |
| %MON_C_NOTIFICATION_ID% | Reference to table SCHEDULER_MON_NOTIFICATIONS.ID |
| %MON_C_NAME% | Reference to the attribute SystemMonitorNotification / Timer / @name defined in the XML configuration file |
| %MON_C_STEP_FROM% | Reference to the attribute SystemMonitorNotification / Timer / JobChain / @step_from defined in the XML configuration file |
| %MON_C_STEP_TO% | Reference to the attribute SystemMonitorNotification / Timer / JobChain / @step_to defined in the XML configuration file |
| %MON_C_STEP_FROM_START_TIME% | Timestamp of the start of the order step |
| %MON_C_STEP_TO_END_TIME% | Timestamp of the end of the order step |
| %MON_C_STEP_TIME_ELAPSED% | Time in seconds between the start and the end of the order step |
| %MON_C_CHECK_TEXT% | Message of the check |
| %MON_C_CREATED% | Timestamp of the initial check record |
| %MON_C_MODIFIED% | Timestamp of the latest change to this check record |

Example
 timer name = %MON_C_NAME%, text = %MON_C_CHECK_TEXT%
Service variables
| Name | Description |
| --- | --- |
| %SERVICE_NAME% | Current service name. One of the attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success. |
| %SERVICE_STATUS% | Current service status. One of the attributes SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_error or SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_success, or the default (CRITICAL for errors, OK for success). |
| %SERVICE_MESSAGE_PREFIX% | Message prefix: ERROR (error), RECOVERED (error recovery), TIMER (performance check). |

Example
 service name = %SERVICE_NAME%

 

OS environment variables 

 

Any existing OS environment variable can be referenced in the message using the %<variable name>% syntax (on both Windows and Unix).

Example
 %TEMP%/test.exe

 

Examples 
Message on error
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step=%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT%            
Message on success
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), steps(%MON_SN_STEP_FROM% to %MON_SN_STEP_TO%), order time elapsed = %MON_N_ORDER_TIME_ELAPSED%s            
Message on timer
name = %MON_C_NAME%, scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), steps(%MON_C_STEP_FROM% to %MON_C_STEP_TO%), check = %MON_C_CHECK_TEXT%            

Notification environment variables

The default plugin com.sos.scheduler.notification.plugins.notifier.SystemNotifierProcessBuilderPlugin, which is used by the SystemMonitorNotification / Notification / NotificationMonitor / NotificationCommand element, sets the following variables as environment variables:

  1. Table variables
  2. Service variables

These variables are useful when the NotificationCommand does not call the notification client directly but instead calls a shell script that implements the logic for sending the notification messages.

Table variables

All table variables (see Table variables explanation) are set as environment variables with the prefix:

  • SCHEDULER_MON_TABLE_

e.g.:

  • SCHEDULER_MON_TABLE_MON_N_ID
  • SCHEDULER_MON_TABLE_MON_N_SCHEDULER_ID
  • ...
Service variables
Name    Description

SCHEDULER_MON_SERVICE_NAME

Current service name: the value of one of the following two element attributes:

  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_error
  • SystemMonitorNotification / Notification / NotificationMonitor / @service_name_on_success

SCHEDULER_MON_SERVICE_STATUS

Current service status: the value of one of the following two element attributes or, if not set, the default:

  • SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_error
  • SystemMonitorNotification / Notification / NotificationMonitor / @service_status_on_success
  • default on error: CRITICAL
  • default on success: OK

SCHEDULER_MON_SERVICE_MESSAGE_PREFIX

Message prefix:

  • ERROR       error
  • RECOVERED   error recovery
  • TIMER       performance check

SCHEDULER_MON_SERVICE_COMMAND

Content of the SystemMonitorNotification / Notification / NotificationCommand element after variable substitution.

  

Sample NotificationCommand Unix. Script file (/tmp/command.sh).
1) configured command in the SystemMonitorNotification_<MonitorSystem>.xml file
<NotificationCommand><![CDATA[/tmp/command.sh]]></NotificationCommand>
 
2) content of the /tmp/command.sh file
#!/bin/sh
# Note: the redirection "> /tmp/command_output.txt" is used to simulate starting the notification client
#
echo "${SCHEDULER_MON_SERVICE_NAME}:${SCHEDULER_MON_SERVICE_STATUS}:${SCHEDULER_MON_SERVICE_MESSAGE_PREFIX} history id = ${SCHEDULER_MON_TABLE_MON_N_ORDER_HISTORY_ID}" > /tmp/command_output.txt
 
Sample NotificationCommand Windows. Script file (C:/Temp/command.cmd).
1) configured command in the SystemMonitorNotification_<MonitorSystem>.xml file
<NotificationCommand><![CDATA[C:/Temp/command.cmd]]></NotificationCommand>
 
2) content of the C:/Temp/command.cmd file
rem Note: "> C:/Temp/command_output.txt" used to simulate the starting of the notification client
rem
echo %SCHEDULER_MON_SERVICE_NAME%:%SCHEDULER_MON_SERVICE_STATUS%:%SCHEDULER_MON_SERVICE_MESSAGE_PREFIX% history id = %SCHEDULER_MON_TABLE_MON_N_ORDER_HISTORY_ID% > C:/Temp/command_output.txt
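
A wrapper script is also a natural place for the "logical implementation" mentioned above, for example choosing a different action depending on the message prefix. The following is a sketch only: choose_action and the action names raise, clear, warn and info are hypothetical placeholders, and it assumes that the prefix is empty when a notification reports success.

```shell
#!/bin/sh
# Hypothetical wrapper logic: "choose_action" and the action names are
# illustrative placeholders, not part of JobScheduler.
choose_action() {
    # $1: value of SCHEDULER_MON_SERVICE_MESSAGE_PREFIX (may be empty)
    case "$1" in
        ERROR*)     echo "raise" ;;  # a new error was detected
        RECOVERED*) echo "clear" ;;  # a previous error has been recovered
        TIMER*)     echo "warn"  ;;  # a performance check (timer) fired
        *)          echo "info"  ;;  # assumption: no prefix means success
    esac
}

action=$(choose_action "${SCHEDULER_MON_SERVICE_MESSAGE_PREFIX:-}")
# Fall back to "unset" so that a missing variable is visible in the output.
echo "service=${SCHEDULER_MON_SERVICE_NAME:-unset} status=${SCHEDULER_MON_SERVICE_STATUS:-unset} action=$action"
```

The patterns end in * so that a prefix followed by additional whitespace still matches.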
 

Examples

Examples OP5
NotificationInterface 

Here is an excerpt of an XML file used to notify a specific System Monitor (OP5 Monitor) using NotificationInterface:

SystemMonitorNotification_OP5.xml
 ...
<!--
monitor_host            The hostname or IP address of the System Monitor host
monitor_port            The TCP port on which the System Monitor listens
monitor_encryption      Encryption algorithm
service_host            The host that executes the passive check. The name must match the corresponding setting in the System Monitor
%MON_N_SCHEDULER_ID%    See explanation "Table variables"
...
-->
<NotificationInterface monitor_host="monitor_host" monitor_port="5667" monitor_encryption="XOR" service_host="service_host"><![CDATA[
scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step=%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT%
]]></NotificationInterface>
...
NotificationCommand

Here is an excerpt of an XML file used to notify a specific System Monitor (OP5 Monitor) using NotificationCommand on Windows:

SystemMonitorNotification_OP5.xml
... 
<!--
service_host               The host that executes the passive check. The name must match the corresponding setting in the System Monitor.
monitor_host               The hostname or IP address of the System Monitor host.
%SERVICE_NAME%             See explanation "Service variables"
%SERVICE_STATUS%           See explanation "Service variables"
%SERVICE_MESSAGE_PREFIX%   See explanation "Service variables"
%MON_N_SCHEDULER_ID%       See explanation "Table variables"
...
NotificationCommand after substitution (error case):
<![CDATA[echo service_host:JobScheduler Monitoring Errors:2:ERROR scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error=error occurred | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
 
NotificationCommand after substitution (recovery case): 
<![CDATA[echo service_host:JobScheduler Monitoring Errors:0:RECOVERED scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error=error occurred | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]> 
 
NotificationCommand after substitution (success case):  
<![CDATA[echo service_host:JobScheduler Monitoring Success:0:scheduler id=scheduler_4444, history id=123, job_chain=test/my_jobchain(order_id), step=100, error= | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>  
 
-->
<NotificationMonitor service_name_on_error="JobScheduler Monitoring Errors" service_name_on_success="JobScheduler Monitoring Success">
  <NotificationCommand><![CDATA[echo service_host:%SERVICE_NAME%:%SERVICE_STATUS%:%SERVICE_MESSAGE_PREFIX%scheduler id=%MON_N_SCHEDULER_ID%, history id=%MON_N_ORDER_HISTORY_ID%, job_chain=%MON_N_JOB_CHAIN_NAME%(%MON_N_ORDER_ID%), step=%MON_N_ORDER_STEP_STATE%, error=%MON_N_ERROR_TEXT% | D:\nsca\send_nsca.exe -H monitor_host -c D:\nsca\send_nsca.cfg -d : ]]>
  </NotificationCommand>  
</NotificationMonitor>

...
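
The substituted commands in the comment above pipe a single line of the form host:service:status:message to send_nsca, using ":" as the delimiter. The following sketch shows how such a line could be assembled in a wrapper script. status_code and nsca_line are hypothetical helper names, the mapping uses the usual Nagios-style return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the configuration path in the closing comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of JobScheduler or NSCA.

# Map a status name to the numeric return code of a Nagios-style passive check.
status_code() {
    case "$1" in
        OK|0)       echo 0 ;;
        WARNING|1)  echo 1 ;;
        CRITICAL|2) echo 2 ;;
        *)          echo 3 ;;   # anything else is reported as UNKNOWN
    esac
}

# Build one passive check result: host, service, code and message joined by ":".
nsca_line() {
    # usage: nsca_line HOST SERVICE STATUS MESSAGE
    printf '%s:%s:%s:%s\n' "$1" "$2" "$(status_code "$3")" "$4"
}

nsca_line service_host "JobScheduler Monitoring Errors" CRITICAL \
    "ERROR scheduler id=scheduler_4444, history id=123"

# To send the result, the line could be piped to the client, for example:
#   nsca_line ... | send_nsca -H monitor_host -c /path/to/send_nsca.cfg -d :
```

Whether the status arrives as a name or as a number depends on the service_status_on_error and service_status_on_success settings, so the sketch accepts both forms.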
Examples Zabbix
NotificationCommand

Here is an excerpt of an XML file used to notify a specific System Monitor (Zabbix Monitor) using NotificationCommand:

SystemMonitorNotification_zabbix.xml
... 
<!--
zabbix_sender            Zabbix sender installed on the JobScheduler host
localhost                Hostname of the Zabbix server
zabbix_server            Host name of the JobScheduler Agent as registered in Zabbix
samples.job1             Zabbix item key (the job name with "/" replaced by ".")
%MON_N_ERROR_TEXT%       See explanation "Table variables"
-->
<NotificationCommand>
<![CDATA[zabbix_sender -z localhost -s zabbix_server -k samples.job1 -o "%MON_N_ERROR_TEXT%"]]>
</NotificationCommand>
...
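
The item key in the excerpt above is derived from the job name by replacing "/" with ".". In a wrapper script this could be sketched as follows. zabbix_item_key is a hypothetical helper name, and stripping a leading "/" is an assumption about how job paths might be passed in.

```shell
#!/bin/sh
# Hypothetical helper: derive a Zabbix item key from a job path.
zabbix_item_key() {
    # Assumption: a leading "/" is dropped so that the key does not start with ".".
    printf '%s\n' "${1#/}" | tr '/' '.'
}

key=$(zabbix_item_key "samples/job1")
echo "$key"

# The key could then be passed to the sender, for example:
#   zabbix_sender -z localhost -s zabbix_server -k "$key" -o "error text"
```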

  

JobScheduler - Job Chains

See https://kb.sos-berlin.com/display/PKB/JobScheduler+Monitoring+Interface+-+Prerequisites+and+Installation#JobSchedulerMonitoringInterface-PrerequisitesandInstallation-JobChainConfiguration

 

WORK IN PROGRESS

Use Cases

Recoverable Errors

Initial Situation: A Job Chain is triggered by directory monitoring, i.e. the Job Chain starts when a particular file arrives in a monitored folder.

Problem: The Job Chain ended with an error.

Handling: The System Monitor service related to the Job Chain is notified with an error message. A later run of the Job Chain that is triggered by a different file and ends without errors does not mean that the error has been recovered, since a different file was processed. The error message therefore remains in the System Monitor until the same file is placed in the monitored directory again and the Job Chain ends without errors.

Configuration:

  • XML CheckConfigurationHistory.xml: Specify the ID of the JobScheduler and the name of the Job Chain you want to monitor.
  • XML SystemMonitorNotification.xml: Specify the name of the Service (as configured in the System Monitor) in the service_name_on_error attribute, since you want to be notified when the Job Chain ends with an error.
  • System Monitor: Services in the System Monitor have to be configured with the same names as in the SystemMonitorNotification.xml file above.

Workflow Execution takes too long

Initial Situation: A Job Chain was triggered but did not end: it hung in a step and therefore took longer than expected.

Problem: The execution time was too long.

Handling: A timer is set for this Job Chain and the System Monitor is notified when the timer expires. Expiration times are configured to leave enough time for normal processing, so this approach is typically used to detect cases where the Job Chain hangs in a specific step.

Configuration:

  • XML CheckConfigurationHistory.xml: As in the example above, specify the ID of the JobScheduler and the name of the Job Chain you want to monitor. In addition, specify the timer for this Job Chain and the function used to calculate the expiration time of the timer.
  • XML SystemMonitorNotification.xml: As in the example above, specify the name of the Service (as configured in the System Monitor) in the service_name_on_error attribute, since you want to be notified when the Job Chain ends with an error. In addition, and essential for this case, specify how many times the System Monitor should be notified about the expiration of a timer.
  • System Monitor: As in the example above, Services in the System Monitor have to be configured with the same names as in the SystemMonitorNotification.xml file above.

SFTP connection refused

Initial Situation: There is a Job Chain that uses SFTP for transferring files. You have a setback configured in this step of the Job Chain, so that if the connection to the SFTP server fails, this step is retried after some time.

Problem: The SFTP server is no longer available.

Handling: The System Monitor service related to the Job Chain is notified with an error message. However, you do not want to receive a large number of notifications for a Job Chain when the error is caused by an external factor, in this case the connection to the SFTP server.

Configuration:

  • XML CheckConfigurationHistory.xml: As in the example above, specify the ID of the JobScheduler and the name of the Job Chain you want to monitor.
  • XML SystemMonitorNotification.xml: As in the example above, specify the name of the Service (as configured in the System Monitor) in the service_name_on_error attribute, since you want to be notified when the Job Chain ends with an error. In addition, and very important in this case, specify how many times this Job Chain should notify the System Monitor about the error connecting to the SFTP server. You can use step_from and step_to to reduce the number of notifications for this specific step.
  • System Monitor: As in the example above, Services in the System Monitor have to be configured with the same names as in the SystemMonitorNotification.xml file above.

Thresholds

Initial Situation: For example, a specific number of workflow executions has to complete successfully by a specific time. That is, a value has to be monitored in order to determine whether this quota has been reached.

Handling: A new service for the History is configured, so that workflow executions (Job Chains in JobScheduler vocabulary) report to the System Monitor that they were executed and have finished.

Configuration:

  • XML CheckConfigurationHistory.xml: As in the example above, specify the ID of the JobScheduler and the name of the Job Chain you want to monitor.
  • XML SystemMonitorNotification.xml: Specify the name of the Service (as configured in the System Monitor), but this time in the service_name_on_success attribute, since you want to be notified when the Job Chain ends successfully and not only when it ends with an error.
  • System Monitor: As in the example above, Services in the System Monitor have to be configured with the same names as in the SystemMonitorNotification.xml file above.

Acknowledgement

Initial Situation: An alert for a Service has been sent to the System Monitor and an e-mail notifying about it has been sent to the Service Desk (Support Team).

Handling: The problem is well known to the Service Desk and they "acknowledge" it. Through the acknowledgement, JobScheduler is notified and does not send any further notifications for this Service to the System Monitor until the Service has recovered.

Configuration:

  • System Monitor: Notifying JobScheduler of an acknowledgement is implemented in the System Monitor as the execution of a script. This is simply another kind of notification: instead of sending an e-mail, for instance, the System Monitor executes a script that contacts JobScheduler and adds an order to the ResetNotifications Job Chain described above.

 

 

 

 

 

 

 

 

 
