The following example shows an XML file used to monitor specific job chains:

Code Block
languagexml
<?xml version="1.0" encoding="utf-8"?>
<CheckHistoryConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="CheckHistoryConfiguration_v1.0.xsd">
    <MonitoredObject>
        <!-- configure job chains for monitoring -->
        <JobChains>
            <JobChain name="samples/sample_jobChain_1"/>
            <JobChain name="samples/sample_jobChain_2"/>
            <JobChain name="samples/sample_jobChain_3"/>
        </JobChains>

        <!-- configure checks for performance measurement -->
        <Timers>
            <Timer>
                <!--
                    configure job chains and expected maximum execution time for performance measurement

                    impact: if the execution time of step 100 of the current order in job chain samples/sample_jobChain_1 is greater than 10 seconds,
                    the current order is flagged as a performance problem.
                -->
                <JobChains>
                    <JobChain name="samples/sample_jobChain_1" step_from="100" step_to="100"/>
                </JobChains>
                <Maximum><Script language="javascript"><![CDATA[10]]></Script></Maximum>
            </Timer>
        </Timers>
    </MonitoredObject>
</CheckHistoryConfiguration>

Explanation

  • MonitoredObject/JobChains (optional) can contain several JobChain definitions for monitoring error or success conditions
    • JobChain (required) has the following attributes (at least one of scheduler_id or name must be set):
      • scheduler_id (optional) - ID of the JobScheduler instance. Default: the job chain is checked in all JobScheduler instances that log to the same database
      • name (optional) - Job chain name, including any folder names. Default: all job chains of the given scheduler_id are checked
      • step_from (optional) - Name of the job node at which checking starts
      • step_to (optional) - Name of the job node at which checking ends
  • MonitoredObject/Timers (optional) can contain several Timer definitions for performance measurement
    • Timer (required) has the following elements
      • JobChains (optional) - can contain several JobChain definitions for performance measurement
        • JobChain (required) has the following attributes (at least one of scheduler_id or name must be set):
          • scheduler_id (optional) - ID of the JobScheduler instance. Default: the job chain is checked in all JobScheduler instances that log to the same database
          • name (optional) - Job chain name, including any folder names. Default: all job chains of the given scheduler_id are checked
          • step_from (optional) - Name of the job node at which checking starts
          • step_to (optional) - Name of the job node at which checking ends
      • Minimum (optional) - expected minimum execution time for all job chains configured in MonitoredObject/Timers/Timer/JobChains
        • Script (required) - defines the expected minimum value and has the following attribute
          • language (required) - script engine. Currently the javascript engine is supported
      • Maximum (optional) - expected maximum execution time for all job chains configured in MonitoredObject/Timers/Timer/JobChains
        • Script (required) - defines the expected maximum value and has the following attribute
          • language (required) - script engine. Currently the javascript engine is supported
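
The attributes scheduler_id, step_from and step_to and the element Minimum are described above but not all of them appear in the first example. The following fragment is a minimal sketch of how they might be combined. The JobScheduler ID scheduler_4444, the step names 100 and 300 and the value of 5 are assumptions chosen for illustration only, and the unit of Minimum is assumed to be seconds by analogy with Maximum.

```xml
<MonitoredObject>
    <JobChains>
        <!-- assumption: scheduler_4444 is a hypothetical JobScheduler ID;
             with scheduler_id set, only this instance is checked -->
        <JobChain scheduler_id="scheduler_4444" name="samples/sample_jobChain_1"/>
    </JobChains>
    <Timers>
        <Timer>
            <JobChains>
                <!-- assumption: 100 and 300 are hypothetical node names of this job chain -->
                <JobChain scheduler_id="scheduler_4444" name="samples/sample_jobChain_1" step_from="100" step_to="300"/>
            </JobChains>
            <!-- expected minimum execution time (assumed to be in seconds, as for Maximum) -->
            <Minimum><Script language="javascript"><![CDATA[5]]></Script></Minimum>
        </Timer>
    </Timers>
</MonitoredObject>
```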

Sample Timer configuration using an order parameter to calculate the expected execution time

Code Block
languagexml
 ....
  <!-- configure check for performance measurement -->
  <Timer>
      <!--
      configure job chains and expected maximum execution time for performance measurement

      impact: if the execution time of the current order in job chain samples/sample_jobChain_1 is greater than the calculated time (in seconds),
      the current order is flagged as a performance problem.

      The calculation uses the order parameter FILE_SIZE.

      The parameter FILE_SIZE must be configured on the appropriate step of the job chain (using StoreResultsJobJSAdapterClass as monitor) so that it is stored in the database.
      -->
      <JobChains>
          <JobChain name="samples/sample_jobChain_1"/>
      </JobChains>
      <Maximum>
          <!-- sample execution time calculation depending on the file size -->
          <Script language="javascript"><![CDATA[
              // %FILE_SIZE% stands for the value of the order parameter FILE_SIZE
              function calculate(){
                  var fileSize             = new java.lang.Double(%FILE_SIZE%);
                  var timerExpiryFactor    = 0.0025;
                  // tolerance: 10% of the factor
                  var timerExpiryTolerance = timerExpiryFactor*0.1;
                  var timerExpiry          = new java.lang.Double(timerExpiryFactor+timerExpiryTolerance);
                  timerExpiry              = timerExpiry*fileSize*60;
                  return timerExpiry;
              }
              calculate();
              ]]></Script>
      </Maximum>
  </Timer>
  ...
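
As a worked example of the calculation in the script above, assume an order with FILE_SIZE = 1000. This value is hypothetical, and its unit is whatever the job stores in the parameter. The factor 0.0025 and the 10% tolerance are taken from the script:

```latex
\begin{aligned}
\text{timerExpiryTolerance} &= 0.0025 \times 0.1 = 0.00025 \\
\text{timerExpiry} &= (0.0025 + 0.00025) \times 1000 \times 60 = 165
\end{aligned}
```

With these values the order would be reported as a performance problem if its execution takes longer than 165 seconds.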

Schema: SystemMonitorNotification_v1.0.xsd

Description:

  1. Specifies how notifications are delivered to the System Monitor.
  2. Specifies notifications for error or success conditions.
  3. Specifies notifications for performance checks of JobScheduler objects.

Example SystemMonitorNotification_op5.xml

The following example shows an XML file used to notify a specific System Monitor (op5 Monitor) by means of a NotificationCommand:

Code Block
languagexml
<?xml version="1.0" encoding="utf-8"?>
<SystemMonitorNotification xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SystemMonitorNotification_v1.0.xsd">
    <Notification>
        <!--
            configure the system monitor service name and the command used to send notifications to the op5 system monitor using the NSCA client

            notification command substitution in this case:

            All environment variables   e.g. %TEMP% or %JAVA_HOME%

            %SERVICE_NAME%              Error Service (service_name_on_error)

            %SERVICE_STATUS%            1 if an error occurred     (service_status_on_error)
                                        0 if the error recovered   (service_status_on_success)

            %SERVICE_MESSAGE_PREFIX%    ERROR       if an error occurred
                                        RECOVERED   if the error recovered
                                        TIMER       if performance check

            %ORDER_HISTORY_ID% ...      table field name of the result row for the message string (see table definition SCHEDULER_MON_NOTIFICATIONS)
        -->
        <NotificationMonitor service_name_on_error="Error Service" service_status_on_error="1" service_status_on_success="0">
            <NotificationCommand>
<![CDATA[cmd /c echo my_nsca_service_host:%SERVICE_NAME%:%SERVICE_STATUS%:%SERVICE_MESSAGE_PREFIX%history id=%ORDER_HISTORY_ID%, step =%ORDER_STEP_STATE%, error=%ERROR_TEXT%, check = %CHECK_TEXT% | C:\nsca\send_nsca.exe -H nsca_server_host -c C:\nsca\send_nsca.cfg -d : ]]>
            </NotificationCommand>
        </NotificationMonitor>

        <NotificationObject>
            <!--
                configure job chains and the number of send operations for the same problem for sending error notifications
                (if the problem is recovered or acknowledged, no further error notifications are sent)

                requirement: monitoring of these job chains must be configured in CheckHistoryConfiguration.xml
            -->
            <JobChains>
                <JobChain notifications="10" name="samples/sample_jobChain_1"/>
                <JobChain notifications="10" name="samples/sample_jobChain_2"/>
            </JobChains>

            <Timers>
                <Timer>
                    <!--
                        configure job chains and the number of send operations for the same check

                        requirement: the timer check for this job chain must be configured in CheckHistoryConfiguration.xml
                    -->
                    <JobChains>
                        <JobChain notifications="1" name="samples/sample_jobChain_1"/>
                    </JobChains>
                </Timer>
            </Timers>
        </NotificationObject>
    </Notification>
</SystemMonitorNotification>

Explanation

  • SystemMonitorNotification can contain several Notification definitions for notification of error or success conditions
    • Notification (required) contains one NotificationMonitor
      • NotificationMonitor (required) contains the configuration for delivering notifications to the System Monitor and has the following attributes
        • service_name_on_error (optional) - Name of the service to which error and recovery messages are sent. One of service_name_on_error or service_name_on_success must be set.
        • service_name_on_success (optional) - Name of the service to which success messages are sent when an order is completed successfully. One of service_name_on_error or service_name_on_success must be set.
        • service_status_on_error (optional) - Service status (e.g. CRITICAL or WARNING) sent with error messages. Default: CRITICAL
        • service_status_on_success (optional) - Service status (e.g. SUCCESS) sent with success messages. Default: OK
      • NotificationMonitor can have one of the following elements
        • NotificationCommand (optional) - command line that calls an external script for system notification
        • NotificationInterface (optional) - calls the API for system notification (currently for NSCA notifications). This element has the following attributes
          • service_host (required) - name of the host the notifications are sent from (as it is named in the System Monitor)
          • monitor_port (required) - port on which the System Monitor receives notifications
          • monitor_host (required) - hostname of the System Monitor
          • monitor_encryption (required) - encryption used for communication with the System Monitor. Available values: NONE, XOR, TRIPLE_DES
      • NotificationObject (required) contains the configuration of the objects for which notifications are sent to the System Monitor
        • JobChains (optional) - can contain several JobChain definitions
          • JobChain (required) has the following attributes (at least one of scheduler_id or name must be set):
            • notifications (optional) - Number of notifications for the same problem (if the problem is recovered or acknowledged, no further notifications are sent). Default: 1
            • scheduler_id (optional) - ID of the JobScheduler instance. Default: the job chain is checked in all JobScheduler instances that log to the same database
            • name (optional) - Job chain name, including any folder names. Default: all job chains of the given scheduler_id are checked
            • step_from (optional) - Name of the job node at which checking starts
            • step_to (optional) - Name of the job node at which checking ends
        • Timers (optional) - can contain several Timer definitions
          • Timer (required) has the following elements
            • JobChains (optional) - can contain several JobChain definitions for performance notification
              • JobChain (required) has the following attributes (at least one of scheduler_id or name must be set):
                • notifications (optional) - Number of notifications for the same check. Default: 1
                • scheduler_id (optional) - ID of the JobScheduler instance. Default: the job chain is checked in all JobScheduler instances that log to the same database
                • name (optional) - Job chain name, including any folder names. Default: all job chains of the given scheduler_id are checked
                • step_from (optional) - Name of the job node at which checking starts
                • step_to (optional) - Name of the job node at which checking ends


Sample Notification configuration using NotificationInterface

Code Block
languagexml
 ....
  <!--
        notification message substitution in this case:

        All environment variables   e.g. %TEMP% or %JAVA_HOME%

        %ORDER_HISTORY_ID% ...      table field name of the result row for building the message (see table definition SCHEDULER_MON_NOTIFICATIONS)
  -->
  <NotificationMonitor service_name_on_error="Error Service">
      <NotificationInterface service_host="my_nsca_service_host" monitor_port="5667" monitor_host="nsca_server_host" monitor_encryption="XOR">
      order history id=%ORDER_HISTORY_ID%, job chain=%JOB_CHAIN_NAME%, order id=%ORDER_ID%, step =%ORDER_STEP_STATE%, error=%ERROR_TEXT%, check = %CHECK_TEXT%
      </NotificationInterface>
 
  ...

Job Chains

The Job Chains for this solution have to be placed under \live\notification. Four Job Chains are implemented, with the following functions:

  • CheckHistory: reads the JobScheduler database tables that contain the logging, analyses them and writes the results into other tables, the Notification tables.
  • CleanupNotifications: deletes entries in the Notification tables. Currently this takes place once every day.
  • ResetNotifications: sets the status of notifications in the Notification tables (e.g. Acknowledge).
  • SystemNotifier: notifies the System Monitor about the current notifications and then updates the Notification tables.

System Monitor

  1. The System Monitor receives only passive checks; that is, there are no active checks for monitoring JobScheduler. The only configuration required here is the capability to receive passive checks from a remote host.
  2. The services in the System Monitor have to match the JobScheduler configuration. Passive checks (services) have to be configured and named according to the convention used in the XML files described above (CheckHistoryConfiguration.xml and SystemMonitorNotification_op5.xml).

 

Use Cases

Recoverable Errors

Initial Situation: A Job Chain is triggered by directory monitoring. That is, when a certain file arrives in a monitored folder, the Job Chain starts.

Problem: The Job Chain ended with an error.

Handling: The System Monitor is notified with an error message for the service related to the Job Chain. If a new execution of the Job Chain for a new file ends without errors, this does not mean that the error is recovered, since a different file has been processed. That is, the error message stays at the System Monitor until the same file is placed in the monitored directory again and the Job Chain ends without errors.

Configuration:

  • XML CheckHistoryConfiguration.xml: Indicate the ID of the JobScheduler and the name of the Job Chain you want to monitor.
  • XML SystemMonitorNotification.xml: Specify the name of the service (in the System Monitor) as service_name_on_error, since you want to be informed when the Job Chain ends with an error.
  • System Monitor: Services in the System Monitor have to be configured and named as in the file SystemMonitorNotification.xml above.
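
The configuration for this use case might look roughly like the following fragments. This is a sketch only: the JobScheduler ID scheduler_4444, the Job Chain name samples/file_import and the service name "File Import Errors" are hypothetical, and the delivery element is left out because it is the same as in the examples above.

```xml
<!-- CheckHistoryConfiguration.xml (fragment) -->
<MonitoredObject>
    <JobChains>
        <!-- assumption: hypothetical JobScheduler ID and job chain name -->
        <JobChain scheduler_id="scheduler_4444" name="samples/file_import"/>
    </JobChains>
</MonitoredObject>

<!-- SystemMonitorNotification.xml (fragment) -->
<Notification>
    <!-- assumption: a passive service with this hypothetical name exists in the System Monitor -->
    <NotificationMonitor service_name_on_error="File Import Errors">
        <!-- NotificationCommand or NotificationInterface as in the examples above -->
    </NotificationMonitor>
    <NotificationObject>
        <JobChains>
            <JobChain scheduler_id="scheduler_4444" name="samples/file_import"/>
        </JobChains>
    </NotificationObject>
</Notification>
```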

Workflow Execution takes too long

Initial Situation: A Job Chain is triggered but does not end: it hangs in a step and therefore takes longer than expected.

Problem: The execution time was too long.

Handling: A timer is set for this Job Chain and the System Monitor is notified when it expires. The expiration times for the Job Chains are configured to leave enough time for normal processing, so this check typically catches cases where the Job Chain hangs in a specific step.

Configuration:

  • XML CheckHistoryConfiguration.xml: As in the example above, indicate the ID of the JobScheduler and the name of the Job Chain you want to monitor. In addition, specify the timer for this specific Job Chain and the function that calculates the expiration time of the timer.
  • XML SystemMonitorNotification.xml: As in the example above, specify the name of the service (in the System Monitor) as service_name_on_error, since you want to be informed when the Job Chain ends with an error. In addition, and essential for this particular case, specify how many times your System Monitor should be notified about the expiration of a timer.
  • System Monitor: As in the example above, services in the System Monitor have to be configured and named as in the file SystemMonitorNotification.xml above.
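
A minimal sketch of the two timer-related fragments for this use case follows. The JobScheduler ID scheduler_4444 and the fixed limit of 3600 seconds are assumptions for illustration; a calculated limit, as in the FILE_SIZE sample, could be used instead.

```xml
<!-- CheckHistoryConfiguration.xml (fragment): timer check -->
<Timers>
    <Timer>
        <JobChains>
            <!-- assumption: hypothetical JobScheduler ID -->
            <JobChain scheduler_id="scheduler_4444" name="samples/sample_jobChain_1"/>
        </JobChains>
        <!-- assumption: 3600 seconds leaves enough time for normal processing -->
        <Maximum><Script language="javascript"><![CDATA[3600]]></Script></Maximum>
    </Timer>
</Timers>

<!-- SystemMonitorNotification.xml (fragment): notify once per expired timer -->
<NotificationObject>
    <Timers>
        <Timer>
            <JobChains>
                <JobChain notifications="1" scheduler_id="scheduler_4444" name="samples/sample_jobChain_1"/>
            </JobChains>
        </Timer>
    </Timers>
</NotificationObject>
```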

SFTP connection refused

Initial Situation: A Job Chain uses SFTP for transferring files. A setback is configured for this step of the Job Chain, so that if the connection to the SFTP server fails, the step is retried after some time.

Problem: The SFTP server is no longer available.

Handling: The System Monitor is notified with an error message for the service related to the Job Chain. However, you do not want to receive a large number of notifications for a Job Chain when an external factor, the connection to the SFTP server, is producing the error.

Configuration:

  • XML CheckHistoryConfiguration.xml: As in the example above, indicate the ID of the JobScheduler and the name of the Job Chain you want to monitor.
  • XML SystemMonitorNotification.xml: As in the example above, specify the name of the service (in the System Monitor) as service_name_on_error, since you want to be informed when the Job Chain ends with an error. In addition, and very important in this case, specify how many times this Job Chain should notify your System Monitor about the error connecting to the SFTP server. You can use step_from and step_to to limit the number of notifications for this specific step.
  • System Monitor: As in the example above, services in the System Monitor have to be configured and named as in the file SystemMonitorNotification.xml above.
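
Limiting the notifications for the SFTP step could be sketched as follows. The node name sftp_transfer is hypothetical, and the fragment assumes that setting step_from and step_to to the same node restricts the entry to that step, as in the Timer example for step 100.

```xml
<!-- SystemMonitorNotification.xml (fragment) -->
<NotificationObject>
    <JobChains>
        <!-- assumption: "sftp_transfer" is a hypothetical node name;
             an error in this step is reported at most once per problem -->
        <JobChain notifications="1" name="samples/sample_jobChain_1" step_from="sftp_transfer" step_to="sftp_transfer"/>
    </JobChains>
</NotificationObject>
```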

Thresholds

Initial Situation: For example, a specific number of workflow executions have to complete successfully by a specific time. That is, a specific value has to be monitored in order to determine whether this quota was reached.

Handling: A new service for History is configured, so that the workflow executions (Job Chains in JobScheduler vocabulary) report to the System Monitor that they were executed and finished.

Configuration:

  • XML CheckHistoryConfiguration.xml: As in the example above, indicate the ID of the JobScheduler and the name of the Job Chain you want to monitor.
  • XML SystemMonitorNotification.xml: Specify the name of the service (in the System Monitor), but now as service_name_on_success, since you want to be informed when the Job Chain ends successfully, and not only when it ends with an error.
  • System Monitor: As in the example above, services in the System Monitor have to be configured and named as in the file SystemMonitorNotification.xml above.
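
For this use case the NotificationMonitor uses service_name_on_success instead of service_name_on_error. The fragment below is a sketch under the assumption that a passive service with the hypothetical name "History Service" exists in the System Monitor. Because service_status_on_success is not set, the default status OK would be sent.

```xml
<!-- SystemMonitorNotification.xml (fragment) -->
<Notification>
    <!-- assumption: "History Service" is a hypothetical service name -->
    <NotificationMonitor service_name_on_success="History Service">
        <!-- NotificationCommand or NotificationInterface as in the examples above -->
    </NotificationMonitor>
    <NotificationObject>
        <JobChains>
            <JobChain name="samples/sample_jobChain_1"/>
        </JobChains>
    </NotificationObject>
</Notification>
```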

Acknowledgement

Initial Situation: An alert for a service has been sent to the System Monitor and a mail has been sent to the Service Desk (Support Team) notifying them about it.

Handling: The problem is well known to the Service Desk and they acknowledge it. Through the acknowledgement JobScheduler is notified too and will not send any further notifications for this service to the System Monitor until the service has recovered.

Configuration:

  • System Monitor: Notifying JobScheduler of an acknowledgement in the System Monitor is done by executing a script. This is simply another kind of notification: instead of sending a mail, for instance, the System Monitor executes a script that contacts JobScheduler and adds an order to the Job Chain ResetNotifications described above.