What is the difference between directory monitoring and file watching?

Question:

The reference documentation contains two chapters, "Directory Monitoring" and "Directory Monitoring with File Orders". What is the difference between these features?

Answer:

Directory Monitoring:

is used to start jobs automatically when a file event is triggered in a directory. You can either monitor a directory to start a job or have a separate order created for every file.

  1. Directory monitoring for job starts

    You can have any job start automatically on changes to one or more directories by adding the <start_when_directory_changed> element to the job configuration. JobScheduler starts the job if an event is triggered in the directory for a file that matches a regular expression.

    However, your job implementation has to cope with the fact that multiple files can arrive simultaneously. JobScheduler passes the file names to the job in the environment variable SCHEDULER_TASK_TRIGGER_FILES and by the API method spooler_task.trigger_files(). Multiple file names are separated by ";". Additionally, your job has to be careful when handling file names that contain spaces, as you can see from the examples below. Moreover, it is up to your job implementation to move or remove the files from the input directory and to handle the respective errors. It can therefore be more convenient to use file orders.

    Example:
    
    <job name="my_job">
      <!-- for unix shell -->
      <script language = "shell"><![CDATA[
        IFS=";"
        for trigger_file in ${SCHEDULER_TASK_TRIGGER_FILES}
        do
          echo "$trigger_file"
          mv "$trigger_file" /tmp/output
        done
        IFS=$' \t\n'
        exit 0
      ]]></script>
      <!-- for windows shell -->
      <!--
      <script language = "shell"><![CDATA[
        @echo off
        if not defined SCHEDULER_TASK_TRIGGER_FILES exit 0
        set trigger_files=%SCHEDULER_TASK_TRIGGER_FILES:;=?%
        :loop
        for /F "usebackq tokens=1* delims=?" %%i in ('%trigger_files%') do (
          set trigger_files=%%j
          @echo %%~fi
          move /y "%%~fi" \tmp\output
          goto loop
        )
        exit 0
      ]]></script>
      -->
      <start_when_directory_changed directory="/tmp" regex="sos.*"/>
    </job>
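
    If your job is implemented against the API instead of as a shell script, the same list of file names is available from spooler_task.trigger_files(). Below is a minimal sketch, assuming the JavaScript binding of the JobScheduler API, where the value is read as a property without parentheses; moving the files and handling errors remain the job's responsibility:

    <job name="my_api_job">
      <script language="javascript"><![CDATA[
        function spooler_process() {
          // trigger_files holds all triggering file paths, separated by ";"
          var trigger_files = spooler_task.trigger_files.split(";");
          for (var i = 0; i < trigger_files.length; i++) {
            spooler_log.info("processing file: " + trigger_files[i]);
            // move or remove the file here and handle any errors
          }
          return false;  // do not repeat the task
        }
      ]]></script>
      <start_when_directory_changed directory="/tmp" regex="sos.*"/>
    </job>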
                      
  2. Directory Monitoring for File Orders

    Starting with release 1.2.9 you can have orders created automatically for every file that appears in one of the monitored directories. This is done by adding one or more <file_order_source/> elements as the first job nodes of your job chain. For an explanation of orders see What is the concept of "job chains and order processing"?

    For every directory covered by a <file_order_source> element, orders are created automatically for files that match the given regular expression. You do not have to deal with concurrency issues: the order is created just once and the file name is provided by the order parameter order.params().value("scheduler_file_path"), as in the sketch below. Once the file has been processed by your jobs, it is moved or removed by a <file_order_sink> element at the end of the job chain. Should the file be deleted manually, JobScheduler automatically removes the order, unless it is currently being processed by a job node.
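
    A minimal sketch of such a job node, assuming the JavaScript binding of the JobScheduler API, where order parameters are read as properties without parentheses; the job name file_convert matches the example further below and the actual conversion is omitted:

    <job name="file_convert" order="yes">
      <script language="javascript"><![CDATA[
        function spooler_process() {
          // absolute path of the file that created this order
          var file_path = spooler_task.order.params.value("scheduler_file_path");
          spooler_log.info("converting file: " + file_path);
          // ... convert the file ...
          return true;  // continue with the next job node
        }
      ]]></script>
    </job>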

    As orders can be stored persistently in a JobScheduler database, processing can be resumed after a restart of JobScheduler. The same holds for errors during processing in job nodes: the order can be set back to repeat the job node after a given delay, see the sketch after the example below.

    Example:
    
    <job_chain  name = "inbound_files">
      <file_order_source  directory = "/tmp/inbound"      regex = "[^~]$"
                             delay_after_error = "5"/>
      <file_order_source  directory = "/tmp/inbound.add"  regex = "[^~]$"
                             delay_after_error = "5"/>
      <job_chain_node         state = "convert"           next_state = "transfer"
                                 error_state = "error"
                                job = "file_convert"/>
      <job_chain_node         state = "transfer"          next_state = "success"
                                 error_state = "error"
                                job = "file_transfer"/>
      <file_order_sink      move_to = "/tmp/inbound.success"  state = "success"/>
      <file_order_sink      move_to = "/tmp/inbound.error"    state = "error"/>
    </job_chain>
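
    The delay after an error in a job node is configured at the job itself. Below is a minimal sketch, assuming the <delay_order_after_setback> element and the Order.setback() API method of JobScheduler; the attribute values are only examples:

    <job name="file_convert" order="yes">
      <!-- first and second setback: repeat the job node after 60 seconds -->
      <delay_order_after_setback setback_count="1" delay="60"/>
      <!-- third setback: give up and move the order to the error state -->
      <delay_order_after_setback setback_count="3" is_maximum="yes"/>
      <script language="javascript"><![CDATA[
        function spooler_process() {
          try {
            // ... process the file ...
            return true;
          } catch (e) {
            spooler_log.warn("processing failed: " + e);
            spooler_task.order.setback();  // repeat this job node after the configured delay
            return false;
          }
        }
      ]]></script>
    </job>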
                      

...