WORK IN PROGRESS!

Introduction

Consider the situation where files with different characteristics (type, source, or similar) can be processed in parallel but all the files with common characteristics have to be processed sequentially.

For example, the centralised data processing of a company receives stock movement reports from subsidiaries at regular intervals. Whilst the reports from different subsidiaries can be processed simultaneously, the reports from each subsidiary have to be processed sequentially.

The recommended approach for this situation would be:

  • Use a single 'load_file' File Order Source directory to which all files are delivered.
  • JobScheduler would use regular expressions to identify the files arriving in this directory on the basis of their names, timestamps or file extensions and forward them for processing accordingly.
  • JobScheduler would then set a 'lock' for the subsidiary whose file is being processed to prevent further files from this subsidiary being processed as long as processing continues.
     Should a 'new' file from this subsidiary arrive whilst its predecessor is being processed, the job 'receiving' the new file will be 'set back' by JobScheduler as long as the lock for the subsidiary is set.
    
  • This lock would be released by JobScheduler once processing of the 'first' file has been completed.
  • The job 'receiving' the new file will now be able to forward this file for processing.
  • JobScheduler starts processing as soon as a file matching the regular expression is found in the 'in' directory.
  • The aquire_lock job matches the file name against the regular expressions and determines the file's category, i.e. Berlin or Munich.
  • Once aquire_lock has determined the category, it tries to set a semaphore (flag) using JobScheduler's built-in lock mechanism.
  • Only one instance of each lock is allowed: once the lock has been assigned to the first file in the Berlin category, the next Berlin file has to wait (it is set back) until the lock is free.
  • The same mechanism applies to files in the Munich category. As long as the Munich lock has not been acquired, a Munich file can be processed even while a Berlin file is still being processed.
  • Once processing has finished, JobScheduler moves the file from 'in' to either 'done' (success) or 'failed' (error).
  • After the input file has been moved to the correct target directory, the release_lock job is called. It removes the lock so that the next file from the same category can be processed.
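The categorise-then-lock logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the demo's actual job code: the filename patterns, the `categorize`, `try_acquire` and `release` helpers, and the use of `threading.Lock` in place of JobScheduler's own lock objects are all assumptions made for the example.

```python
import re
import threading

# Hypothetical filename patterns; the regular expressions used by the
# real demo may differ.
CATEGORY_PATTERNS = {
    "berlin": re.compile(r"^berlin.*\.csv$", re.IGNORECASE),
    "munich": re.compile(r"^munich.*\.csv$", re.IGNORECASE),
}

# One lock per category, standing in for JobScheduler's lock objects.
LOCKS = {name: threading.Lock() for name in CATEGORY_PATTERNS}


def categorize(filename):
    """Return the first category whose pattern matches, or None."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.match(filename):
            return category
    return None


def try_acquire(filename):
    """Try to take the category lock without blocking.

    Returns (category, True) when the lock was taken, (category, False)
    when the file has to be set back, and (None, False) when the file
    name matches no known category.
    """
    category = categorize(filename)
    if category is None:
        return None, False
    return category, LOCKS[category].acquire(blocking=False)


def release(category):
    """Free the category lock so the next file of that category can run."""
    LOCKS[category].release()
```

With these helpers, a second Berlin file is refused while the first one holds the lock, whereas a Munich file is accepted straight away because it uses a different lock.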

Restrictions with this solution

  • The difficulty with this approach is ensuring that files are processed in the correct sequence when more than one file arrives from a subsidiary at once, because there is no guarantee that a 'group' of files arriving as a batch will be written to the file system in a particular order.
    • One approach would be to wait until a steady state in the incoming directory has been reached (no new file has been added and the size of all files remains constant over a period such as a minute) before starting to identify files and forward them for processing. JobScheduler could then sort files according to their names before forwarding them for processing.
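The steady-state check suggested above could look roughly like the following Python sketch. It is not part of the demo package; the function names `snapshot` and `wait_for_steady_state`, the polling interval and the default quiet period are assumptions chosen for illustration.

```python
import os
import time


def snapshot(directory):
    """Map each regular file name in `directory` to its size in bytes."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                result[entry.name] = entry.stat().st_size
    return result


def wait_for_steady_state(directory, quiet_seconds=60.0, poll_seconds=5.0):
    """Block until no file has been added, removed or resized for
    `quiet_seconds`, then return the file names sorted alphabetically."""
    previous = snapshot(directory)
    stable_since = time.monotonic()
    while time.monotonic() - stable_since < quiet_seconds:
        time.sleep(poll_seconds)
        current = snapshot(directory)
        if current != previous:
            # Something changed: remember the new state and restart the clock.
            previous = current
            stable_since = time.monotonic()
    return sorted(previous)
```

Sorting by name only yields the intended processing order if the file names encode that order, for example through a sequence number or a timestamp with a fixed width.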

Demo Package

A demonstration of this solution is available for download from:

Installation

  • Unpack the zip file to a local directory
  • Copy the 'SQLLoaderProc' folder to your JobScheduler 'live' folder.
     ...
    
  • Copy the 'Data' folder to a suitable local location.
     The default location for this folder, which is specified in the job configurations, is:
     C:\sandbox
     This location is used in this FAQ.
     The following paths have to be modified if the location of the 'Data' folder is changed:
    
    • The 'load_files' File Order Source directories in the load_files.job_chain.xml job chain object
    • The 'source_file' and 'target_file' paths specified as parameters in the move_file_suc.job.xml and move_file_error.job.xml objects

Running the demo

  • Copy files from the 'Data/__test-files' folder to the 'in' folder. JobScheduler will automatically start processing within a few seconds.

How does the Demo Work?

  • JobScheduler starts processing as soon as a file matching the regular expression is found in the 'in' directory.
  • JobScheduler's aquire_lock job matches the file name against the regular expressions and determines the file's category, i.e. Berlin or Munich.
  • Once aquire_lock has determined the category, it tries to set a semaphore (flag) using JobScheduler's built-in lock mechanism.
  • Only one instance of each lock is allowed: once the lock has been assigned to the first file in the Berlin category, the next Berlin file has to wait (it is set back) until the lock is free.
  • The same mechanism applies to files in the Munich category. As long as the Munich lock has not been acquired, a Munich file can be processed even while a Berlin file is still being processed.
  • Once processing has finished, JobScheduler moves the file from 'in' to either 'done' (success) or 'failed' (error).
  • After the input file has been moved to the correct target directory, the release_lock job is called. It removes the lock so that the next file from the same category can be processed.
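The last steps of this flow (process, move to 'done' or 'failed', then release the lock) can be summarised in a short Python sketch. It only mirrors the order of operations described above and is not the code of the demo jobs; `handle_file` and its parameters are hypothetical names, and a `threading.Lock` stands in for the JobScheduler lock.

```python
import shutil
import threading
from pathlib import Path


def handle_file(path, lock, process, done_dir, failed_dir):
    """Process one file while holding `lock`.

    Returns the file's new location, or None when the lock is already
    held and the file has to be set back and retried later.
    """
    if not lock.acquire(blocking=False):
        return None  # another file of this category is still being processed
    try:
        try:
            process(path)
            target_dir = Path(done_dir)    # success
        except Exception:
            target_dir = Path(failed_dir)  # error
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / Path(path).name
        shutil.move(str(path), str(target))
        return target
    finally:
        # Released only after the move, matching the order described above.
        lock.release()
```

Releasing the lock in a `finally` block means a failed file cannot leave its category blocked, which corresponds to release_lock being called on both the success and the error path.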

Graphical Representation of the Solution

