
Introduction

Consider the situation where a large number of similar data records are to be processed one after the other. A typical example would be credit card transactions from a cash terminal or a retail checkout. A standard procedure for speeding up the processing of such data items is to split each item into its constituent parts and process each part separately. With financial transactions, each data item is usually made up of a header, body and footer, with the header and footer being of fixed length and the length of the body varying with the number of items in the transaction.

This gives a combination of parallel and serial processing, as shown in the diagram below:

ToDo:

  • Add diagram

To ensure that the parallel processing steps do not get out of step with one another, some form of synchronization has to be introduced. One approach would be to use split and sync jobs as described in our Parallel_Execution_in_a_job_chain FAQ.

ToDo:

  • What disadvantages - Max Orders?

An alternative approach is described in this FAQ - setting a lock before the processing steps for a data record are started and releasing it once the parallel processing has ended.

In this approach the lock is not used as intended - i.e. ToDo
This has the advantage ... ToDo

The locks are generated dynamically using the JobScheduler internal API. Rhino JavaScript is used in the example code presented here. (SpiderMonkey JavaScript is only available with 32-bit JobSchedulers.)
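The following minimal sketch shows how a lock could be created dynamically from a Rhino JavaScript job using the JobScheduler internal API. The naming convention "lock_" plus the order ID and the use of the spooler_process() function are assumptions made for illustration - refer to the example files for the actual implementation.

    // Sketch: create a lock dynamically before the parallel processing steps start.
    // The naming convention "lock_" + order id is an assumption for illustration.
    function spooler_process() {
        var order    = spooler_task.order();
        var lockName = "lock_" + order.id();
        var locks    = spooler.locks();

        // Only create the lock if it does not already exist
        if (locks.lock_or_null(lockName) == null) {
            var lock = locks.create_lock();
            lock.set_name(lockName);
            locks.add_lock(lock);
            spooler_log.info("Lock '" + lockName + "' created");
        }
        return true;
    }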

Download the example files:

The Example Job Chain

The example job chain only illustrates the steps relevant to the lock, which take place before and after the parallel processing steps. These are represented schematically in the example job chain by two jobs, 'pre_proc_check' and 'process_data'. The 'pre_proc_check' job represents the splitting up of the data record into its constituent parts and the 'process_data' job represents the parallel processing and the bringing together of the parallel processing threads.
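Once the parallel processing has ended, the lock can be removed again using the same API. The following sketch shows how this could be done in a Rhino JavaScript step of the 'process_data' job; the lock name is again an assumption and must match the name used when the lock was created.

    // Sketch: remove the dynamically created lock once the parallel processing has ended.
    function spooler_process() {
        var order    = spooler_task.order();
        var lockName = "lock_" + order.id();      // same assumed naming convention as above
        var lock     = spooler.locks().lock_or_null(lockName);

        if (lock != null) {
            lock.remove();                        // the lock must no longer be held at this point
            spooler_log.info("Lock '" + lockName + "' removed");
        }
        return true;
    }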

The example job chain is started by one of three 'file_order_source' elements, each filtered by a regular expression. All three 'file_order_source' elements lead directly to the 'start' node.

The 'success' and '!error' nodes are 'file_order_sink' nodes that are configured to remove the 'file_order_source' files.
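A minimal job chain configuration along these lines might look as follows. The directories, the regular expression and the 'process' state name are assumptions made for illustration - refer to the example files for the actual configuration.

    <job_chain>
        <!-- Three file order sources, each filtered by a regular expression -->
        <file_order_source directory="/data/in_a" regex="^record_.*\.dat$" next_state="start"/>
        <file_order_source directory="/data/in_b" regex="^record_.*\.dat$" next_state="start"/>
        <file_order_source directory="/data/in_c" regex="^record_.*\.dat$" next_state="start"/>

        <!-- 'pre_proc_check' sets the lock, 'process_data' stands for the parallel processing -->
        <job_chain_node state="start"   job="pre_proc_check" next_state="process" error_state="!error"/>
        <job_chain_node state="process" job="process_data"   next_state="success" error_state="!error"/>

        <!-- The sink nodes remove the file order source files -->
        <file_order_sink state="success" remove="yes"/>
        <file_order_sink state="!error"  remove="yes"/>
    </job_chain>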

See also:

  • ToDo: Best practice;

{{SchedulerFaqBack}}
