
Introduction

Consider the situation where a large number of similar data records are to be processed one after the other. A typical example here would be credit card transactions from a cash terminal or a retail checkout. A standard procedure used to speed up the processing of such data records is to split each record into its constituent parts and process each part separately. With financial transactions, each data record is usually made up of a header, body and footer, with the header and footer being of fixed length and the length of the body varying with the number of items in the transaction.

This results in a combination of parallel and serial processing, as shown in the diagram below:

ToDo:

  • Add diagram

It should be clear that some form of synchronization is necessary to ensure that the different parallel processing steps do not get out of step. One approach would be to use split and sync jobs as described in our Parallel_Execution_in_a_job_chain FAQ. It is, however, important to ensure that the processing of one data record is clearly separated from that of the next. This clearly separated serial processing of data is the subject of this FAQ.

In the solution described here, a lock is set each time processing of a data record is started. This lock is released once the processing of all parts of the data record has been completed. The parallel processing steps themselves are treated separately - a 'black-box' approach that increases the flexibility of use.

Note that with this approach JobScheduler locks are not used quite as intended - they are normally acquired by jobs (see the Locks section of the JobScheduler reference documentation). Here they are instead used as a convenient method of setting a flag.

ToDo:

  • What disadvantages - Max Orders?

The locks are generated dynamically using the JobScheduler internal API; the example code presented here uses Rhino JavaScript. (SpiderMonkey JavaScript is only available with 32-bit JobScheduler installations.)
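
A minimal sketch of how the 'pre_proc_check' job might generate such a lock in Rhino JavaScript - the lock naming scheme based on the order ID is an assumption for illustration, not taken from the example files:

    // pre_proc_check job script (Rhino JavaScript):
    // set a lock before the data record is split into its parts
    function spooler_process() {
        var lockName = "record_lock_" + spooler_task.order.id;  // hypothetical naming scheme
        var locks    = spooler.locks;
        if (locks.lock_or_null(lockName) == null) {             // only create the lock once
            var lock  = locks.create_lock();                    // generate the lock dynamically
            lock.name = lockName;
            locks.add_lock(lock);                               // register it with the JobScheduler
        }
        spooler_log.info("Lock '" + lockName + "' set");
        return true;                                            // pass the order on to the next node
    }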

Download the example files:

The Example Job Chain

The example job chain only illustrates the steps relevant to the lock, which take place before and after the parallel processing steps. These are represented schematically in the example job chain by two jobs, 'pre_proc_check' and 'process_data'. The 'pre_proc_check' job represents the splitting of the data record into its constituent parts and the 'process_data' job represents the parallel processing and the bringing together of the parallel processing threads.
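
As a counterpart to the sketch above, the 'process_data' job might release the lock once all parts of the record have been processed - again a minimal sketch using the same hypothetical naming scheme:

    // process_data job script (Rhino JavaScript):
    // release the lock once all parts of the data record have been processed
    function spooler_process() {
        var lockName = "record_lock_" + spooler_task.order.id;  // same hypothetical naming scheme
        var lock     = spooler.locks.lock_or_null(lockName);
        if (lock != null) {
            lock.remove();                                      // remove the dynamically created lock
            spooler_log.info("Lock '" + lockName + "' released");
        }
        return true;
    }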

The example job chain is started by one of three 'file_order_source' elements, each filtered by a regular expression. All three 'file_order_source' elements lead directly to the 'start' node.

The 'success' and '!error' nodes are 'file_order_sink' nodes that are configured to remove the 'file_order_source' files.
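
A minimal sketch of how such a job chain could be configured in XML - the directory, regular expressions and state names are placeholders, not taken from the example files:

    <job_chain name="example_chain">
        <!-- three file order sources, filtered by regular expressions, all leading to 'start' -->
        <file_order_source directory="/data/in" regex="^type_a.*" next_state="start"/>
        <file_order_source directory="/data/in" regex="^type_b.*" next_state="start"/>
        <file_order_source directory="/data/in" regex="^type_c.*" next_state="start"/>

        <!-- 'pre_proc_check' sets the lock; 'process_data' stands for the parallel steps -->
        <job_chain_node state="start"        job="pre_proc_check" next_state="process_data" error_state="!error"/>
        <job_chain_node state="process_data" job="process_data"   next_state="success"      error_state="!error"/>

        <!-- the sink nodes remove the file order source files -->
        <file_order_sink state="success" remove="yes"/>
        <file_order_sink state="!error"  remove="yes"/>
    </job_chain>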

See also:

  • ToDo: Best practice;

{{SchedulerFaqBack}}
