
This Page is Work in Progress

Introduction

Consider the situation where a large number of similar data records are to be processed one after the other. A typical example here would be credit card transactions from a cash terminal or a retail checkout. A standard procedure used to speed up processing of such data records is to split up each record into its constituent parts and process each part separately. With financial transactions, each data record is usually made up of a header, body and footer, with the header and footer being of fixed length and the length of the body varying with the number of items in the transaction.
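The splitting step described above can be sketched as follows. This is a minimal illustration, not code from the example: the function name, the field lengths (an 8-character header and footer) and the record layout are all invented for this sketch.

```javascript
// Split a flat transaction record into header, body and footer.
// Assumption (for this sketch only): header and footer are fixed-length,
// 8 characters each; everything in between is the variable-length body.
var HEADER_LEN = 8;
var FOOTER_LEN = 8;

function splitRecord(record) {
  if (record.length < HEADER_LEN + FOOTER_LEN) {
    throw new Error("record too short: " + record.length);
  }
  return {
    header: record.slice(0, HEADER_LEN),
    body:   record.slice(HEADER_LEN, record.length - FOOTER_LEN),
    footer: record.slice(record.length - FOOTER_LEN)
  };
}
```

Each of the three parts can then be handed to its own processing job, which is where the parallelism comes from.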

Here we have a situation with a combination of parallel and serial processing as shown in the diagram below:

ToDo:

  • Add diagram

It should be clear that some form of synchronization is necessary to ensure that the parallel processing steps do not get out of step with one another. One approach would be to use split and sync jobs as described in our Parallel_Execution_in_a_job_chain FAQ. It is, however, also important to ensure that the processing of one data record is clearly separated from that of the next. This clearly separated serial processing of data is the subject of this FAQ.

In the solution described here, a lock is set each time processing of a data record is started. This lock is then released once the processing of all parts of the data record has been completed. The parallel processing steps themselves are treated separately - a 'black-box' approach which increases the flexibility of use.

Note that with this approach JobScheduler locks are not used quite as intended - they are normally acquired by jobs (see the Locks section of the JobScheduler reference documentation). Here they are instead used as a convenient way of setting a flag.

ToDo:

  • What disadvantages
  • Max Orders?

The Example

Download the example files:

  • Follow the instructions in the 'ReadMe - serial_job_execution_with_lock' file to install and use the example.

The locks in this example are generated dynamically using the JobScheduler internal API; the code presented here uses Rhino JavaScript. (SpiderMonkey JavaScript is only available with 32-bit JobScheduler installations.)
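The dynamic lock handling can be sketched along the following lines. This is a hedged illustration, not the example code: the function names and the lock naming scheme are invented, and the in-memory "locks" object below is merely a stand-in for the JobScheduler API's spooler.locks object (mimicking its create_lock(), add_lock() and lock_or_null() methods and Lock.remove()) so that the sketch can run outside JobScheduler.

```javascript
// Stand-in for the JobScheduler API's spooler.locks object (assumption:
// a real job would call spooler.locks directly instead of this stub).
var locks = {
  _store: {},
  create_lock: function () { return { name: "" }; },
  add_lock: function (lock) {
    var store = this._store;
    lock.remove = function () { delete store[lock.name]; };
    store[lock.name] = lock;
  },
  lock_or_null: function (name) {
    return this._store.hasOwnProperty(name) ? this._store[name] : null;
  }
};

// Called when processing of a data record starts (pre_proc_check):
// create and register a lock that acts as the "record in process" flag.
function acquireRecordLock(recordId) {
  var name = "record_" + recordId;        // hypothetical naming scheme
  if (locks.lock_or_null(name) !== null) {
    return false;                         // record is already being processed
  }
  var lock = locks.create_lock();
  lock.name = name;
  locks.add_lock(lock);
  return true;
}

// Called once all parts of the record are done (process_data):
// remove the lock and thereby clear the flag.
function releaseRecordLock(recordId) {
  var lock = locks.lock_or_null("record_" + recordId);
  if (lock !== null) {
    lock.remove();
  }
}
```

As long as the flag is set, a second order for the same record can be held back, which is what keeps the record-level processing strictly serial.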

The Job Chain

The example job chain only illustrates the steps relevant to the lock, which take place before and after the parallel processing steps. These are represented schematically in the example job chain by two jobs, pre_proc_check and process_data. The pre_proc_check job represents the splitting up of the data record into its constituent parts, and the process_data job represents the parallel processing and the bringing together of the parallel processing threads.

The example job chain is started by any one of three file_order_sources, each filtered by a regular expression. All three file_order_sources lead directly to the start node.

The success and !error nodes are file_order_sink nodes that are configured to remove the file_order_source files.
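The structure just described might look roughly as follows in JobScheduler XML configuration. This is a sketch under assumptions, not the configuration shipped with the example: the directory paths, the regular expression and the intermediate state names are invented; the real configuration is in the downloadable example files.

```xml
<job_chain>
  <!-- three file order sources, each watching its own directory,
       all leading to the start node -->
  <file_order_source directory="C:/data/in_a" regex="^record_.*\.dat$" next_state="start"/>
  <file_order_source directory="C:/data/in_b" regex="^record_.*\.dat$" next_state="start"/>
  <file_order_source directory="C:/data/in_c" regex="^record_.*\.dat$" next_state="start"/>

  <!-- pre_proc_check sets the lock; process_data stands in for the
       parallel processing and releases the lock -->
  <job_chain_node state="start"   job="pre_proc_check" next_state="process" error_state="!error"/>
  <job_chain_node state="process" job="process_data"   next_state="success" error_state="!error"/>

  <!-- sinks remove the file that started the order -->
  <file_order_sink state="success" remove="yes"/>
  <file_order_sink state="!error"  remove="yes"/>
</job_chain>
```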

See also:

  • ToDo: Best practice;

{{SchedulerFaqBack}}
