How to make sure that a file is completely written

Problem

Within a job chain, file_order_source starts an order as soon as a file is created, not when the file has been completely written.

In some file transfer scenarios the receiver of a file does not know when the sender creates it. With a large file, the receiver may try to read the file before the sender has finished writing it. If the receiver uses the file at that moment, it will get a corrupted, incomplete file.

Solutions

Using file_order_source

There are three ways to use file_order_source:

The sender creates a file named abc.txt~. After the transfer is completed, the sender renames the file to abc.txt. You would use a regular expression such as ^.*\.txt$ to check for the presence of files.

The sender creates a file named abc.txt. When it is ready, the sender creates a second, zero-byte file named abc.txt.trigger. Here, you would use a regular expression such as ^.*\.txt\.trigger$. The disadvantage of this approach is that scheduler_file_path contains the name of the trigger file, not the name of the file that should be processed.

Set up a job chain in which the first node checks the file size, and carry out a setback while the file size is still changing. This can be done with the JobSchedulerExistsFile job.
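
The two naming conventions above can be sketched from the sender's and receiver's point of view. This is a minimal illustration in Python, not part of JobScheduler: the function names send_with_rename and data_file_for_trigger are hypothetical, and the trigger pattern used here escapes the dot before "trigger".

```python
import os
import re

# Patterns in the style of the regular expressions named in the text.
DATA_PATTERN = re.compile(r"^.*\.txt$")
TRIGGER_PATTERN = re.compile(r"^.*\.txt\.trigger$")


def send_with_rename(directory, name, payload):
    """Hypothetical sender: write to name + '~', rename when complete."""
    temp_path = os.path.join(directory, name + "~")
    final_path = os.path.join(directory, name)
    with open(temp_path, "wb") as handle:
        handle.write(payload)
    # The rename is atomic on POSIX when both paths are on the same file system,
    # so a watcher matching DATA_PATTERN never sees a partially written file.
    os.replace(temp_path, final_path)
    return final_path


def data_file_for_trigger(trigger_path):
    """Hypothetical receiver helper: map abc.txt.trigger back to abc.txt."""
    suffix = ".trigger"
    if not trigger_path.endswith(suffix):
        raise ValueError("not a trigger file: " + trigger_path)
    return trigger_path[: -len(suffix)]
```

A receiving job that is started by the trigger file could apply a mapping like data_file_for_trigger to the value of scheduler_file_path to find the file it should actually process.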

Using the JobSchedulerExistsFile job

Compared with the file_order_source solutions, this job has the advantage that it accepts parameters, for example for the name of the target directory, and that it allows you to configure the polling rate.
The JobSchedulerExistsFile job also checks whether the file size is constant, i.e. whether the file is still being written, and only proceeds once the file size has stopped changing.
The JobSchedulerExistsFile job has three parameters that control the steady-state check:
- check_steady_state_of_files: If true, the job checks the steady state.
- check_steady_state_interval: Interval in seconds between two checks.
- steady_state_count: If set, this is the maximum number of intervals. If the maximum is reached, the task is terminated with an error.
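
The idea behind a steady-state check can be sketched as a simple polling loop. The following Python function is an illustration only and not the job's implementation: wait_for_steady_state and its arguments are hypothetical names, and the assumption that two consecutive equal sizes count as "steady" is ours.

```python
import os
import time


def wait_for_steady_state(path, interval_seconds=1.0, max_intervals=30,
                          sleep=time.sleep):
    """Return True once two consecutive size checks agree.

    Returns False if the size is still changing after max_intervals checks,
    which loosely corresponds to the error case described for steady_state_count.
    """
    previous_size = os.path.getsize(path)
    for _ in range(max_intervals):
        sleep(interval_seconds)  # plays the role of check_steady_state_interval
        current_size = os.path.getsize(path)
        if current_size == previous_size:
            return True
        previous_size = current_size
    return False
```

A constant size is only a heuristic: a sender that pauses for longer than the interval looks the same as a finished transfer, so the interval should be chosen with the expected transfer behaviour in mind.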

Related Downloads

You can download example files covering both the file_order_source and the JobSchedulerExistsFile job solutions.
