Description of the JobSchedulerExistsFile Job - check whether a file exists

Checks for the existence of a file, a directory or for specific files inside of a directory. 

The polling shown in the graphic above is provided by a file order source configured in the job chain.

Example Job to Create a Result Set.

This job creates a result set. A result set contains the names of all files selected by the filter criteria. The content of the result set is returned as a parameter and can also be written to a file.
The following parameters are useful for creating a result set:

Parameter Name                             Description
raise_error_if_result_set_is               Raise error on expected size of result-set
result_list_file                           Name of the result-list file
expected_size_of_result_set                Number of expected hits in result-list
on_empty_result_set                        Set next node on empty result set
scheduler_sosfileoperations_resultset      The result of the operation as a list of items
scheduler_sosfileoperations_resultsetsize  The amount of hits in the result set of the operation

An example for a job-xml file:

  <job order='no' >
     <params>
       <param name="file" value="." />
       <param name="file_spec" value="" />
       <param name="gracious" value="false" />
       <param name="max_file_age" value="0" />
       <param name="min_file_age" value="0" />
       <param name="max_file_size" value="-1" />
       <param name="min_file_size" value="-1" />
       <param name="skip_first_files" value="0" />
       <param name="skip_last_files" value="0" />
       <param name="count_files" value="false" />
       <param name="create_order" value="false" />
       <param name="create_orders_for_all_files" value="false" />
       <param name="order_jobchain_name" value="" />
       <param name="next_state" value="" />
       <param name="on_empty_result_set" value="empty" />
       <param name="expected_size_of_result_set" value="0" />
       <param name="raise_error_if_result_set_is" value="0" />
       <param name="result_list_file" value="empty" />
     </params>
     <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
  </job>

This job can be used standalone, as a single job, or order driven, as a node in a jobchain. Parameters are accepted as job parameters or as order parameters, respectively.
A job can process multiple parameters that are analyzed when the job starts. Parameters are defined in the configuration of the job or of the order. Parameters can also be submitted by API methods. Parameters are optional or mandatory and may contain default values.

Parameter Definitions

Parameters Used by JobSchedulerExistsFile

Name                          Mandatory  Default  Description
file                          true       .        File or Folder to watch for
file_spec                     false      (blank)  Regular Expression for filename filtering
gracious                      false      false    Specify error message tolerance
max_file_age                  false      0        Maximum age of a file
min_file_age                  false      0        Minimum age of a file
max_file_size                 false      -1       Maximum size of a file
min_file_size                 false      -1       Minimum size of one or multiple files
skip_first_files              false      0        Number of files to remove from the top of the result-set
skip_last_files               false      0        Number of files to remove from the bottom of the result-set
count_files                   false      false    Return the size of resultset
create_order                  false      false    Activate file-order creation
create_orders_for_all_files   false      false    Create a file-order for every file in the result-list
create_orders_for_new_files   false      false    Create a file-order for every new file in the result-list
param_name_file_path          false      ---      The name of the parameter that contains the name of the file to be transferred
order_jobchain_name           false      (blank)  The name of the jobchain which belongs to the order
next_state                    false      (blank)  The first node to execute in a jobchain
merge_order_parameter         false      false    Merge actual order parameter into new created order
on_empty_result_set           false      empty    Set next node on empty result set
expected_size_of_result_set   false      0        Number of expected hits in result-list
raise_error_if_result_set_is  false      0        Raise error on expected size of result-set
result_list_file              false      empty    Name of the result-list file
check_steady_state_of_files   false      false    Check the completeness of a file (steady state)
steady_state_count            false      30       Maximum Number of Checkpoints
check_steady_state_interval   false      1        Temporal distance between checkpoints

Parameter file: File or Folder to watch for


File or Folder to watch for
The file or directory to be checked.
Supports masks for substitution in the file name and directory name with format strings that are enclosed by [ and ]. The following format strings are supported:

 [date: date format]

 The date format must be a valid Java date format string, e.g. yyyyMMddHHmmss, yyyy-MM-dd.HHmmss etc.

An example:

 <param name="file" value="sample/hello[date:yyyyMMdd].txt" />  

On 2050-12-31 the parameter file contains the value "sample/hello20501231.txt" .
This parameter supports substitution of job parameter names with their values if the job parameter name is enclosed by % and %.
An example: <param name="file" value="%scheduler_file_path%" />
At job runtime the parameter file contains the value of the job parameter scheduler_file_path. When Directory Monitoring with File Orders is used, the job parameter scheduler_file_path automatically contains the path of the file that triggered the order.
Data-Type : SOSOptionString
The default value for this parameter is ".".
This parameter is mandatory.

Parameter file_spec: Regular Expression for filename filtering


Regular Expression for filename filtering
Regular Expression for file filtering. The behaviour is CASE_INSENSITIVE.
Only effective if the parameter file is a directory.
Some remarks on regular expressions as used in JobScheduler:

  • A regular expression is not a wildcard. To get an impression of the difference, consider the wildcard *.txt, which selects all filenames with the extension ".txt". A regular expression that works the same way must look like "^.*\.txt$". That looks a little strange, but for filtering more complex names or patterns it is much more flexible and powerful than wildcards.
  • The general syntax of a regular expression, also referred to as regex or regexp, differs from other RegExp definitions, e.g. that of Perl.

Data-Type : SOSOptionRegExp
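
As a minimal sketch of the remark on wildcards above, the following job configuration would select all files ending in ".txt" from a directory. The directory /tmp/inbox is a hypothetical value chosen for illustration; the parameter names and the Java class are those used elsewhere on this page.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- regular expression counterpart of the wildcard *.txt -->
    <param name="file_spec" value="^.*\.txt$" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```

Because the matching is described as case insensitive, a file named REPORT.TXT would be expected to match as well.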

Parameter gracious: Specify error message tolerance


Specify error message tolerance
Enables or disables error messages that are caused by an empty result-set, which is the result of an operation, executed by the job. Therefore this parameter can control the sequence of nodes or states in a job-chain.
Valid values:

 false, 0, off, no, n, nein, none; true, 1, on, yes, y, ja, j; and all.

The following rules apply when the result set is empty:

GRACIOUS                          Standalone Job              Order Job
false, 0, off, no, n, nein, none  error log, Task error       error log, set_state error
true, 1, on, yes, y, ja, j        no error log, Task success  no error log, set_state error
all                               no error log, Task success  no error log, set_state success

For example, the setting "gracious=all" suppresses all errors regarding an empty result set and lets the job (standalone or inside a jobchain) terminate as if no error had occurred.
Data-Type : SOSOptionGracious
The default value for this parameter is false.
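
The following sketch shows how the rules above might be applied in a job configuration. With gracious set to all, an empty result set is expected to produce neither an error log entry nor an error state. The directory and the file_spec value are hypothetical and serve only as an illustration.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory and filter, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <param name="file_spec" value="^.*\.csv$" />
    <!-- tolerate an empty result set in standalone and order jobs -->
    <param name="gracious" value="all" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```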

Parameter max_file_age: Maximum age of a file


Maximum age of a file
Specifies the maximum age of a file. An older file is deemed not to exist and is not included in the result list.
Data-Type : SOSOptionTime
The default value for this parameter is 0.

Parameter min_file_age: Minimum age of a file


Minimum age of a file

Specifies the minimum age of a file. A newer file is classified as non-existing and is not included in the result list.
Data-Type : SOSOptionTime
The default value for this parameter is 0.

Parameter max_file_size: Maximum size of a file


Maximum size of a file

Specifies the maximum size of a file in bytes. If the size of a file exceeds this value, the file is classified as non-existing.
Valid values for the file size are:

Value     Description
-1        The value of the parameter has no effect and the parameter is not part of the filter.
number    A number stands for the size in bytes, e.g. 40 means 40 bytes.
numberKB  A number followed by the characters "KB" stands for the size in kilobytes.
numberMB  A number followed by the characters "MB" stands for the size in megabytes.
numberGB  A number followed by the characters "GB" stands for the size in gigabytes.

Data-Type : SOSOptionFileSize
The default value for this parameter is -1.

Parameter min_file_size: Minimum size of one or multiple files


Minimum size of one or multiple files
Specifies the minimum size of one or multiple files in bytes. If the size of a file falls below this value, the file is not included in the result list of the operation.
Valid values for the file size are:

Value     Description
-1        The value of the parameter has no effect and the parameter is not part of the filter.
number    A number stands for the size in bytes, e.g. 40 means 40 bytes.
numberKB  A number followed by the characters "KB" stands for the size in kilobytes.
numberMB  A number followed by the characters "MB" stands for the size in megabytes.
numberGB  A number followed by the characters "GB" stands for the size in gigabytes.

Data-Type : SOSOptionFileSize
The default value for this parameter is -1.
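
A minimal sketch of a size filter, assuming the value formats listed above: the configuration below would be expected to keep only files of at least 10 kilobytes and at most 5 megabytes. The directory and both size limits are hypothetical values chosen for illustration.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- lower bound: smaller files are left out of the result list -->
    <param name="min_file_size" value="10KB" />
    <!-- upper bound: larger files are classified as non-existing -->
    <param name="max_file_size" value="5MB" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```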

Parameter skip_first_files: Number of files to remove from the top of the result-set


Number of files to remove from the top of the result-set
This number of files is removed from the beginning of the set that results from min_file_size, min_file_age etc. These files are excluded from further operations.
The result set is sorted according to the filter parameters used:

  • min_file_age, max_file_age: in ascending order by date of last modification, the newest file first.
  • min_file_size, max_file_size: in ascending order by file size, the smallest file first.
  • If parameters for file age as well as file size are given, the result set is sorted by file age.

Only one of skip_first_files and skip_last_files may be set at a time.
Data-Type : SOSOptionInteger
The default value for this parameter is 0.
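
As a sketch under the sorting rules described above: when only a size filter is given, the result set is sorted by file size, so skip_first_files would be expected to drop the smallest files. The directory, the size limit and the count of 2 are hypothetical values.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- a size filter, so the result set is sorted by file size -->
    <param name="min_file_size" value="1KB" />
    <!-- remove the first two entries, i.e. the two smallest files -->
    <param name="skip_first_files" value="2" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```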

Parameter skip_last_files: Number of files to remove from the bottom of the result-set


Number of files to remove from the bottom of the result-set

This number of files is removed from the end of the set that results from min_file_size, min_file_age etc. These files are excluded from further operations.

The result set is sorted according to the filter parameters used:

  • min_file_age, max_file_age: in ascending order by date of last modification, the newest file first.
  • min_file_size, max_file_size: in ascending order by file size, the smallest file first.

If parameters for file age as well as file size are given, the set is sorted by file age.

Only one of skip_first_files and skip_last_files may be set at a time.
Data-Type : SOSOptionInteger
The default value for this parameter is 0.

Parameter count_files: Return the size of resultset


Return the size of resultset
If this parameter is set to "true", the number of matches is returned in the order parameter "scheduler_SOSFileOperations_file_count".
Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein
This parameter is valid and available for order driven jobs only, for example jobs that run as nodes in a jobchain. In standalone jobs this parameter is ignored without further notice.
Data-Type : SOSOptionBoolean
The default value for this parameter is false.
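
A sketch of an order driven job that requests the file count. Since the parameter is described as effective for order driven jobs only, the job is declared with order="yes". The directory is a hypothetical value.

```xml
<job order="yes">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- ask the job to return the number of matches as an order parameter -->
    <param name="count_files" value="true" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```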

Parameter create_order: Activate file-order creation


Activate file-order creation
This parameter specifies that a file-order is created and launched for every filename in the result list, or for the first file only (see create_orders_for_all_files).
Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein
Data-Type : SOSOptionBoolean
The default value for this parameter is false.
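Use together with parameters:

  • create_orders_for_all_files - Create a file-order for every file in the result-list
  • order_jobchain_name - The name of the jobchain which belongs to the order
  • next_state - The first node to execute in a jobchain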

Parameter create_orders_for_all_files: Create a file-order for every file in the result-list


Create a file-order for every file in the result-list
Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein
Data-Type : SOSOptionBoolean
The default value for this parameter is false.
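Use together with parameters:

  • create_order - Activate file-order creation
  • order_jobchain_name - The name of the jobchain which belongs to the order
  • next_state - The first node to execute in a jobchain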
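
The parameters for order creation are meant to be combined. The following sketch assumes the jobchain "Test" in "live/sample/FileOperations/" that is used as an example on this page; the directory and the node name "100" are hypothetical values.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- create file-orders, one for every file in the result list -->
    <param name="create_order" value="true" />
    <param name="create_orders_for_all_files" value="true" />
    <!-- jobchain path relative to the folder "live" -->
    <param name="order_jobchain_name" value="/sample/FileOperations/Test" />
    <!-- hypothetical name of the node at which the orders start -->
    <param name="next_state" value="100" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```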

Parameter create_orders_for_new_files: Create a file-order for every new file in the result-list


Create a file-order for every new file in the result-list

If this parameter is set to "true", for each new file which is in the result set, a file-order is created and started.

This parameter takes effect only if the create_order parameter is not set or has the value "true".

example 1: create a file-order

    create_orders_for_new_files=true

Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein.

DataType: SOSOptionBoolean

Default: false

Parameter param_name_file_path: The name of the parameter containing the name of the file to be transferred


The name of the parameter containing the name of the file to be transferred

This parameter sets the name of the parameter that contains the name of the transferred file. The default value is scheduler_file_path. Change the name from the default if you do not want to create file orders that have to handle a file sink.

DataType: SOSOptionString

Default: ---

Parameter order_jobchain_name: The name of the jobchain which belongs to the order


The name of the jobchain which belongs to the order
The value of this parameter is the name of the jobchain that is to be launched by the order.
Note that the name of the jobchain must include the subfolder structure if the jobchain is not located directly in the folder "live". An example: the jobchain "Test" is located in "live/sample/FileOperations/". The value to be specified is then "/sample/FileOperations/Test".
Data-Type : SOSOptionString

Parameter next_state: The first node to execute in a jobchain


The first node to execute in a jobchain
The value of this parameter is the name of the jobchain node with which the execution of the chain is to start.
Data-Type : SOSOptionJobChainNode

Parameter merge_order_parameter: Merge actual order parameter into new created order  


merge actual order parameter into new created order  
 

This parameter specifies that the order to be created is extended by the parameters of the current order.

DataType: SOSOptionBoolean
Default: false 

Parameter on_empty_result_set: Set next node on empty result set


Set next node on empty result set
This parameter sets the next node (step, job) to execute in a jobchain. The value of the parameter is a valid node name of the current jobchain. In case of an empty result set, e.g. due to non-existent files, the current job ends without an error and the jobchain continues with the node whose name is given as the value of this parameter.
Data-Type : SOSOptionJobChainNode
The default value for this parameter is empty.
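
A sketch of how the parameter might be set for a job that runs as a node in a jobchain. The node name "no_files_found" is hypothetical and would have to exist in the current jobchain; the directory is hypothetical as well.

```xml
<job order="yes">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- hypothetical node name: continue here if no file was found -->
    <param name="on_empty_result_set" value="no_files_found" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```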

Parameter expected_size_of_result_set: Number of expected hits in result list


Number of expected hits in result-list

Data-Type : SOSOptionInteger
The default value for this parameter is 0.

Parameter raise_error_if_result_set_is: Raise error on expected size of result set


Raise error on expected size of result-set
With this parameter it is possible to raise an error if the number of hits in the result list matches the condition given by the value of this parameter.
An example:
Assume that the parameter "raise_error_if_result_set_is=ne" is defined and the parameter "expected_size_of_result_set=1" is specified as well. If the number of hits is not equal to "1", an error is raised.

Data-Type : SOSOptionRelOp
The default value for this parameter is 0.
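
The example above can be written as a job configuration roughly as follows. The directory and the file_spec value are hypothetical; the two result set parameters carry the values from the example.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory and filter, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <param name="file_spec" value="^.*\.txt$" />
    <!-- raise an error if the number of hits is not equal to 1 -->
    <param name="raise_error_if_result_set_is" value="ne" />
    <param name="expected_size_of_result_set" value="1" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```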

Parameter result_list_file: Name of the result list file


Name of the result-list file
If the value of this parameter specifies a valid filename the result-list will be written to this file.
Data-Type : SOSOptionFileName
The default value for this parameter is empty.
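
A minimal sketch of writing the result list to a file. Both the directory and the file name /tmp/result_list.txt are hypothetical values chosen for illustration.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- hypothetical file that receives the result list -->
    <param name="result_list_file" value="/tmp/result_list.txt" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```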

Parameter check_steady_state_of_files: Check the completeness of a file (steady state)


Check the completeness of a file (steady state) 

In some file transfer scenarios the receiver of a file does not know when the sender creates the file. With a (very) large file, the receiver may try to read the file before the sender has finished writing it. If the receiver fetches the file while the sender is still writing, it gets a corrupted, incomplete file.

If this parameter is set to "true", the receiver checks the file for completeness before it starts the transfer.

This is not a very safe approach, because the receiver only checks the date of last modification and the size of the file. If neither changes within a time interval, which is defined by the parameters ..., the file is assumed to be complete. If the sender is terminated without writing the complete file, or the network is down, or the file is being processed slowly, the receiver will get a corrupted file.

A better approach for avoiding corrupt files is the atomic method: write the file and rename it once writing is complete. For more details about this method see parameter atomic_suffix or atomic_prefix.

If more than one file is to be transferred, the transactional approach is the first choice. See parameter transactional.

DataType: SOSOptionBoolean

Default: false

Parameter steady_state_count: Maximum Number of Checkpoints


Maximum Number of Checkpoints 

The value of this option specifies the number of retries for checking the steady state of a file.

DataType: SOSOptionInteger

Default: 30

Parameter check_steady_state_interval: Temporal distance between checkpoints


Temporal distance between checkpoints 

The value of this option defines the temporal distance in seconds between two checkpoints.

DataType: SOSOptionTime

Alias: Steady_State_Interval

Default: 1
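
The three steady state parameters might be combined as in the following sketch. It spells out the documented defaults for the count and the interval, so the job would be expected to use up to 30 checkpoints that are 1 second apart. The directory is a hypothetical value.

```xml
<job order="no">
  <params>
    <!-- hypothetical directory, used for illustration only -->
    <param name="file" value="/tmp/inbox" />
    <!-- enable the completeness check -->
    <param name="check_steady_state_of_files" value="true" />
    <!-- maximum number of checkpoints (documented default) -->
    <param name="steady_state_count" value="30" />
    <!-- seconds between two checkpoints (documented default) -->
    <param name="check_steady_state_interval" value="1" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
```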

Return Parameters from JobSchedulerExistsFile

The order parameters described below are returned by the job to the JobScheduler.

Name                                       Mandatory  Default  Title
scheduler_file_path                        false      empty    File to process for a file-order
scheduler_file_parent                      false      empty    Pathname of the file to process for a file-order
scheduler_file_name                        false      empty    Name of the file to process for a file-order
scheduler_sosfileoperations_resultset      false      empty    The result of the operation as a list of items
scheduler_sosfileoperations_resultsetsize  false      empty    The amount of hits in the result set of the operation
scheduler_sosfileoperations_file_count     false      0        Return the size of the result set after a file operation


Parameter scheduler_file_path: File to process for a file-order


File to process for a file-order
When Directory Monitoring with File Orders is used, the job parameter scheduler_file_path automatically contains the path of the file that triggered the order.
Data-Type : SOSOptionFileName
The default value for this parameter is empty.

Parameter scheduler_file_parent: Pathname of the file to process for a file-order


Pathname of the file to process for a file-order

Data-Type : SOSOptionFileName
The default value for this parameter is empty.

Parameter scheduler_file_name: Name of the file to process for a file-order


Name of the file to process for a file-order

Data-Type : SOSOptionFileName
The default value for this parameter is empty.

Parameter scheduler_sosfileoperations_resultset: The result of the operation as a list of items


The result of the operation as a list of items

Data-Type : SOSOptionString
The default value for this parameter is empty.

Parameter scheduler_sosfileoperations_resultsetsize: The amount of hits in the result set of the operation


The amount of hits in the result set of the operation

Data-Type : SOSOptionInteger
The default value for this parameter is empty.

Parameter scheduler_sosfileoperations_file_count: Return the size of the result set after a file operation


Return the size of the result set after a file operation

Data-Type : SOSOptionInteger
The default value for this parameter is 0.