Date: Fri, 29 Mar 2024 12:34:07 +0000 (UTC) Message-ID: <1780867964.12891.1711715647558@change.sos-berlin.com> Subject: Exported From Confluence MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_Part_12890_889320741.1711715647558" ------=_Part_12890_889320741.1711715647558 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Location: file:///C:/exported.html
This example uses a simple job chain which starts shell jobs to demonstrate the different behaviors that can be configured for JobScheduler if an error occurs in one of the jobs.
In particular, the effect of the stop_on_error and on_error parameters is demonstrated along with the use of suspended orders and setbacks to retry running a job.
stop_on_error="no"
Place the example files in the ./config/live folder of your JobScheduler installation.
Open the job chain samples/shell_error/simple_error_chain.
For the order simple_error_order, open the order menu and choose Start order now.
The order will now move through both nodes of the job chain. On the second node, an error will occur due to exit 5 being included in the job's shell script. If the email settings of your JobScheduler are configured correctly, you will now receive an error mail.
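For orientation, the configuration of the second job might look roughly like the sketch below. The stop_on_error attribute and the exit 5 line come from this example; the order attribute, the echo line and the empty run_time element are assumptions added for illustration, and the sample file that ships with the example may differ.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of simple_chained_job2.job.xml; the real sample file may differ -->
<job order="yes" stop_on_error="no">
    <script language="shell"><![CDATA[
echo "second step of the chain"
exit 5
    ]]></script>
    <run_time/>
</job>
```

Any exit code other than 0 is what makes the step fail here, so changing the last script line to exit 0 is enough to let the order pass.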
Click on the second node job (samples/shell_error/simple_chained_job2) to open the Job pane. You will see that the second job has the pending state. This means that the job can process further orders (although in this example, they will all fail as long as exit 5 is specified). The error has been blamed on the order and the order has been moved to the state which was configured as error_state for the step in which the error happened. In the example, this is suspended. The error_state can also be used to configure error handling jobs; it need not point to a final state of the job chain.
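The relationship between nodes, next_state and error_state can be sketched as a job chain file. Only the state names "next" and "suspended" and the job name simple_chained_job2 are taken from this example; the first job's name, the other state names and the overall layout are hypothetical.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of simple_error_chain.job_chain.xml; only the state names
     "next" and "suspended" come from the text, the rest is assumed -->
<job_chain>
    <!-- first node: job name and state name are hypothetical -->
    <job_chain_node state="first" job="simple_chained_job1"
                    next_state="next" error_state="suspended"/>
    <!-- second node: runs the job whose script ends with exit 5 -->
    <job_chain_node state="next" job="simple_chained_job2"
                    next_state="success" error_state="suspended"/>
    <!-- end nodes carry no job attribute -->
    <job_chain_node state="success"/>
    <job_chain_node state="suspended"/>
</job_chain>
```

If error_state pointed to a node that has a job attribute instead of an end node, that job would act as an error handling step.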
If you change the exit code from exit 5 to exit 0 and click on the order menu, you will see that you can either resume the order or reset it.
stop_on_error="no" is the default setting for jobs created with JOE and has the advantage that a job is not blocked for all orders if one order should fail due, for example, to a configuration error.
The error can also be blamed on the job, which will be described in the next section.
stop_on_error="yes"
Open the simple_chained_job2.job.xml job configuration file.
If you changed the script to exit 0, change it back to exit 5 to simulate an error again.
Change stop_on_error="no" to stop_on_error="yes".
Note that the job state of the second job is now stopped. This means that the job will no longer process any orders. The order simple_error_order is now enqueued before the job. Other orders running into this job will also be enqueued.
In simple_chained_job2.job.xml, change exit 5 (which caused the error) to exit 0 and save the change.
Now click on
You will see in the order history that processing of the order has ended.
This example has used the stop_on_error="yes" setting to blame the error on the job.
Another option in the event of an error is to suspend the order:
Make sure that stop_on_error is set to "no" for both jobs.
At the job_chain_node add a new on_error="suspend" attribute and save.
Change exit 5 to exit 0.
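As a rough illustration of the step above, the node for the second job might look like this once the attribute has been added. The on_error attribute, the state name "next" and the error_state "suspended" come from this example; the next_state value is an assumption.

```xml
<!-- Node of the second job with the new attribute; next_state is assumed -->
<job_chain_node state="next" job="simple_chained_job2"
                next_state="success" error_state="suspended"
                on_error="suspend"/>
```

With this setting a failing order is suspended at the node rather than being moved to the error_state, so it can be resumed from the order menu once the script has been corrected.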
Note that we also have a dedicated example showing the use of setbacks: How to use setbacks to make a job retry in the event an error
Another option is to configure automatic retries using "setback":
Make sure that stop_on_error is set to "no" for both jobs.
Open the simple_chained_job2.job.xml job configuration file.
Put exit 5 into the job again.
Add the following lines after the script element:
<delay_order_after_setback setback_count="1" delay="20"/>
<delay_order_after_setback setback_count="3" delay="60"/>
<delay_order_after_setback setback_count="6" is_maximum="yes"/>
Save simple_chained_job2.job.xml.
Open simple_error_chain.job_chain.xml.
For the job_chain_node "next" (the node for the simple_chained_job2 job) set the on_error attribute to "setback" and save.
This time the order will run until the error occurs and will then be set back. The order is then enqueued at the second job with a new start time (just 20 seconds after the first error). Press update repeatedly to see the order count down the time for the next start.
After the 6th time the order has encountered an error, it will be set to the error_state.
If the job is fixed during the retries, the order will go to the next_state.
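Put together, the job file for the setback variant might look roughly as follows. The three delay_order_after_setback lines are the ones given above; the order attribute, the run_time element and the surrounding layout are assumptions, and the comments reflect how the retry behavior is described in this example.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of simple_chained_job2.job.xml with setbacks; layout is assumed -->
<job order="yes" stop_on_error="no">
    <script language="shell"><![CDATA[
exit 5
    ]]></script>
    <!-- from the 1st setback on: wait 20 seconds before the next attempt -->
    <delay_order_after_setback setback_count="1" delay="20"/>
    <!-- from the 3rd setback on: wait 60 seconds before the next attempt -->
    <delay_order_after_setback setback_count="3" delay="60"/>
    <!-- the 6th setback is the last one: the order moves to the error_state -->
    <delay_order_after_setback setback_count="6" is_maximum="yes"/>
    <run_time/>
</job>
```

These delays only take effect when the job chain node of this job carries on_error="setback".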
The main "switch" for controlling error handling of shell jobs is the stop_on_error attribute. If stop_on_error is set to yes, the job is blamed for the error and is stopped. If stop_on_error is set to no, the order is blamed for the error. For more information on stop_on_error see http://www.sos-berlin.com/doc/en/scheduler.doc/xml/job.xml#attribute_stop_on_error
By default, if an order is blamed for an error, i.e. if stop_on_error is set to no, the order is moved to the error_state. This behavior can be changed at the job chain node with the on_error attribute. This can be set to "suspend" or "setback" and will cause the order to be either suspended or set back in the event of an error.
The on_error attribute takes effect only if stop_on_error="no" is set for the job.
Jobs which use the JobScheduler API Interface may implement more sophisticated methods to choose whether an error is blamed on the job or on the order and how to handle errors that occur in orders.
FEATURE AVAILABILITY ENDING WITH RELEASE 1.9
When a shell script is executed within a job and this script writes messages to the standard output and error channels (stdout and stderr), the JobScheduler treats these as info messages.
2015-03-18 07:57:38.991+0100 [info] This message goes to stdout
2015-03-18 07:57:38.993+0100 [info] This message goes to stderr
FEATURE AVAILABILITY STARTING FROM RELEASE 1.10
Job error handling can optionally be extended to detect errors from output that is created by shell scripts.
2015-03-18 07:57:38.993+0100 [error] This message goes to stderr.
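A job that would produce a log line of this kind might be sketched as follows. The stderr_log_level attribute is the one this section describes; the two echo lines and the empty run_time element are assumptions chosen to mirror the log messages shown here.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of a shell job whose stderr output is treated as an error -->
<job stderr_log_level="error">
    <script language="shell"><![CDATA[
echo "This message goes to stdout"
echo "This message goes to stderr" 1>&2
    ]]></script>
    <run_time/>
</job>
```

The script is wrapped in a CDATA section so that the redirection characters do not have to be escaped as XML entities.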
Note that:
This behavior is controlled by the <job stderr_log_level="error|info"> job attribute.
The value error causes shell job output to stderr to be considered by JobScheduler as errors.
The value info is the default and causes the JobScheduler not to raise an error.

Key | Linked Issues | Fix Version/s | Status | Summary | Updated |
---|---|---|---|---|---|
JS-1393 | JS-1329 | 1.10 | Released | Identify output channel in JobScheduler logs | Jul 07, 2016 |
JS-1329 | JS-1393, JOE-166, JS-1615 | 1.10 | Released | Check stderr for errors in shell script execution | Jul 07, 2016 |
------=_Part_12890_889320741.1711715647558--