
...

  • The functions for terminating task processes by the JobScheduler Master and Universal Agent have been extended to allow the use of both SIGTERM and SIGKILL signals on Unix servers:
    • SIGTERM is sent first and allows an orderly termination of task processes to take place within a limited period of time.
    • If the time allowed has been exceeded and the processes are still running then SIGKILL will be sent.
  • The information contained in this article draws together detailed information contained in a range of issues and should primarily be of interest to persons in engineering and to a lesser extent persons in operating functions.

...

This feature has been implemented stepwise between releases 1.9.0 and 1.10.0 (see the table of issues below for more detailed information).

...

  • The use of both SIGTERM and SIGKILL signals on Unix servers has the following advantages:
    • The use of SIGTERM before SIGKILL means that there is a greater chance of data being saved after the signal has been issued.

    • The SIGTERM signal can, in contrast with SIGKILL, be monitored, i.e. a pre-/post-processing script can be carried out. This means that the ending of a task by the JobScheduler can be reacted to and the user process itself can be ended.

    • The implementation of SIGTERM allows post-processing methods such as spooler_process_after() to complete within the timeout period.
  • The time allowed between the SIGTERM and SIGKILL signals can be specified in the command using the timeout attribute (the default is 15 sec): <kill_task … timeout=".."/>

  • This feature can also be applied to:
    • remote processes, i.e. processes started by SSH and those started by an Agent,
    • child processes started by a process running on an agent (JS-1468).
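
The advantage of a signal that can be handled is easiest to see in a job script. The following is a minimal sketch, not taken from the JobScheduler distribution: the names run_job, on_term, state_file and work_pid are hypothetical, and the "work" is simulated with sleep. It shows a script that saves its state when SIGTERM arrives, which is not possible with SIGKILL.

```shell
#!/bin/sh
# Sketch only: a job script that saves its state when it receives SIGTERM.
# All names used here are hypothetical.

on_term() {
    # Save whatever the job needs to keep before it ends.
    echo "terminated" > "$state_file"
    # Stop the background work as well so that nothing is left running.
    if [ -n "$work_pid" ]; then
        kill "$work_pid" 2>/dev/null || true
    fi
    exit 143    # 128 + 15, the conventional exit code after SIGTERM
}

run_job() {
    state_file=$1
    work_pid=
    trap on_term TERM

    # A shell runs a trap only after the current foreground command has
    # finished. The long-running work is therefore started in the background
    # and waited for: "wait" is interrupted as soon as the signal arrives.
    sleep 30 &
    work_pid=$!
    wait "$work_pid" || true

    # Reached only if no SIGTERM arrived while waiting.
    echo "completed" > "$state_file"
}
```

If such a script is killed with SIGKILL instead, on_term never runs and the state file is not written.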

...

The following operations can be carried out from the JobScheduler Operating Center interface (JOC) or by use of the command line:

  1. Operation: kill immediately
    • JOC sends <kill_task immediately="yes"/>
    • The process is killed immediately using the SIGKILL signal.
  2. Operation: terminate with timeout
    • JOC sends <kill_task immediately="yes" timeout="15"/>
    • The process receives a SIGTERM signal. Should that process not terminate within the specified timeout period then it will be killed with a SIGKILL signal.
  3. Operation: terminate
    • JOC sends <kill_task immediately="yes" timeout="never"/>
    • The process receives a SIGTERM signal. Monitoring of the process termination as described in Operation 2 above is not carried out.
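
At process level, the three operations differ only in which signals are sent and whether the timeout is monitored. The shell function below is a minimal sketch of that logic under stated assumptions, not the JobScheduler implementation: the name stop_process and its arguments are hypothetical, and the process is polled once per second.

```shell
#!/bin/sh
# Sketch only: mimics the three operations for a single process.
# stop_process PID [TIMEOUT]
#   TIMEOUT omitted   -> SIGKILL immediately                 (operation 1)
#   TIMEOUT=<seconds> -> SIGTERM, then SIGKILL after timeout (operation 2)
#   TIMEOUT=never     -> SIGTERM only, no monitoring         (operation 3)
stop_process() {
    pid=$1
    timeout=${2:-}

    if [ -z "$timeout" ]; then
        kill -KILL "$pid" 2>/dev/null || true
        return 0
    fi

    kill -TERM "$pid" 2>/dev/null || true
    if [ "$timeout" = "never" ]; then
        return 0
    fi

    # Poll once per second until the process is gone or the timeout expires.
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # Still present after the timeout: escalate to SIGKILL.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null || true
    fi
    return 0
}
```

For example, stop_process "$pid" 15 corresponds to the default timeout of 15 seconds mentioned above.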

...

They are to be found in the ./bin folder of the JUA installation.

The appropriate script for the operating system being used is called with the parameter -kill-agent-task-id=... and

  • finds the process containing the ID specified and
  • kills the process including all child processes.

The JUA start script starts the Agent with the new -kill-script parameter as follows:

  • by default the -kill-script parameter is set to the path of the respective kill script for Windows or Unix as appropriate,
  • the SCHEDULER_KILL_SCRIPT environment variable can be used to set a different kill script.
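
The default-with-override behaviour described above could be sketched in a Unix start script as follows. This is an illustration only and not the actual JUA start script: the function name resolve_kill_script and the default file name bin/kill_task.sh are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: choose the kill script that is handed to the Agent.
# "bin/kill_task.sh" is a hypothetical placeholder, not the real file name.
resolve_kill_script() {
    agent_home=$1
    default_script="$agent_home/bin/kill_task.sh"
    # Use SCHEDULER_KILL_SCRIPT if it is set and not empty,
    # otherwise fall back to the default script.
    printf '%s\n' "${SCHEDULER_KILL_SCRIPT:-$default_script}"
}
```

A start script would then pass the resolved path to the Agent with the -kill-script parameter; the exact form of that parameter is not shown here.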

See JS-1468 and JS-1495 for more detailed information.

Implementation Summary

The implementation of the different termination operations available for the JobScheduler Master and Universal Agent is summarized in the table below.

The termination operations available are:

  • Terminate: <kill_task immediately="true" timeout="...">
  • Kill: <kill_task immediately="true">
  • Timeout: a job with the timeout attribute. (Status: Verify)

|                        | Windows Standalone | Linux Standalone | Windows Classic Agent | Linux Classic Agent | Windows Universal Agent | Linux Universal Agent |
|------------------------|--------------------|------------------|-----------------------|---------------------|-------------------------|-----------------------|
| Shell job              |                    |                  |                       |                     |                         |                       |
| Terminate              | not supported      | (tick)           | not supported         | (error) 2) JS-1420  | not supported           | (tick)                |
| Kill                   | (tick)             | (tick)           | (tick)                | (tick) JS-1421      | (tick) JS-1468          | (tick) JS-1468        |
| Timeout                | (tick) JS-1463     | (tick) JS-1463   | (tick)                | (tick)              |                         | (tick) JS-1468        |
| Shell job with monitor |                    |                  |                       |                     |                         |                       |
| Terminate              | not supported      | (tick)           | not supported         | (tick)              | not supported           | (error) 3) JS-1468    |
| Kill                   | (tick)             | (tick)           | (tick)                | (tick)              | (tick) JS-1382          | (error) 1) JS-1468    |
| Timeout                | (tick)             | (tick)           |                       |                     |                         | (error) 1) JS-1382    |
| API job                |                    |                  |                       |                     |                         |                       |
| Terminate              | not supported      | (tick)           | not supported         | (tick)              | not supported           | (tick)                |
| Kill                   | (tick)             | (tick)           | (tick)                | (tick)              | (tick)                  | (tick)                |
| Timeout                | (tick)             | (tick)           |                       |                     |                         |                       |

  1. The child process continues: the shell job (and the Java process, if applicable) is killed, but the child process (sleep or ping - see attached jobs) is detached from the process tree and continues to run.
  2. No effect: neither the shell job nor its children receive a signal.
  3. The Java process is terminated, but its child process (the shell job script) and the child processes thereof (sleep command) do not receive a SIGTERM signal.
  4. In releases 1.9.2 to 1.9.4 the kill comes too late, after the task has ended normally.

Delimitation

  • This feature is intended for Unix platforms that implement the SIGTERM and SIGKILL signals. It is not intended for Windows platforms, to which only the Kill Immediately command applies.
  • When using traps, please note that it is the process created by the <shell> element that receives the signal. Subsequent scripts that are called within the <shell> element will not receive the signal. You could therefore:
    • configure traps directly within the <shell> element. The shell process will then receive and handle the signal.
    • configure traps in a shell script that is added by an <include> element instead of being stated within the <shell> element. The included shell script will receive and handle the signal.
    • forward signals to subsequent shell scripts that are called within a <shell> element.
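
The third option, forwarding the signal, can be sketched as a small wrapper. This is a minimal illustration under stated assumptions and not part of JobScheduler: the function name run_with_forwarding and the variables child, forwarded and status are hypothetical. The wrapper starts the subsequent script in the background, traps SIGTERM and passes it on to that script.

```shell
#!/bin/sh
# Sketch only: forward SIGTERM from the process that receives it to a child.
# run_with_forwarding COMMAND [ARGS...]
run_with_forwarding() {
    # Start the child in the background so that this shell can react to
    # signals while the child is running.
    "$@" &
    child=$!
    forwarded=0

    # On SIGTERM, remember that it happened and pass the signal on.
    trap 'forwarded=1; kill -TERM "$child" 2>/dev/null' TERM

    status=0
    wait "$child" || status=$?

    if [ "$forwarded" -eq 1 ]; then
        # The first wait was interrupted by the trap. Wait again to collect
        # the exit status of the child after it has handled the signal.
        status=0
        wait "$child" || status=$?
    fi

    trap - TERM
    return "$status"
}
```

Within a shell job, a subsequent script could then be called as run_with_forwarding ./my_script.sh, where my_script.sh stands for any script that sets its own SIGTERM trap.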

...

This example contains a job that uses a SIGTERM trap to show the difference between the <kill_task> and <terminate_task> commands provided by JOC.

...