The SSH Session Management makes it possible to end orphaned remote processes started by SSH Jobs, as well as orphaned JobScheduler tasks of SSH Jobs.

Use Case

What happens if the connection to your remote host breaks while a script is still running? How can the JobScheduler job that started the remote script find out about this?

What happens if a remote script has finished, but the JobScheduler task that started it cannot know about this (e.g. because of a temporarily broken network connection)?

The SSH Session Management provides a solution for this type of issue: it lets you configure an additional job chain that checks for orphaned processes on the remote host as well as for orphaned JobScheduler tasks.

You can configure your existing SSH job chain to start this monitoring job chain, which then monitors both the task of your original job chain and the processes it started on the remote host via SSH.

Mode Of Operation

To use the SSH Session Management, you have to add a number of parameters to your SSH Job and define a cleanup job chain that does the cleanup work.

The feature requires the JSch implementation by JCraft. See How To - Usage of the SSH Job (JobSchedulerSSHJob) with JCraft's JSch for more information about configuring your SSH Job to use the JSch implementation.

The SSH Session Management will carry out one of the following actions after checking the remote processes and JobScheduler tasks:

Configuration of the SSH Job to be monitored

A number of additional parameters have to be added to the job configuration before the SSH Job can be monitored:

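runWithWatchdog: set to true to have the SSH Job monitored.

cleanupJobchain: the path of the cleanup job chain, e.g. kill_jobs/remote_cleanup_test.

ssh_job_kill_pid_command: the command used to kill a remote process, e.g. kill -9 \${pid}.

ssh_job_terminate_pid_command: the command used to terminate a remote process, e.g. kill -15 \${pid}.

ssh_job_get_pid_command: the command that returns the PID of the remote shell, e.g. echo $$.

ssh_job_get_child_processes_command: the command used to determine the child processes of a remote process.

ssh_job_get_active_processes_command: the command used to check whether a remote process is still active, e.g. /bin/ps -ef | grep \${pid} | grep \${user} | grep -v grep.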
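The fragment below is a sketch of the `<params>` element reduced to the monitoring-related parameters. Unlike the full example, it also sets ssh_job_get_child_processes_command. The value used for it, `pgrep -P \${pid}`, is an assumption for illustration only. `pgrep -P` lists processes whose parent has the given PID on Linux, but the original example does not set this parameter, and the job may expect a different command or output format. The escaped `\${pid}` placeholder follows the other commands in the full XML example.

```xml
<params>
  <!-- Enable monitoring of this SSH Job -->
  <param name="runWithWatchdog" value="true"/>
  <!-- Path of the cleanup job chain that is started for monitoring -->
  <param name="cleanupJobchain" value="kill_jobs/remote_cleanup_test"/>
  <!-- Commands run on the remote host; the placeholder is written as \${pid}, as in the full example -->
  <param name="ssh_job_kill_pid_command" value="kill -9 \${pid}"/>
  <param name="ssh_job_terminate_pid_command" value="kill -15 \${pid}"/>
  <!-- $$ expands to the PID of the remote shell -->
  <param name="ssh_job_get_pid_command" value="echo $$"/>
  <!-- ASSUMPTION: hypothetical value, not taken from the original example -->
  <param name="ssh_job_get_child_processes_command" value="pgrep -P \${pid}"/>
  <param name="ssh_job_get_active_processes_command" value="/bin/ps -ef | grep \${pid} | grep \${user} | grep -v grep"/>
</params>
```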

Example: Configuration With JOE

[Screenshots: JOE Job and Parameters settings]

Example: XML Configuration

<job order="yes" stop_on_error="false" title="Launch commands or executable files by SSH">
  <description>
    <include file="jobs/JobSchedulerSSHJob.xml"/>
  </description>
  <params>
    <param name="host" value="[HOST]"/>
    <param name="port" value="[SSHPORT]"/>
    <param name="user" value="[USERNAME]"/>
    <param name="password" value="[PASSWORD]"/>
    <param name="auth_method" value="password"/>
    <param name="command_script_file" value="[PATH_TO_SCRIPTFILE]/test_sleep_90s.sh"/>
    <param name="runWithWatchdog" value="true"/>
    <param name="cleanupJobchain" value="kill_jobs/remote_cleanup_test"/>
    <param name="ssh_job_kill_pid_command" value="kill -9 \${pid}"/>
    <param name="ssh_job_terminate_pid_command" value="kill -15 \${pid}"/>
    <param name="ssh_job_get_pid_command" value="echo $$"/>
    <param name="ssh_job_get_active_processes_command" value="/bin/ps -ef | grep \${pid} | grep \${user} | grep -v grep"/>
  </params>
  <script java_class="sos.scheduler.job.SOSSSHJob2JSAdapter" language="java"/>
  <run_time/>
</job>

Configuration Of The Cleanup Job Chain

A cleanup job chain has to be configured to clean up the remote processes or the JobScheduler task.

The cleanup job chain consists of two jobs. The first reads the PID of the connected shell from a temporary file on the remote host. The second checks whether the remote process or the JobScheduler task is still running. The temporary file is generated automatically and deleted after processing. The second job also ends either the remote process or the JobScheduler task, as appropriate.

Job 1: The read-pid-from-temporary-file-Job

The first job reads the PID from the temporary file on the remote host. If your SSH Job is configured as described above, this temporary file is created automatically on the remote host.

In JOE, set the class of the job to sos.scheduler.job.SOSSSHReadPidFileJobJSAdapter.

After choosing this class name from the list, configure a setback for the job. The setback is used to repeat the job depending on the conditions the job finds (see Mode of Operation).
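The fragment below sketches the setback configuration in XML. It repeats the setback elements of the read-pid-from-temporary-file job from the XML example further down, with comments added. The comments reflect our reading of the general JobScheduler semantics of delay_order_after_setback and should be checked against the JobScheduler XML reference.

```xml
<job order="yes" stop_on_error="no" title="Launch read pid file command by SSH">
  <script java_class="sos.scheduler.job.SOSSSHReadPidFileJobJSAdapter" language="java"/>
  <!-- From the first setback onwards, the order is repeated after a delay of 30 seconds -->
  <delay_order_after_setback delay="30" is_maximum="no" setback_count="1"/>
  <!-- Caps the number of setbacks at 3; once the cap is reached, the order
       moves to the error state of its job chain node (ERROR in the example chain) -->
  <delay_order_after_setback delay="0" is_maximum="yes" setback_count="3"/>
  <run_time/>
</job>
```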

Job 2: The check-and-kill-Job

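The second job checks whether the JobScheduler task and the remote process are still running. Depending on the conditions found, it then ends either the remote process or the JobScheduler task. The actions taken are described in the Mode of Operation chapter above.

In JOE, set the class of the job to sos.scheduler.job.SOSSSHKillJobJSAdapter.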

Example of the XML Configuration

<!-- Job "readPidFile", referenced by the first node of the job chain -->
<job order="yes" stop_on_error="no" title="Launch read pid file command by SSH">
  <description>
    <include file="jobs/SOSSSHReadPidFileJob.xml"/>
  </description>
  <script java_class="sos.scheduler.job.SOSSSHReadPidFileJobJSAdapter" language="java"/>
  <delay_order_after_setback delay="30" is_maximum="no" setback_count="1"/>
  <delay_order_after_setback delay="0" is_maximum="yes" setback_count="3"/>
  <run_time/>
</job>
<!-- Job "CheckAndKill", referenced by the second node of the job chain -->
<job order="yes" stop_on_error="no" title="Kills orphaned PIDs on the Remote Host for clean up by SSH">
  <description>
    <include file="jobs/SOSSSHKillJob.xml"/>
  </description>
  <script java_class="sos.scheduler.job.SOSSSHKillJobJSAdapter" language="java"/>
  <delay_order_after_setback delay="30" is_maximum="no" setback_count="1"/>
  <delay_order_after_setback delay="0" is_maximum="yes" setback_count="3"/>
  <run_time/>
</job>
<!-- Cleanup job chain, referenced by the cleanupJobchain parameter of the SSH Job (e.g. kill_jobs/remote_cleanup_test) -->
<job_chain orders_recoverable="yes" visible="yes">
  <job_chain_node error_state="ERROR" job="readPidFile" next_state="CheckTaskAndRemoteProcessesAndKillIfNeeded" on_error="setback" state="ReadPidFile"/>
  <job_chain_node error_state="ERROR" job="CheckAndKill" next_state="SUCCESS" on_error="setback" state="CheckTaskAndRemoteProcessesAndKillIfNeeded"/>
  <job_chain_node state="ERROR"/>
  <job_chain_node state="SUCCESS"/>
</job_chain>
