Introduction

  • An outage of the Controller does not necessarily affect the execution of workflows by Agents: Agents continue to execute workflows. However, if a workflow includes jobs that are executed on different Agents, the workflow is put on hold, because the Controller is required to switch Agents during execution of the workflow.

  • The Controller holds all workflow-related configuration items and orchestrates Agents. At design-time, Agents receive the workflow configuration from the Controller. At run-time, Agents return execution results and JS7 - Order State Transitions to the Controller.
  • The Controller passes execution results to the JS7 - History Service which updates the JOC Cockpit database.
  • If the connection to the Controller is not available then:
    • Agents will act autonomously. Execution results are stored in the Agents' journals.
    • The JOC Cockpit will not be updated with workflow execution results.
  • Testing by SOS includes the scenario in which the Controller is not available for 24 hours while the Agent executes all scheduled orders. When the Controller is started again, job execution results are added to the JOC Cockpit history and become visible in the GUI.
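
The store-and-forward behavior described above can be illustrated with a toy model. This is a conceptual sketch only and not JS7 code: the class names AgentJournal and ToyController and the function report are hypothetical and exist purely to show how results accumulate during an outage and are delivered afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class AgentJournal:
    """Toy stand-in for an Agent journal (hypothetical, not the JS7 format)."""
    events: list = field(default_factory=list)

    def record(self, event: str) -> None:
        # Execution results and order state transitions are appended locally.
        self.events.append(event)

    def drain(self) -> list:
        # Hand over everything buffered so far and empty the journal.
        delivered, self.events = self.events, []
        return delivered


@dataclass
class ToyController:
    """Toy stand-in for a Controller that may be unavailable."""
    available: bool = True
    history: list = field(default_factory=list)

    def receive(self, events: list) -> None:
        self.history.extend(events)


def report(journal: AgentJournal, controller: ToyController) -> int:
    """Forward buffered events if the Controller is reachable.

    Returns the number of events forwarded; 0 while the Controller is down.
    """
    if not controller.available:
        return 0
    delivered = journal.drain()
    controller.receive(delivered)
    return len(delivered)
```

In this model the journal keeps growing while `controller.available` is false and is emptied by the first successful call to `report`, which mirrors the behavior described for the 24-hour outage test.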

JOC Cockpit Behavior

  • Users do not receive up-to-date information:
    • The JOC Cockpit will not receive updated information about the state of orders and of workflow execution results.
    • The GUI will report the Controller being unreachable and will have no information about the status of Agents.
  • Any interaction with a Controller such as deploying workflows and cancelling/suspending orders will be delayed.
    • Such requests are held in memory by the JOC Cockpit Proxy Service, which will try to forward them when the Controller becomes available.
    • Restarting the JOC Cockpit in this situation is not recommended, as pending requests would be lost and deployments would have to be repeated.
  • The JOC Cockpit Proxy Service will try to re-establish the connection to the Controller. When this is successful the GUI will automatically update the status of the Controller in its Dashboard.
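
The following sketch illustrates why pending requests do not survive a restart: they exist only in an in-memory queue that is retried until the Controller is reachable. It is an assumption-laden illustration, not the Proxy Service implementation; forward_pending, controller_available and send are hypothetical names, and the retry limit is introduced only so that the example terminates.

```python
import time
from collections import deque


def forward_pending(pending, controller_available, send,
                    retry_delay=0.0, max_attempts=5):
    """Forward queued requests in order once the Controller is reachable.

    pending              -- deque of requests held in memory only
    controller_available -- callable returning True if the Controller responds
    send                 -- callable that delivers one request
    Returns the list of requests that were forwarded.
    """
    forwarded = []
    attempts = 0
    while pending and attempts < max_attempts:
        if controller_available():
            # Oldest request first, so deployments keep their order.
            request = pending.popleft()
            send(request)
            forwarded.append(request)
        else:
            # Controller still unreachable: wait and try again.
            attempts += 1
            time.sleep(retry_delay)
    return forwarded
```

Anything still left in `pending` when the process ends is gone, which is the reason deployments would have to be repeated after a restart.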

Agent Behavior

  • Workflows are deployed to Agents just once - at design-time. At run-time, the Agent will have received orders for workflows some time in advance, for example a week ahead. These orders come from the JS7 - Daily Plan Service, and the Agent can execute them autonomously.
  • In case of a Controller outage:
    • Agents will continue to execute workflows at the scheduled date and time.
    • The Agent's journal will grow. The journal holds execution results for workflows and order state transitions.
  • Depending on the workflow load, the journal files stored in the ./state directory can grow to several gigabytes. There is no harm in this as long as sufficient storage is available.
  • When the connection between Controller and Agent is available again, the Agent will report order state transitions and execution results back to the Controller and the Agent's journal will shrink.
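
During a longer outage it can be useful to watch how much space the journal occupies. The script below is a minimal sketch using only the Python standard library. The function names and the 5 GB threshold are assumptions chosen for illustration, and the location of the state directory depends on the Agent installation.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def journal_storage_report(state_dir, min_free_bytes=5 * 1024**3):
    """Report journal size and free space on the file system holding state_dir.

    min_free_bytes is an arbitrary example threshold, not a JS7 recommendation.
    """
    used = directory_size_bytes(state_dir)
    free = shutil.disk_usage(state_dir).free
    return {
        "journal_bytes": used,
        "free_bytes": free,
        "low_space": free < min_free_bytes,
    }
```

Calling `journal_storage_report("./state")` from the directory that contains the Agent's state directory returns the current journal size and flags low free space, so that storage can be extended before the journal fills the disk.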

Troubleshooting


