Purpose

  • JS7 implements resilience at the following levels:
    • Architecture: Any component can be clustered for high availability, using an active-passive cluster architecture with automated fail-over.
    • Communication: Components communicate asynchronously. In practice this means that any component can be shut down or suffer an outage without breaking the availability of any other component. Components reconcile after restart and synchronize state information to catch up with the latest processing results.
    • Programming: The programming model is based on asynchronous handling of events that are raised for state transitions.
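As a rough illustration of asynchronous event handling for state transitions, the following minimal Python sketch uses an in-process queue. It is not JS7 code: the names `handle_events`, `order-1` and the state names are assumptions made for the example only.

```python
import asyncio


async def handle_events(queue, state):
    # Consume events as they arrive; the producer never waits for the handler.
    while True:
        event = await queue.get()
        if event is None:              # sentinel: no more events
            break
        order_id, new_state = event
        state[order_id] = new_state    # apply the state transition


async def main():
    queue = asyncio.Queue()
    state = {}
    handler = asyncio.create_task(handle_events(queue, state))
    # Raise one event per state transition without blocking the producer.
    for event in [("order-1", "started"), ("order-1", "finished"), None]:
        queue.put_nowait(event)
    await handler
    return state


final_state = asyncio.run(main())
```

The producer only enqueues events and continues. The handler applies them in order whenever it gets to run, which is the essence of the asynchronous model described above.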

Sharing of Duties

  • Each component is assigned a specific duty:
    • The JOC Cockpit is used to manage the inventory of workflows, jobs and related objects. In addition, JOC Cockpit is used to monitor and control workflow execution by other components.
      • An outage of JOC Cockpit does not impact workflow execution by the Controller and by Agents.
      • An outage of JOC Cockpit simply means that users cannot see which workflows are currently being executed. It does not mean that workflows do not run.
      • Any results of workflow execution are reported later by a Controller when JOC Cockpit becomes available again.
    • The Controller orchestrates Agents and forwards JS7 - Workflows and the JS7 - Daily Plan to Agents.
      • If the Controller is not available, this does not affect the availability of JOC Cockpit.
      • For Agents the loss of a connection from the Controller means that they cannot immediately report back execution results. However, Agents will continue to execute workflows that are within their reach and will store the information about state transitions of orders and the log output created by jobs in their journal for later forwarding to a Controller.
      • The exception to this rule is workflows that implement cross-platform scheduling, i.e. that execute jobs within the same workflow on different Agents. In this situation an Agent can proceed with a workflow only to nodes that are assigned to the current Agent.
    • Agents execute JS7 - Workflow Instructions as long as the instructions, including any jobs to be executed, are assigned to the current Agent.
      • Agents expect Controllers to establish a connection and will respond to connection requests but cannot actively establish a connection to a Controller.
      • Agents receive Workflow configurations and the Daily Plan from a Controller and know when to run orders. Agents therefore work semi-autonomously within the limits of the workflow instructions that are assigned to them.
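The Agent behavior during a Controller outage can be sketched as follows. This is a simplified Python model under stated assumptions, not the JS7 implementation: the `Agent` class, its `run` and `forward` methods and the workflow layout (an ordered list of node and Agent name pairs) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical model of an Agent with a local journal."""
    name: str
    journal: list = field(default_factory=list)

    def run(self, workflow):
        # workflow: ordered list of (node, assigned_agent) pairs
        for node, assigned_agent in workflow:
            if assigned_agent != self.name:
                break  # next node belongs to another Agent: stop here
            # Record the state transition locally, connected or not.
            self.journal.append((node, "finished"))

    def forward(self, controller_connected):
        # Journal entries are handed over only once a Controller has connected.
        if not controller_connected:
            return []
        delivered, self.journal = self.journal, []
        return delivered


workflow = [("job1", "agent-a"), ("job2", "agent-a"), ("job3", "agent-b")]
agent = Agent("agent-a")
agent.run(workflow)
during_outage = agent.forward(controller_connected=False)
after_reconnect = agent.forward(controller_connected=True)
```

In this model `agent-a` finishes `job1` and `job2` on its own and stops before `job3`, which is assigned to a different Agent. Nothing is delivered during the outage, and the journaled transitions are forwarded in order once the Controller has connected again.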


Clustering

  • The JOC Cockpit can be operated as an active-passive cluster with one active instance and any number of passive instances.
    • Fail-over is handled automatically between cluster members by use of the JS7 - Cluster Service.
    • The JOC Cockpit cluster relies on a persistence layer provided by a supported JS7 - Database.
  • The Controller implements an active-passive cluster with one active instance and one passive instance.
    • The Controller implements clustering and journaling on its own and does not require additional components such as a DBMS.
    • Cluster members couple and synchronize automatically.
  • The Agent offers both an active-passive and an active-active cluster.
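To make the active-passive idea concrete, here is a minimal Python sketch of automated fail-over with one active instance and any number of passive instances. It illustrates the principle only: the `Cluster` class, its methods and the member names are hypothetical, and it does not reflect how the JS7 - Cluster Service selects a new active instance.

```python
class Cluster:
    """Illustrative active-passive cluster; names are hypothetical, not JS7 APIs."""

    def __init__(self, members):
        self.healthy = {member: True for member in members}
        self.active = members[0]  # exactly one active instance

    def report_outage(self, member):
        self.healthy[member] = False
        if member == self.active:
            self.fail_over()

    def fail_over(self):
        # Promote the first healthy passive instance without operator action.
        for member, ok in self.healthy.items():
            if ok:
                self.active = member
                return
        self.active = None  # no healthy member is left


cluster = Cluster(["joc-1", "joc-2", "joc-3"])
cluster.report_outage("joc-1")
active_after_failover = cluster.active
```

An outage of a passive instance leaves the active instance unchanged. Only an outage of the active instance triggers the fail-over.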

Communication

Asynchronous communication is based on the fact that messages are sent to a partner component without relying on the availability of that component. It is neither guaranteed that a message was received by the recipient nor assumed that the recipient will be able to respond in good time.

  • If the communication between components breaks, e.g. due to a connection loss or network issue, then the calling component will repeatedly try to reconnect to the partner component. This mechanism works for the duration of an outage, whether it lasts minutes, hours or days.
  • If messages cannot be forwarded then they are stored in memory for later retries:
    • Messages that request status information are lost if the calling component is restarted.
    • Messages that request status changes to objects are stored persistently.
  • Therefore it makes no sense to restart a calling component if the partner component is not available. The mantra to "restart the Windows server" does not apply to JS7 unless you have good reason to assume that a connection loss is caused by system resources.
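The difference between the two kinds of messages, and the reason why restarting the calling component does not help, can be sketched as follows. This is a hedged Python illustration, not JS7 code: the `Outbox` class, the JSON file used for persistence and the message texts are assumptions made for the example.

```python
import json
import os
import tempfile


class Outbox:
    """Hypothetical outbox of a calling component; not a JS7 API."""

    def __init__(self, path):
        self.path = path
        self.status_requests = []           # held in memory only
        self.status_changes = self._load()  # restored from disk after a restart

    def _load(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return []

    def queue_status_request(self, message):
        self.status_requests.append(message)

    def queue_status_change(self, message):
        self.status_changes.append(message)
        with open(self.path, "w") as f:
            json.dump(self.status_changes, f)

    def retry(self, partner_available):
        # Called repeatedly during an outage; sends only once the partner is reachable.
        if not partner_available:
            return []
        sent = self.status_requests + self.status_changes
        self.status_requests, self.status_changes = [], []
        if os.path.exists(self.path):
            os.remove(self.path)
        return sent


path = os.path.join(tempfile.mkdtemp(), "outbox.json")
outbox = Outbox(path)
outbox.queue_status_request("get order state")
outbox.queue_status_change("cancel order-1")
sent_during_outage = outbox.retry(partner_available=False)

restarted = Outbox(path)  # simulates restarting the caller during the outage
sent_after_restart = restarted.retry(partner_available=True)
```

In this sketch the restart gains nothing: the partner is no more reachable than before, the status information request is gone, and only the persisted status change is delivered once the partner becomes available.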

Programming Model

The programming model includes that operations in each component 



