Purpose

JS7 implements resilience at the following levels:

  • Architecture: Any component can be clustered for high availability, implementing an active-passive cluster architecture with automated fail-over. For details see JS7 - System Architecture.
  • Communication: Components communicate asynchronously. In practice this means that any component can be shut down or suffer an outage without affecting the availability of any other component. After a restart, components reconcile and synchronize state information to catch up with the latest processing results. For details see JS7 - Implementation Architecture.
  • Programming: The programming model is based on the handling of asynchronous events that are raised for state transitions. For details see JS7 - REST Web Service API.

Sharing of Duties

Each component is assigned a specific duty:

  • The JOC Cockpit is used to manage the inventory of workflows, jobs and related objects. In addition, JOC Cockpit is used to monitor and control workflow execution by other components.
    • An outage of JOC Cockpit does not impact workflow execution by the Controller and by Agents.
    • An outage of JOC Cockpit only means that users cannot see which workflows are currently being executed. It does not mean that workflows stop running.
    • Results of workflow execution are reported later by the Controller when JOC Cockpit becomes available again.
  • The Controller orchestrates Agents and forwards objects such as JS7 - Workflows and the JS7 - Daily Plan to Agents.
    • If the Controller is not available then this does not affect the availability of JOC Cockpit.
    • For Agents the loss of a connection from the Controller means that they cannot immediately report back execution results. However, Agents will continue to execute workflows that are within their reach and will store the information about JS7 - Order State Transitions and the log output created by jobs in their journal for later forwarding to a Controller.
    • The exception to this rule is a workflow that implements cross-platform scheduling, i.e. one that executes jobs within the same workflow on different Agents. In this situation an Agent can proceed with the workflow only for those nodes that are assigned to the current Agent.
  • The Agents execute JS7 - Workflow Instructions as long as the instructions, including the execution of any jobs, are assigned to the current Agent.
    • Agents expect Controllers to establish a connection and will respond to connection requests but cannot actively establish a connection to a Controller.
    • Agents receive workflow configurations and the Daily Plan from a Controller and therefore know when to run orders. As a result, Agents work semi-autonomously within the limits of the workflow instructions assigned to them.
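
The semi-autonomous behavior described above can be illustrated with a minimal sketch. It is not JS7 code: the `Node` and `Agent` classes, the `run_while_disconnected` method and the Agent names are hypothetical and only model the rule that a disconnected Agent proceeds through the nodes assigned to it, keeps results in its journal and stops at the first node assigned to a different Agent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    job: str
    agent: str  # name of the Agent this workflow node is assigned to


@dataclass
class Agent:
    name: str
    # Results are kept here until a Controller reconnects (hypothetical journal model).
    journal: list = field(default_factory=list)

    def run_while_disconnected(self, workflow, start=0):
        """Execute consecutive nodes assigned to this Agent.

        Returns the position at which the Agent has to stop, either the end
        of the workflow or the first node assigned to another Agent.
        """
        position = start
        while position < len(workflow) and workflow[position].agent == self.name:
            self.journal.append((position, workflow[position].job, "finished"))
            position += 1
        return position


# A workflow spanning two Agents (cross-platform scheduling).
workflow = [
    Node("extract", "agent_a"),
    Node("transform", "agent_a"),
    Node("load", "agent_b"),
]

agent_a = Agent("agent_a")
stopped_at = agent_a.run_while_disconnected(workflow)
print(stopped_at)       # 2: the "load" node belongs to agent_b
print(agent_a.journal)  # two journal entries waiting to be forwarded
```

In this sketch the order waits at position 2 until a Controller is available again to hand it over to the other Agent.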

Cluster Architecture

Redundancy is provided by clustering the components as follows:

  • The JOC Cockpit can be operated as an active-passive cluster with one active instance and any number of passive instances.
    • Fail-over is handled automatically between cluster members by use of the JS7 - Cluster Service.
    • The JOC Cockpit cluster relies on a persistence layer provided by the JS7 - Database.
  • The Controller implements an active-passive cluster with one active instance and one passive instance.
    • The Controller implements clustering and journaling on its own and does not require additional components such as a DBMS.
    • Cluster members couple and synchronize automatically.
  • The Agent offers both an active-passive cluster and an active-active cluster.
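
As a rough illustration of active-passive fail-over with one active instance and any number of passive instances, consider the following sketch. It does not reflect how the JS7 - Cluster Service is implemented: the `Cluster` class, its method names, the instance names and the choice not to fail back after recovery are all assumptions made for the example.

```python
class Cluster:
    """Toy active-passive cluster: exactly one member is active at a time."""

    def __init__(self, members):
        self.healthy = {member: True for member in members}
        self.active = members[0]

    def report_outage(self, member):
        self.healthy[member] = False
        if member == self.active:
            # Promote the first healthy passive member, if there is one.
            self.active = next(
                (m for m, ok in self.healthy.items() if ok), None
            )

    def report_recovery(self, member):
        self.healthy[member] = True
        if self.active is None:
            # No active member was left, so the recovered one takes over.
            self.active = member


cluster = Cluster(["joc-1", "joc-2", "joc-3"])
cluster.report_outage("joc-1")
active_after_failover = cluster.active   # "joc-2"
cluster.report_recovery("joc-1")
active_after_recovery = cluster.active   # still "joc-2": no fail-back in this sketch
print(active_after_failover, active_after_recovery)
```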

Communication

Asynchronous communication means that messages are sent to a partner component without relying on the availability of that component: there is no guarantee that a message was received by the recipient, nor can it be assumed that the recipient will be able to respond in good time.

  • If the communication between components breaks, e.g. due to a connection loss or network issue, then the calling component will repeatedly try to reconnect to the partner component. This mechanism works for the duration of an outage, whether it lasts minutes, hours or days.
  • If messages cannot be forwarded then they are kept for later retries:
    • Messages about status information requests are held in memory only and are lost if the calling component is restarted.
    • Messages about status change requests are stored persistently.
  • Therefore it makes no sense to restart a calling component when the partner component is not available. The mantra to "restart the Windows server" does not apply to JS7 unless you have good reason to assume that a connection loss is caused by issues with system resources.
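
The difference between the two kinds of messages can be sketched as follows. This is a simplified model under stated assumptions, not JS7 code: the `Caller` class, the `"info"` and `"change"` message kinds, the JSON file used as persistent store and the message texts are hypothetical. The sketch shows why restarting the calling component during an outage gains nothing and loses the pending status information requests.

```python
import json
import os
import tempfile


class Caller:
    """Toy calling component that queues messages while its partner is down."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.volatile = []           # status information requests: memory only
        self.durable = self._load()  # status change requests: survive a restart

    def _load(self):
        if os.path.exists(self.store_path):
            with open(self.store_path, encoding="utf-8") as f:
                return json.load(f)
        return []

    def _save(self):
        with open(self.store_path, "w", encoding="utf-8") as f:
            json.dump(self.durable, f)

    def send(self, message, kind, partner_up):
        if partner_up:
            return True  # delivered at once (delivery itself is not modelled)
        if kind == "change":
            self.durable.append(message)
            self._save()
        else:
            self.volatile.append(message)
        return False

    def retry(self, partner_up):
        """Return the messages that could be delivered on this attempt."""
        if not partner_up:
            return []
        delivered = self.durable + self.volatile
        self.durable, self.volatile = [], []
        self._save()
        return delivered


path = os.path.join(tempfile.mkdtemp(), "pending.json")
caller = Caller(path)
caller.send("get order state", "info", partner_up=False)
caller.send("cancel order", "change", partner_up=False)

# Simulate a restart of the calling component while the partner is still down.
restarted = Caller(path)
delivered = restarted.retry(partner_up=True)
print(delivered)  # ['cancel order']: the status information request was lost
```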

Programming Model

The programming model is based on handling asynchronous events that are passed between components:

  • The Controller and Agent raise events for state transitions of orders.
  • The Controller subscribes to events that originate from Agents. JOC Cockpit subscribes to events that are forwarded from a Controller.
  • The asynchronous nature of events is handled by the receiving component. Events remain in place with the originating component until the receiving component confirms receipt. Only then are events released from the originating component.
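
The confirm-then-release handling of events can be sketched as follows. The `Originator` class, its method names, the numeric event ids and the order name are assumptions made for illustration and are not taken from the JS7 - REST Web Service API. The sketch shows that events stay with the originating component until the receiver acknowledges them.

```python
from collections import deque


class Originator:
    """Toy originating component that keeps events until they are acknowledged."""

    def __init__(self):
        self._events = deque()
        self._next_id = 1

    def raise_event(self, payload):
        self._events.append((self._next_id, payload))
        self._next_id += 1

    def fetch(self, after_id=0):
        """Return retained events with an id greater than after_id."""
        return [event for event in self._events if event[0] > after_id]

    def acknowledge(self, up_to_id):
        """Release all events up to and including up_to_id."""
        while self._events and self._events[0][0] <= up_to_id:
            self._events.popleft()

    def pending(self):
        return len(self._events)


agent = Originator()
for state in ("started", "processing", "finished"):
    agent.raise_event({"order": "order-1", "state": state})

batch = agent.fetch()           # the receiver reads all three events
before_ack = agent.pending()    # 3: nothing is released yet
agent.acknowledge(batch[1][0])  # the receiver confirms the first two events
after_ack = agent.pending()     # 1: only the unconfirmed event remains
print(before_ack, after_ack)
```

If the receiving component fails before acknowledging, a later `fetch()` returns the same events again, so in this model no state transition is lost.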


