Introduction

  • Initial Operation is performed after installation of the JS7 Controller, Agent and JOC Cockpit, see JS7 - Initial Operation.
  • If registration of a Controller fails, the Controller cannot pass job-related configuration to Agents. The Controller is also required for the execution of workflows that include jobs running on a number of Agents, because switching between Agents during workflow execution is performed by the Controller.

Troubleshooting

After registering the Controller its status can be checked from the JS7 - Dashboard. When registering a Controller a number of misconfigurations can occur.

Operations to register a Controller are performed from the JOC Cockpit page "Manage Controllers/Agents" that is available from the user menu in the upper right corner of the GUI. Administrative permissions are required to see and to use this page.

User Errors

Same Controller is used as a Standalone Controller and as a member of a Controller Cluster

  • Problem: Assume that the JOC Cockpit uses a Controller registered as a Standalone Controller. If the same Controller is then registered as a Controller Cluster member, the JOC Cockpit raises an error such as:
    • JocObjectAlreadyExistException: com.sos.joc.exceptions.JocObjectAlreadyExistException: Controller(s) with id 'controller' already exists
    • This error message is also available from the JOC Cockpit's log file, e.g. from the JS7_JOC_DATA/logs/joc.log file, see JS7 - Log Files and Locations.
  • Solution: It is not possible to register the same Controller twice, as a Standalone Controller and as a member of a Controller Cluster.
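
To confirm that this exception was logged, the joc.log file can be searched for the exception name. The following is a minimal sketch only: the function name find_log_lines is hypothetical and not part of JS7, and the log path has to be replaced with the actual location of JS7_JOC_DATA/logs/joc.log on the JOC Cockpit host.

```python
def find_log_lines(log_path, needle):
    """Return (line_number, text) for every log line that contains needle.

    Hypothetical helper for illustration; it performs a plain substring
    search and knows nothing about the JS7 log format.
    """
    hits = []
    # errors="replace" keeps the scan going if the log holds odd bytes
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_log_lines.py <path to joc.log>
    if len(sys.argv) > 1:
        for number, text in find_log_lines(sys.argv[1], "JocObjectAlreadyExistException"):
            print(f"{number}: {text}")
```

The same helper can be used for the other exception names mentioned in this article by passing a different search string.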

...

  • Check that you have a license key ready as clustering of Controllers requires a commercial license, see JS7 - License.
  • Verify that execution of workflows is completed with the Standalone Controller and that no orders are running.
  • Remove the Standalone Controller from the JOC Cockpit GUI.
  • Shut down the Standalone Controller and remove the Controller's journal files in its JS7_CONTROLLER_DATA/state directory.
  • Follow the steps from the JS7 - Initial Operation for Controller Cluster article.
  • Redeploy the scheduling objects such as workflows from JOC Cockpit to the Controller Cluster.
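
The removal of journal files from the state directory can be sketched as follows. This is an illustration only, under the assumption that the Controller is shut down and that the state directory holds nothing but journal files, which should be verified before deleting anything. The function name clear_directory is hypothetical and not part of JS7. By default the function runs in dry-run mode and only reports what it would delete.

```python
import shutil
from pathlib import Path


def clear_directory(directory, dry_run=True):
    """Return the names of all entries in directory; delete them unless dry_run.

    The directory itself is kept. Hypothetical helper for illustration.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    entries = sorted(root.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove sub-directory with its contents
            else:
                entry.unlink()  # remove file or symbolic link
    return [entry.name for entry in entries]


if __name__ == "__main__":
    import sys

    # Usage: python clear_directory.py <path to the state directory>
    # Only lists the entries; pass dry_run=False in code to actually delete.
    if len(sys.argv) > 1:
        for name in clear_directory(sys.argv[1], dry_run=True):
            print("would delete:", name)
```

Reviewing the dry-run output first is a deliberate choice, as deleting the journal cannot be undone.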

Controller Instances with different Controller IDs are used as a Controller Cluster

  • Problem: If a Controller Cluster is registered with Controller instances that use different Controller IDs, the JOC Cockpit raises an error:
    • ControllerInvalidResponseDataException:
      com.sos.joc.exceptions.ControllerInvalidResponseDataException: The cluster members must have the same Controller Id: http://<host1>:<port1> -> controller_ID1, http://<host2>:<port2> -> controller_ID2.
    • This error message is also available from the JOC Cockpit's log file, e.g. from the JS7_JOC_DATA/logs/joc.log file, see JS7 - Log Files and Locations.
  • Solution: It is not possible to register Controller instances with different Controller IDs to work as a Controller Cluster. Instead, check which Controller instance uses the wrong Controller ID and rerun the installation for that instance. The Controller ID can be specified during installation, see JS7 - Controller Installation On Premises and JS7 - Controller Installation for Docker Containers.
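
To find out which Controller instance reports which Controller ID, the URL and ID pairs can be extracted from the error message. The sketch below assumes the message format shown above ("<url> -> <id>" pairs separated by commas); the function name parse_cluster_members is hypothetical, and a trailing period is treated as the end of the sentence, so a Controller ID that itself ends with a period would be truncated.

```python
import re

# Matches "<url> -> <controller id>" pairs as they appear in the message above
MEMBER = re.compile(r"(https?://[^\s,]+)\s*->\s*([^\s,]+)", re.IGNORECASE)


def parse_cluster_members(message):
    """Map each Controller URL in the error message to its reported Controller ID."""
    return {url: controller_id.rstrip(".") for url, controller_id in MEMBER.findall(message)}


if __name__ == "__main__":
    # Example message with made-up hosts and ports
    example = (
        "The cluster members must have the same Controller Id: "
        "http://host1:4444 -> controller_ID1, http://host2:4444 -> controller_ID2."
    )
    for url, controller_id in parse_cluster_members(example).items():
        print(url, "uses Controller ID", controller_id)
```

The instance whose Controller ID differs from the intended one is the candidate for reinstallation.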

Secondary Controller instance is not configured for use with a Cluster

  • Problem: If the Secondary Controller instance is not configured for use with a Cluster but runs as a standalone instance then the Primary Controller instance's logs include error messages like this:
    • 2022-01-27T00:00:08,669 WARN js7.cluster.ClusterCommon - 'ClusterStartBackupNode' command failed with HTTP 400 Bad Request: POST http://example.com:4444/controller/api/command => ClusterNodeIsNotBackup: The cluster node to be appointed is not configured as a backup node
    • This error message is available from the Controller instance's log file, for example from the JS7_CONTROLLER_DATA/logs/controller.log file, see JS7 - Log Files and Locations.
  • Solution: The Secondary Controller is missing the following setting in its JS7_CONTROLLER_DATA/config/controller.conf file, see JS7 - Initial Operation for Controller Cluster, chapter: Check Cluster Settings:
    • js7.journal.cluster.node.is-backup=yes
    • Due to the missing setting the Secondary Controller acts as a standalone instance, which is a different operating mode. To initialize the Secondary Controller instance for cluster operation, add the setting and apply the following steps:
      • Shut down the Secondary Controller instance.
      • Delete the contents of the Secondary Controller instance's JS7_CONTROLLER_DATA/state directory.
      • Start the Secondary Controller instance.
      • In the JOC Cockpit GUI navigate to the User Menu->Manage Controllers/Agents page. Edit the Controller entry, check the connection status from the available buttons and submit. As a result JOC Cockpit forwards this information to both Controller instances.
      • Should the Controller instances not be coupled after approx. 120s, then proceed as follows:
        • Shut down the Primary Controller instance.
        • Delete the contents of the Secondary Controller instance's JS7_CONTROLLER_DATA/state directory.
        • Shut down the Cluster Watch Agent.
        • Delete the contents of the Cluster Watch Agent's JS7_AGENT_DATA/state directory.
        • Start the Cluster Watch Agent.
        • Start the Primary Controller instance.
        • Coupling of Controller instances should occur within 60s.
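
Before restarting the Secondary Controller instance it can be useful to check that the setting is present in controller.conf. The following sketch makes simplifying assumptions: it only recognizes the flat, dotted form of the setting on a single line as shown above, it ignores nested configuration blocks and includes, and it only strips comments introduced by "#". The function name is_backup_node is hypothetical and not part of JS7.

```python
import re

BACKUP_KEY = "js7.journal.cluster.node.is-backup"

# key, then "=" or ":", then an optionally quoted word
_SETTING = re.compile(r"^\s*" + re.escape(BACKUP_KEY) + r"\s*[=:]\s*\"?(\w+)\"?\s*$")


def is_backup_node(conf_text):
    """Return True if the flat is-backup setting is present and enabled.

    If the setting occurs more than once, the last occurrence is used.
    """
    enabled = False
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop "#" comments
        match = _SETTING.match(line)
        if match:
            enabled = match.group(1).lower() in {"yes", "true", "on"}
    return enabled


if __name__ == "__main__":
    import sys

    # Usage: python is_backup_node.py <path to controller.conf>
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as conf:
            print("configured as backup node:", is_backup_node(conf.read()))
```

A result of False for the Secondary Controller instance corresponds to the ClusterNodeIsNotBackup error described above.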

License missing when configuring the Controller Cluster

Further Information

For troubleshooting during ongoing operation see JS7 - How to troubleshoot Controller journals.

...