Introduction

The JS7 Controller, Agents and JOC Cockpit make use of Unicode. Files that are created or read by these products use UTF-8 encoding.

  • Unicode support works across all supported JS7 - Platforms.
  • Limitations have to be considered in mixed environments that include Windows operating systems, which ship without Unicode support and use a two-byte subset of UTF-16 LE.

JOC Cockpit requires the JS7 - Database to support UTF-8 encoding in order to store objects that hold such characters.

  • With UTF-8, a single character stored in a JS7 database table requires 1 to 4 bytes.
  • For example, NVARCHAR(10) specifies a column width of 10 characters that can consume up to 40 bytes.
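
The byte counts above can be checked with a short sketch. It is not part of JS7; the sample characters and the helper name utf8_len are chosen for illustration only.

```python
# Count the bytes that characters occupy when encoded as UTF-8.
# The sample characters and the helper name are illustrative assumptions.

def utf8_len(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))

# One character from each UTF-8 length class (1, 2, 3 and 4 bytes).
samples = ["A", "ü", "日", "😀"]

for ch in samples:
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")

# Worst case for a column that holds 10 characters: 10 x 4 bytes.
worst_case = utf8_len("😀" * 10)
print("worst case for 10 characters:", worst_case, "bytes")
```

The same 10-character column needs only 10 bytes if it holds ASCII characters, which is why the byte size of a column depends on its content.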

The JDBC Drivers in use have to support Unicode.

JOC Cockpit

When designing workflows with the Configuration -> Inventory view, users can make use of any Unicode characters within the scope of the JS7 - Object Naming Rules.

Configuration View

The Configuration -> Inventory view allows object names and job scripts to be specified using Unicode characters:

Workflows View

The Workflows view displays workflows using Unicode characters accordingly:

  • The example below makes use of Japanese object names.
  • In addition, the JOC Cockpit interface language has been switched to Japanese by use of JS7 - Profiles - Preferences.

Log View

The Log view window displays output of jobs and instructions.

  • Part of the output is created by JOC Cockpit and is qualified with markers such as [MAIN], [SUCCESS], [DETAIL].
  • Part of the output is created by job scripts executed with an OS shell. Such output is marked as [STDOUT] and [STDERR].


Explanation:

  • Output from Unix Operating Systems
    • Unix Operating Systems ship with built-in support for Unicode and UTF-8 encoding.
  • Output from Windows Operating Systems
    • Windows shells do not use Unicode for output by default. Instead the OS ships with different code pages preinstalled, depending on the region in which the OS is used.
    • Some experimental Unicode support is available starting from Windows 10. However, as most Windows programs are not aware of Unicode, side effects can occur.
    • Therefore the encoding of output created by jobs depends on the code page in use for the Windows OS on which an Agent is operated.
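
Why the code page matters can be seen from the bytes that the same text produces under different encodings. The following is a minimal sketch that uses Python's built-in codecs as a stand-in for the OS code pages; the helper name representable is hypothetical.

```python
# The same text yields different bytes depending on the encoding in use.
# Python's "cp932" and "cp850" codecs stand in for the Windows code pages.
text = "日本語"

utf8_bytes = text.encode("utf-8")    # typical for a Unix shell
cp932_bytes = text.encode("cp932")   # a Windows shell on code page 932

print("UTF-8 :", utf8_bytes.hex(" "))
print("cp932 :", cp932_bytes.hex(" "))

def representable(value: str, encoding: str) -> bool:
    """Report whether value can be encoded without loss in encoding."""
    try:
        value.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Code page 850 (Western Europe) has no Japanese characters at all.
print("cp850 can hold the text:", representable(text, "cp850"))
```

A reader of such output has to know which code page produced the bytes, otherwise the text cannot be restored reliably.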

Controller

The Controller does not read or write files related to execution of workflows and jobs. The Controller reads configuration files and writes component log files only.

  • Configuration files use UTF-8 encoding.
  • Component log files are created with UTF-8 encoding.

Agent

Agents use an OS shell to execute job scripts. Agents collect the output of jobs that is available from the stdout and stderr channels.

  • For Unix environments the OS creates output in UTF-8 encoding.
  • For Windows environments the OS makes use of a code page to encode output.

Use with Windows Code Pages

The Agent makes use of the code page that is active for the computer the Agent is operated on.

  • In Asia code page 65001 or specific code pages such as 932 for Japan are frequently used.
  • In Western Europe code page 850 is frequently used.

FEATURE AVAILABILITY STARTING FROM RELEASE 2.3.0

Supported Code Pages

The Agent automatically detects and makes use of the following code pages:

List of supported code pages
js7.windows.codepages {
   37 = "IBM037"          # IBM EBCDIC US-Canada
  437 = "IBM437"          # OEM United States
  737 = "x-IBM737"        # ibm737  OEM Greek (formerly 437G); Greek (DOS)
  775 = "IBM775"          # ibm775  OEM Baltic; Baltic (DOS)
  850 = "IBM850"          # ibm850  OEM Multilingual Latin 1; Western European (DOS)
  852 = "IBM852"          # ibm852  OEM Latin 2; Central European (DOS)
  855 = "IBM855"          # OEM Cyrillic (primarily Russian)
  857 = "IBM857"          # ibm857  OEM Turkish; Turkish (DOS)
  858 = "IBM00858"        # OEM Multilingual Latin 1 + Euro symbol
  860 = "IBM860"          # OEM Portuguese; Portuguese (DOS)
  861 = "IBM861"          # ibm861  OEM Icelandic; Icelandic (DOS)
  862 = "IBM862"          # DOS-862  OEM Hebrew; Hebrew (DOS)
  863 = "IBM863"          # OEM French Canadian; French Canadian (DOS)
  864 = "IBM864"          # OEM Arabic; Arabic (864)
  865 = "IBM865"          # OEM Nordic; Nordic (DOS)
  866 = "IBM866"          # cp866  OEM Russian; Cyrillic (DOS)
  869 = "IBM869"          # ibm869  OEM Modern Greek; Greek, Modern (DOS)
  870 = "IBM870"          # IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2
  874 = "x-IBM874"        # windows-874  Thai (Windows)
  875 = "x-IBM875"        # cp875  IBM EBCDIC Greek Modern
  932 = "shift_jis"       # shift_jis ANSI/OEM Japanese; Japanese (Shift-JIS)
  949 = "x-windows-949"   # ks_c_5601-1987  ANSI/OEM Korean (Unified Hangul Code)
  950 = "x-windows-950"   # big5  ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)
  1026 = "IBM1026"        # IBM EBCDIC Turkish (Latin 5)
  1047 = "IBM1047"        # IBM EBCDIC Latin 1/Open System
  1140 = "IBM01140"       # IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)
  1141 = "IBM01141"       # IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)
  1142 = "IBM01142"       # IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)
  1143 = "IBM01143"       # IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)
  1144 = "IBM01144"       # IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)
  1145 = "IBM01145"       # IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)
  1146 = "IBM01146"       # IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)
  1147 = "IBM01147"       # IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)
  1148 = "IBM01148"       # IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)
  1149 = "IBM01149"       # IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)
  1200 = "UTF-16LE"       # utf-16  Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
  1201 = "UTF-16BE"       # unicodeFFFE  Unicode UTF-16, big endian byte order; available only to managed applications
  1250 = "windows-1250"   # windows-1250  ANSI Central European; Central European (Windows)
  1251 = "windows-1251"   # windows-1251  ANSI Cyrillic; Cyrillic (Windows)
  1252 = "windows-1252"   # windows-1252  ANSI Latin 1; Western European (Windows)
  1253 = "windows-1253"   # windows-1253  ANSI Greek; Greek (Windows)
  1254 = "windows-1254"   # windows-1254  ANSI Turkish; Turkish (Windows)
  1255 = "windows-1255"   # windows-1255  ANSI Hebrew; Hebrew (Windows)
  1256 = "windows-1256"   # windows-1256  ANSI Arabic; Arabic (Windows)
  1257 = "windows-1257"   # windows-1257  ANSI Baltic; Baltic (Windows)
  1258 = "windows-1258"   # windows-1258  ANSI/OEM Vietnamese; Vietnamese (Windows)
  12000 = "UTF-32LE"      # utf-32  Unicode UTF-32, little endian byte order; available only to managed applications
  12001 = "UTF-32BE"      # utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications
  20127 = "US-ASCII"      # us-ascii  US-ASCII (7-bit)
  20866 = "KOI8-R"        # koi8-r  Russian (KOI8-R); Cyrillic (KOI8-R)
  20905 = "KOI8-U"        # IBM EBCDIC Turkish
  20932 = "EUC-JP"        # EUC-JP  Japanese (JIS 0208-1990 and 0212-1990)
  21025 = "x-IBM1025"     # cp1025  IBM EBCDIC Cyrillic Serbian-Bulgarian
  28591 = "iso-8859-1"    # iso-8859-1  ISO 8859-1 Latin 1; Western European (ISO)
  28592 = "iso-8859-2"    # iso-8859-2  ISO 8859-2 Central European; Central European (ISO)
  28593 = "iso-8859-3"    # iso-8859-3  ISO 8859-3 Latin 3
  28594 = "iso-8859-4"    # iso-8859-4  ISO 8859-4 Baltic
  28595 = "iso-8859-5"    # iso-8859-5  ISO 8859-5 Cyrillic
  28596 = "iso-8859-6"    # iso-8859-6  ISO 8859-6 Arabic
  28597 = "iso-8859-7"    # iso-8859-7  ISO 8859-7 Greek
  28598 = "iso-8859-8"    # iso-8859-8  ISO 8859-8 Hebrew; Hebrew (ISO-Visual)
  28599 = "iso-8859-9"    # iso-8859-9  ISO 8859-9 Turkish
  28603 = "iso-8859-13"   # iso-8859-13  ISO 8859-13 Estonian
  28605 = "iso-8859-15"   # iso-8859-15  ISO 8859-15 Latin 9
  50220 = "iso-2022-jp"   # iso-2022-jp  ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)
  50222 = "iso-2022-jp"   # iso-2022-jp  ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)
  50225 = "iso-2022-kr"   # iso-2022-kr  ISO 2022 Korean
  65001 = "UTF-8"         # utf-8  Unicode (UTF-8)
}

Specifying the Code Page

The Agent detects and makes use of the code page used by the Windows OS

  • for code pages from the above list,
  • for code pages whose encoding is available by the name cp# or CP#, with # being the number of the code page.
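
The two-step lookup described above can be sketched as follows. This is not the Agent's implementation: the function name encoding_for is hypothetical, the table is a small excerpt of the list above, and Python's codec registry stands in for whatever encoding registry the Agent uses.

```python
import codecs

# Excerpt of the code page list above (code page number -> encoding name).
CODE_PAGES = {
    850: "IBM850",
    932: "shift_jis",
    1252: "windows-1252",
    65001: "UTF-8",
}

def encoding_for(code_page: int) -> str:
    """Map a Windows code page number to an encoding name.

    Step 1: use the list. Step 2: fall back to the name "cp<number>".
    """
    name = CODE_PAGES.get(code_page)
    if name is not None:
        return name
    candidate = f"cp{code_page}"
    try:
        # Python's lookup is case-insensitive, so this also covers "CP<number>".
        codecs.lookup(candidate)
    except LookupError:
        raise LookupError(f"unsupported code page: {code_page}") from None
    return candidate

print(encoding_for(932))   # found in the list
print(encoding_for(856))   # not in the excerpt, resolved by the cp# fallback
```

A code page that is neither listed nor known by a cp# name raises LookupError in this sketch; how the Agent reacts in that case is not covered here.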

Users can enforce use of a supported code page by adding a setting to the Agent's JS7_AGENT_CONFIG_DIR/agent.conf configuration file such as:

Setting to specify the code page from the agent.conf file
js7.job.execution.encoding = "UTF-8"


Explanation:

  • This setting specifies the name of the encoding, not the numeric code page identifier, for example UTF-8 instead of 65001.
  • Users should be aware that changing the encoding used by the Agent does not change the code page of the underlying OS shell.
  • OS commands will continue to encode their output with the OS code page.
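
The effect of an enforced encoding that does not match the shell's code page can be illustrated with a minimal sketch. It assumes a shell on code page 850 and a reader that expects UTF-8; replacing undecodable bytes is an assumption made for the sketch and not a statement about the Agent's behavior.

```python
# Bytes as a shell on code page 850 would emit them for a German word.
original = "Größe"
shell_output = original.encode("cp850")

# The reader enforces UTF-8 although the shell still writes code page 850.
# errors="replace" is an assumption for this sketch.
as_utf8 = shell_output.decode("utf-8", errors="replace")

# The reader uses the encoding that matches the shell's code page.
as_cp850 = shell_output.decode("cp850")

# ascii() keeps the output printable on any console code page.
print("decoded as UTF-8 :", ascii(as_utf8))
print("decoded as cp850 :", ascii(as_cp850))
print("round trip intact:", as_cp850 == original)
```

Only the decoder that matches the shell's code page restores the original text, which is why the setting should name the encoding that the shell really uses.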

Examples for Windows Code Pages

The following examples explain use of Windows code pages.

Download (upload .json): pdLanguageSupportSwitchCodePage.workflow.json

Workflow Jobs:  ハローワールドジョブ

Example for the first job in the workflow
@echo off

echo running job: %JOB_NAME%
echo 日本語

echo var1=日本語 >> %JS7_RETURN_VALUES%
echo last_job_name=%JOB_NAME% >> %JS7_RETURN_VALUES%

chcp


Explanation:

  • The first job is assigned the JOB_NAME environment variable from a built-in global variable.
  • Line 3: The output of the JOB_NAME environment variable can be scrambled if the server does not use a compatible code page, see chapter Log View.
  • Line 4: The job displays Unicode characters that have been directly added to the job script.
  • Line 6, 7: The job creates two variables for use with later jobs:
    • var1: the variable directly holds Unicode characters.
    • last_job_name: the variable holds the name of the current job.
  • Line 9: The job displays the current code page.
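
The key=value lines that the job appends to the file referenced by JS7_RETURN_VALUES have to be read with the encoding that the shell used to write them. The sketch below simulates this for code page 932; the function name read_return_values and the parsing rules are assumptions for illustration and do not describe how the Agent reads the file.

```python
import os
import tempfile

def read_return_values(path: str, encoding: str) -> dict:
    """Parse key=value lines from path using the given encoding."""
    values = {}
    with open(path, encoding=encoding) as handle:
        for line in handle:
            key, sep, value = line.partition("=")
            if sep:
                # "echo x=y >> file" leaves a trailing blank, so strip it.
                values[key.strip()] = value.strip()
    return values

# Simulate what the first job appends when the shell uses code page 932.
with tempfile.NamedTemporaryFile(
    "w", encoding="cp932", suffix=".txt", delete=False
) as tmp:
    tmp.write("var1=日本語 \n")
    tmp.write("last_job_name=ハローワールドジョブ \n")

try:
    # Reading with the matching encoding restores both values.
    result = read_return_values(tmp.name, "cp932")
finally:
    os.remove(tmp.name)

print(len(result), "values read")
```

Reading the same file with a non-matching encoding would either fail or yield scrambled values.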


Example for the second job in the workflow
@cmd.exe /K chcp 65001

<nul     set /p "echo=running job %JOB_NAME%"
<nul     set /p "echo=last job name: %LAST_JOB_NAME%"
<nul     set /p "echo=value of variable var1: %VAR1%"
<nul     set /p "echo=日本語"

chcp


Explanation:

  • Line 1: The job script creates a new shell and switches to a Unicode code page.
  • Line 3: The output created for the current job name is readable from the log view as the code page now includes Unicode support.
  • Line 4: The output of the last_job_name variable will be scrambled if the predecessor job created this variable from a non-matching code page.
  • Line 5, 6: The output is readable in the log.

Log View


