Problem

One or more jobs report the following error in the task log:

Example for Java error message
java.lang.OutOfMemoryError: unable to create new native thread


This error occurs for JVM jobs, for example JS7 - Job Templates that are implemented using Java.

When this error occurs, the JVM has crashed. Most probably a Java Fatal Error Log was created with the file name hs_err_pid<pid>.log, where <pid> is the process ID of the task. Fatal Error Logs are created in the JS7 Agent's working directory:

  • By default the Agent's data directory is used, which holds sub-directories for configuration files, log files and journal files. The directory is available from the JS7_AGENT_DATA environment variable.
  • Users can specify the working directory from the Agent's Instance Start Script using the JS7_AGENT_WORK_DIR environment variable. For details see JS7 - Agent Command Line Operation.
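
The lookup of Fatal Error Logs described above could be scripted as in the following minimal sketch. The function name find_fatal_error_logs is hypothetical and not part of JS7. The sketch assumes that the JS7_AGENT_WORK_DIR or JS7_AGENT_DATA environment variable is set in the current shell and falls back to the current directory otherwise.

```shell
# List Java Fatal Error Logs (hs_err_pid<pid>.log) found directly in a directory.
# find_fatal_error_logs is a hypothetical helper name, not a JS7 command.
find_fatal_error_logs() {
    dir="$1"
    # Look at regular files in the given directory only, not in sub-directories.
    find "$dir" -maxdepth 1 -type f -name 'hs_err_pid*.log' 2>/dev/null
}

# Prefer the working directory, fall back to the data directory,
# then to the current directory (assumption for this sketch).
work_dir="${JS7_AGENT_WORK_DIR:-${JS7_AGENT_DATA:-.}}"
find_fatal_error_logs "$work_dir"
```

If the command prints no file names then no Fatal Error Log was written to that directory.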


Example for Java Fatal Error Log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create worker GC thread. Out of system resources.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (workgroup.cpp:99), pid=5778, tid=0x00007f1580f1b740
#
# JRE version:  (8.0_322-b06) (build )
# Java VM: OpenJDK 64-Bit Server VM (25.322-b06 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

---------------  T H R E A D  ---------------

Current thread (0x0000000000c96000):  JavaThread "Unknown thread" [_thread_in_vm, id=5778, stack(0x00007ffc11968000,0x00007ffc11a68000)]

Stack: [0x00007ffc11968000,0x00007ffc11a68000],  sp=0x00007ffc11a642b0,  free space=1008k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xb65c5d]  VMError::report_and_die()+0x1cd
V  [libjvm.so+0x5073ca]  report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0xaa
V  [libjvm.so+0xb82c77]  WorkGang::initialize_workers()+0x127
V  [libjvm.so+0x4d11ac]  ConcurrentMark::ConcurrentMark(G1CollectedHeap*, G1RegionToSpaceMapper*, G1RegionToSpaceMapper*)+0x76c
V  [libjvm.so+0x5b0b4e]  G1CollectedHeap::initialize()+0x7ae
V  [libjvm.so+0xb2eaa9]  Universe::initialize_heap()+0x159
V  [libjvm.so+0xb2ed52]  universe_init()+0x42
V  [libjvm.so+0x643dcf]  init_globals()+0x6f
V  [libjvm.so+0xb10f9f]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x29f
V  [libjvm.so+0x71700d]  JNI_CreateJavaVM+0x5d
C  [libjobscheduler-engine.so+0x3e0d1b]  zschimmer::javabridge::Vm::start()+0x85b
C  [libjobscheduler-engine.so+0x25f5bd]  sos::scheduler::init_java_vm(zschimmer::javabridge::Vm*)+0x4d
C  [libjobscheduler-engine.so+0x1a13ba]  sos::start_java(std::string const&, std::string const&)+0xca
C  [libjobscheduler-engine.so+0x1bc3cb]  sos::spooler_main(int, char**, std::string const&, _jobject*)+0x110b
C  [libjobscheduler-engine.so+0x1bd94c]  sos::sos_main(int, char**)+0x9c
C  [libjobscheduler-engine.so+0x30c6b3]  sos::sos_main0(int, char**)+0x53
C  [libc.so.6+0x223d5]  __libc_start_main+0xf5

Analysis

The Java Fatal Error Log holds detailed information about when the error occurred, for example on start-up of the JVM or later on. In addition the log indicates the affected module or class.

  • Line 3: Cannot create worker GC thread. Out of system resources.
    • This explains that the thread that could not be created was launched by the Garbage Collector (GC).
    • This means that the error is not caused by threads of the application. Instead the error occurs when the GC tries to clean up the heap space.
  • Line 34: V [libjvm.so+0x5b0b4e] G1CollectedHeap::initialize()+0x7ae
    • This confirms that the thread error occurs on initialization of the Garbage Collector.

There are two prominent reasons for the error "unable to create new native thread":

  • The system limit for the max. number of open file descriptors or processes is exceeded.
  • The system memory for the thread stack is exceeded.

Maximum Number of Open File Descriptors

The max. number of open file descriptors is specified with values for hard limits and soft limits that apply to the processes running for an individual account and for the processes running for all accounts.

JVM jobs can cause a larger number of open file descriptors. For example, each .jar file that is available with the JS7 Agent will count as an open file when loading the JVM. The number of open files can be verified from the following OS command:


Report the number of open file descriptors for all accounts
lsof | wc -l


Explanation:

  • The number reported can be quite high, values up to 500 000 are not uncommon for Unix systems.
  • Higher values, particularly those exceeding 1 000 000, call for further investigation.


Report the number of open file descriptors for the JS7 Agent's account
lsof -u jobscheduler | wc -l


Explanation:

  • The example assumes the jobscheduler account to be used by the JS7 Agent.
  • If more than one JS7 product is running on the same machine with the same account then open file descriptors will be reported for all JS7 product instances using the same account.
  • Values < 10 000 are considered normal for small environments with low parallelism of jobs.
  • Values < 50 000 are considered normal for medium environments running for example 200 jobs in parallel on the same Agent.
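
To narrow the numbers down to a single process, for example the JS7 Agent or an individual job, the open file descriptors can be counted from the /proc file system. The following is a minimal sketch that assumes a Linux system. The function name count_open_fds and its optional second argument are hypothetical and were introduced for this example only.

```shell
# Count the open file descriptors of a single process via /proc (Linux only).
# count_open_fds and the optional proc_root argument are hypothetical names
# introduced for this sketch; they are not part of JS7.
count_open_fds() {
    pid="$1"
    proc_root="${2:-/proc}"
    # Each entry in <proc_root>/<pid>/fd represents one open file descriptor.
    ls "$proc_root/$pid/fd" 2>/dev/null | wc -l
}

# Example: report the value for the current shell together with its soft limit.
# Replace $$ with the process ID of the Agent or of a job.
echo "open file descriptors: $(count_open_fds $$)"
echo "soft limit for open files: $(ulimit -n)"
```

Reading the file descriptors of a process that runs for a different account usually requires root permissions.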

Thread Stack Size

Java memory includes both heap memory (for Java objects and classes) and stack memory (for Java primitive values, references and threads).

  • The size of the heap space is specified using the -Xms and -Xmx Java options, see JS7 - FAQ - Which Java Options are recommended.
  • The thread stack size is specified per thread using the -Xss Java option.
    • The default value is 1 MB. This value is fine for use with JS7 Controller and Agent.
    • The JS7 JOC Cockpit requires 4 MB thread stack size.

When running Java jobs, users will observe that the memory consumed by a job exceeds the -Xmx value. The thread stacks come on top of the heap space: their total size is the -Xss value multiplied by the number of threads, which is specific to a given job.
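
The calculation can be sketched as follows. The function name estimate_java_memory_mb and the sample values are assumptions chosen for illustration. The result is a rough estimate only, as the JVM uses additional native memory, for example for the code cache and for class metadata.

```shell
# Rough estimate of the memory reserved by a Java job in MB:
# maximum heap size plus one thread stack per thread.
# estimate_java_memory_mb is a hypothetical helper name.
estimate_java_memory_mb() {
    heap_mb="$1"    # value of -Xmx in MB
    threads="$2"    # number of threads of the job
    stack_mb="$3"   # value of -Xss in MB
    echo $(( heap_mb + threads * stack_mb ))
}

# Example: -Xmx64m, 50 threads, -Xss1m
estimate_java_memory_mb 64 50 1   # prints 114
```

With these sample values the thread stacks account for 50 MB on top of the 64 MB heap space.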

A Java application cannot explicitly allocate or release memory at run-time:

  • The heap space requires continuous garbage collection to make released memory available to the Java application.
  • The thread stack is a last-in-first-out stack that automatically increases and decreases when threads are created and terminated. Therefore the thread stack is not subject to garbage collection.

If the thread stack size is exceeded by an individual thread then Java will raise the java.lang.StackOverflowError exception. In this situation users can increase the thread stack size using the -Xss Java option.

If a thread cannot be created then Java will throw the java.lang.OutOfMemoryError exception. This is not related to the thread stack size but to memory exhaustion or to system limits that restrict the maximum number of file descriptors and processes.
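
To find out how many threads a given process is using, the /proc file system can be consulted. The following is a minimal sketch that assumes a Linux system. The function name count_threads and its optional second argument are hypothetical and were introduced for this example only.

```shell
# Count the threads of a single process via /proc (Linux only).
# count_threads and the optional proc_root argument are hypothetical names
# introduced for this sketch; they are not part of JS7.
count_threads() {
    pid="$1"
    proc_root="${2:-/proc}"
    # Each entry in <proc_root>/<pid>/task represents one thread of the process.
    ls "$proc_root/$pid/task" 2>/dev/null | wc -l
}

# Example: report the thread count of the current shell.
# Replace $$ with the process ID of a Java job.
echo "threads: $(count_threads $$)"
```

A thread count that approaches the limit for the number of processes of the account suggests that the system limit is the cause of the error rather than memory exhaustion.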

Solution

For Unix environments the hard and soft limits for the maximum number of open file descriptors are frequently specified in the /etc/security/limits.conf file like this:

Example for /etc/security/limits.conf file for Unix
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#@student        -       maxlogins       4

jobscheduler     hard    nofile          50000
jobscheduler     soft    nofile          50000

jobscheduler     hard    nproc           30000
jobscheduler     soft    nproc           30000


Explanation:

  • The nofile setting specifies the max. number of open files.
  • The nproc setting specifies the max. number of processes. Users should consider that threads are mapped to processes and therefore count towards this limit.
  • Depending on the Unix OS and distribution default values of 10000 for nofile and nproc can be found.
  • Default values tend to be too small if a larger number of JVM jobs is running in parallel, for example >100 jobs. Users can increase the values for the nofile and nproc settings. Increasing the values does not consume resources: a limit is not a reservation of resources.
  • Values can be adjusted for individual OS accounts.
  • Changes to the limits.conf file will be considered immediately for any processes that are starting. This means that it is not required to restart a JS7 Agent in order to apply changes to settings as they will automatically be applied to next jobs starting.
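
To check which limits are in effect for a running process, the limits file of the process can be read from the /proc file system. The following is a minimal sketch that assumes a Linux system. The function name limit_values is hypothetical and was introduced for this example only.

```shell
# Print the soft and hard limit for a given limit name from a limits file
# in the format of /proc/<pid>/limits (Linux only).
# limit_values is a hypothetical helper name, not a JS7 command.
limit_values() {
    name="$1"          # for example "Max open files" or "Max processes"
    limits_file="$2"   # for example /proc/<pid>/limits
    # Find the line starting with the limit name, strip the name,
    # then print the remaining first two columns: soft limit and hard limit.
    awk -v name="$name" 'index($0, name) == 1 { sub(name, ""); print $1, $2 }' "$limits_file"
}

# Example: report the limits of the current shell.
# Replace $$ with the process ID of the Agent or of a job.
if [ -r "/proc/$$/limits" ]; then
    echo "nofile (soft hard): $(limit_values 'Max open files' "/proc/$$/limits")"
    echo "nproc  (soft hard): $(limit_values 'Max processes' "/proc/$$/limits")"
fi
```

If the values reported for a job do not match the settings in the limits.conf file then the settings have not been applied to that process.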


