Thursday 13 September 2018

HMGR0152W: CPU Starvation detected messages in SystemOut.log

Problem(Abstract)

New system is working properly but HMGR warning messages are being logged in the SystemOut.log file.

Symptom

[10/25/05 16:42:27:635 EDT] 0000047a CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 9 seconds.

Cause

The HMGR0152W message is an indication that JVM thread scheduling delays are occurring for this process.

The WebSphere® Application Server high availability manager component contains thread scheduling delay detection logic that periodically schedules a thread to run and tracks whether the thread was dispatched and run as scheduled. By default, a delay detection thread is scheduled to run every 30 seconds, and it logs an HMGR0152W message if it does not run within 5 seconds of the expected schedule. The message reports the delay, that is, the difference between when the thread was expected to get the CPU and when it actually got CPU cycles.
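The detection logic described above can be sketched roughly as follows. This is not the high availability manager's actual implementation, which is internal to WebSphere Application Server; it is a minimal Python illustration under the stated defaults (30-second period, 5-second tolerance). The names scheduling_delay, check_delay, and run_detector are hypothetical.

```python
import time

PERIOD_SECONDS = 30.0      # default detection period, per the text above
TOLERANCE_SECONDS = 5.0    # default tolerated delay, per the text above


def scheduling_delay(expected_wakeup, actual_wakeup):
    """Return how late (in seconds) the thread ran; never negative."""
    return max(0.0, actual_wakeup - expected_wakeup)


def check_delay(expected_wakeup, actual_wakeup, tolerance=TOLERANCE_SECONDS):
    """Return a warning string if the delay exceeds the tolerance, else None."""
    delay = scheduling_delay(expected_wakeup, actual_wakeup)
    if delay > tolerance:
        return ("HMGR0152W: CPU Starvation detected. "
                "Current thread scheduling delay is %d seconds." % int(delay))
    return None


def run_detector(iterations, period=PERIOD_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
    """Sleep for `period`, then compare expected and actual wake-up times.

    `clock` and `sleep` are injectable so the loop can be exercised
    without really waiting (an assumption made for illustration only).
    """
    warnings = []
    for _ in range(iterations):
        expected = clock() + period
        sleep(period)
        message = check_delay(expected, clock())
        if message is not None:
            warnings.append(message)
    return warnings
```

For example, check_delay(100.0, 109.0) yields a warning reporting a 9-second delay, while check_delay(100.0, 103.0) returns None because a 3-second delay is within the default tolerance.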

The HMGR0152W message can occur even when plenty of CPU resource is available. There are a number of reasons why the scheduled thread might not have been able to get the CPU in a timely fashion. Some common causes include the following:
  • The physical memory is overcommitted and paging is occurring.
  • The heap size for the process is too small, causing garbage collection to run too frequently, for too long, or both, which blocks execution of other threads.
  • There might simply be too many threads running in the system, and too much load placed on the machine, which might be indicated by high CPU utilization.


Resolving the problem

The HMGR0152W message is attempting to warn you that a condition is occurring that might lead to instability if it is not corrected. Analysis should be performed to understand why the thread scheduling delays are occurring, and what action(s) should be taken. Some common solutions include the following:
  • Adding more physical memory to prevent paging.
  • Tuning the JVM memory (heap size) for optimal garbage collection.
  • Reducing the overall system load to an acceptable value.

If the HMGR0152W messages do not occur very often, and indicate that the thread scheduling delay is relatively short (for example, < 20 seconds), it is likely that no other errors will occur and the message can safely be ignored.
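To judge how often the messages occur and how long the delays are, the log can be scanned for HMGR0152W entries. The following is a minimal sketch, assuming the message text matches the sample shown in the Symptom section; the function names are hypothetical, and the 20-second limit is only the example figure from the paragraph above, not a documented threshold.

```python
import re

# Matches the message text shown in the sample log line.
HMGR0152W_PATTERN = re.compile(
    r"HMGR0152W: CPU Starvation detected\. "
    r"Current thread scheduling delay is (\d+) seconds\."
)


def extract_delays(log_lines):
    """Return the reported delay, in seconds, for every HMGR0152W line."""
    delays = []
    for line in log_lines:
        match = HMGR0152W_PATTERN.search(line)
        if match:
            delays.append(int(match.group(1)))
    return delays


def summarize(log_lines, short_delay_limit=20):
    """Count the warnings and list delays at or above the example limit."""
    delays = extract_delays(log_lines)
    return {
        "count": len(delays),
        "max_delay": max(delays) if delays else 0,
        "long_delays": [d for d in delays if d >= short_delay_limit],
    }
```

Passing the lines of a SystemOut.log file to summarize (for example, via open(path).readlines()) gives a quick view of whether the warnings are rare and short or frequent and long.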

The high availability manager thread scheduling delay detection is configurable by setting either of the following two custom properties:
  • IBM_CS_THREAD_SCHED_DETECT_PERIOD determines how often a delay detection thread is scheduled to run. The default value of this parameter is 30 (seconds).
  • IBM_CS_THREAD_SCHED_DETECT_ERROR determines how long a delay is tolerated before a warning message is logged. By default this value is 5 (seconds).

These properties are scoped to a core group and can be configured as follows:
  1. In the administrative console, click Servers > Core groups > Core groups settings and then select the core group name.

  2. Under Additional Properties, click Custom properties > New.

  3. Enter the property name and desired value.

  4. Save the changes.

  5. Restart the server for these changes to take effect.

While it is possible to use the custom properties mentioned above to increase the thread-scheduling-detect-period until the HMGR0152W warning messages no longer occur, this is not recommended. The proper solution is to tune the system to eliminate the thread scheduling delays.

Related information

MustGather: High CPU issues
Tuning operating systems
Tuning the Application Server Environment

CPU is starved: How to feed my CPU.

Scarlett O'Hara once said "I'm going to live through this and when it's all over, I'll never be hungry again." That was a story and era before computers. These days our computers can become starved, or at least the Java™ virtual machine (JVM) can. Performance is a key concern for everyone. When users have to wait, they are discouraged and either become distracted or go somewhere else. Keeping a system running smoothly is key. Every now and then systems will have a slow spot. However, when it continuously impacts users, the issue must be investigated. For this article, I am focusing on the following example outputs seen in a JVM SystemOut.log. These examples came from an IBM Business Process Manager SystemOut.log file.

HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 23 seconds.
DCSV0004W: DCS Stack DefaultCoreGroup at Member PCCell01\PCNode01\BPM751PDEV.AppTarget.PCMNode01.0: Did not receive adequate CPU time slice. Last known CPU usage time at 12:23:55:452 CST. Inactivity duration was 31 seconds.
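The DCSV0004W entry carries several useful fields: the core group, the member, the last known CPU usage time, and the inactivity duration. As a rough sketch, assuming the message always follows the layout of the sample above, they can be pulled out as shown below; the function name and dictionary keys are hypothetical.

```python
import re

# Layout taken from the sample DCSV0004W line above; other layouts are not handled.
DCSV0004W_PATTERN = re.compile(
    r"DCSV0004W: DCS Stack (\S+) at Member (\S+): "
    r"Did not receive adequate CPU time slice\. "
    r"Last known CPU usage time at (\S+) (\S+)\. "
    r"Inactivity duration was (\d+) seconds\."
)


def parse_dcsv0004w(line):
    """Return the fields of a DCSV0004W message as a dict, or None if no match."""
    match = DCSV0004W_PATTERN.search(line)
    if match is None:
        return None
    return {
        "core_group": match.group(1),
        "member": match.group(2),
        "last_cpu_time": match.group(3),
        "time_zone": match.group(4),
        "inactivity_seconds": int(match.group(5)),
    }
```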

What does CPU Starvation mean?
CPU starvation means that the JVM had to wait for processing time. In the simplest case, some other process took nearly all of the CPU and the JVM did not get to run. Twenty-three seconds is a long time for a server to wait. In some examples, I have seen the wait time as high as 70 seconds.

Where is all the CPU time going?
There are two places to look. One is on the system itself. Is there a process on the operating system that has run away and is running at 99%? A simple top command combined with the kill -3 command on Linux operating systems, or the Task Manager on Windows operating systems, can help. If another application on the server has hung and is taking all the CPU time, investigate and stop the process.

If the operating system does not have any extra processes and you still see CPU starvation, the server is most likely a guest operating system in a virtual environment. This means that the larger virtual infrastructure does not have enough CPU time to give to all of the virtual machines it controls. Contact your virtual machine provider or internal sysops team to start investigating the overall health of the virtual system. Other virtual machines in the environment might be using the system heavily and need to move to a different server. Another option is to dedicate CPU to the guest rather than sharing it, which is the default. We have a document that offers links to other documents to consider when you are running J2EE applications and databases in a virtual environment.


Source: https://www.ibm.com/developerworks/community/blogs/WebSphere_Process_Server/entry/hungry_cpu?lang=en
