GC in Elasticsearch

Basic information about garbage collection (GC) in Elasticsearch, including the default garbage collector used, JVM options, GC logging, and more.

Introduction

This article covers the basics of garbage collection (GC) in Elasticsearch, including the default GC type, JVM options, GC logging, and more. The goal is to help you operate your Elasticsearch cluster better by knowing how to observe GC behavior and where to change the settings. This article covers the settings of Elasticsearch 6 and Elasticsearch 7. Now, let’s get started!

GC Types

Elasticsearch mainly uses two of Java's garbage collectors: the Concurrent Mark Sweep (CMS) Collector and the Garbage-First (G1) Garbage Collector. Which one is chosen does not depend on the version of Elasticsearch but on the version of the JDK. With any JDK version from 8 to 13 (inclusive), the default GC is Concurrent Mark Sweep; with JDK 14 or later, the default GC is G1. This is the case for the latest versions of Elasticsearch 6.x and Elasticsearch 7.x.

JDK version    Default GC
8 - 13         Concurrent Mark Sweep (CMS) Collector
14+            Garbage-First (G1) Garbage Collector

You can check this information in the file jvm.options (6.8 / 7.9). For example, here is the excerpt from v7.9.0:

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

According to Oracle JDK Documentation, the Concurrent Mark Sweep (CMS) collector is designed for applications that prefer shorter garbage collection pauses and that can afford to share processor resources with the garbage collector while the application is running. Typically, applications that have a relatively large set of long-lived data (a large old generation) and run on machines with two or more processors tend to benefit from this collector. ⚠️ Note that the CMS collector has been deprecated since Java 9 and was removed in JDK 14, which is why Elasticsearch switches to G1 from that version on. The CMS collector is enabled with the command-line option below:

-XX:+UseConcMarkSweepGC

According to Oracle JDK Documentation, the Garbage-First (G1) garbage collector is targeted at multiprocessor machines with a large amount of memory. It attempts to meet garbage collection pause-time goals with high probability while achieving high throughput with little need for configuration. G1 aims to provide the best balance between latency and throughput for current target applications and environments. G1 has been the default collector of the JDK itself since Java 9, but Elasticsearch's jvm.options explicitly selects CMS for JDK 8 to 13 (see the excerpt above), so Elasticsearch only uses G1 by default on JDK 14 or later. You can explicitly enable it with the following command-line option:

-XX:+UseG1GC

Logging

By default, GC logs are enabled in Elasticsearch. The settings are configured in jvm.options and the logs are written to the same location as the other Elasticsearch logs. The default configuration rotates the logs every 64 MB and can consume up to 2 GB of disk space. For more information about GC logging, you can visit the official documentation of Elasticsearch: GC logging.
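The 2 GB ceiling follows from the rotation settings: the number of rotated files multiplied by the size of each file. The sketch below works this out in Python. The `-Xlog` line is an assumed example of the JDK 9+ logging option with 32 files of 64 MB each; check the jvm.options of your own installation for the actual values. The helper `max_gc_log_bytes` is a name invented for this example.

```python
import re

# Assumed example of a JDK 9+ GC logging option; verify against your jvm.options.
XLOG = (
    "-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log"
    ":utctime,pid,tags:filecount=32,filesize=64m"
)

# Size suffixes expressed in bytes.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def max_gc_log_bytes(xlog_option):
    """Upper bound on disk usage: number of rotated files times file size."""
    count = int(re.search(r"filecount=(\d+)", xlog_option).group(1))
    size, unit = re.search(r"filesize=(\d+)([kmg])", xlog_option, re.IGNORECASE).groups()
    return count * int(size) * _UNITS[unit.lower()]


print(max_gc_log_bytes(XLOG) / 1024**3)  # upper bound in GiB
```

Under these assumptions, 32 files of 64 MB each give the 2 GB upper bound mentioned above.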

Internally, Elasticsearch has a JVM GC Monitor Service (JvmGcMonitorService) (6.8 / 7.9) which watches for GC problems. This service logs the GC activity only when a problem is detected, and the log level (DEBUG/INFO/WARN) depends on the severity. In Elasticsearch 6.x and Elasticsearch 7.x, two kinds of GC problems are logged: GC slowness and GC overhead. GC slowness means that a collection takes too long to execute. GC overhead means that the time spent in GC exceeds a certain percentage of a given time window.

GC Problem         DEBUG       INFO        WARN
Slow GC (Young)    400ms+      700ms+      1,000ms+
Slow GC (Old)      2,000ms+    5,000ms+    10,000ms+
GC Overhead        10%+        25%+        50%+
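The table can be read as a simple lookup: take the most severe level whose threshold is reached. Here is a minimal Python sketch of that reading. It treats the thresholds as inclusive (which is how "400ms+" is interpreted here, an assumption), and the names `SLOW_GC_MS`, `OVERHEAD_PERCENT` and `level_for` are invented for this example rather than taken from Elasticsearch.

```python
# Thresholds from the table above, as (DEBUG, INFO, WARN) lower bounds.
SLOW_GC_MS = {
    "young": (400, 700, 1000),
    "old": (2000, 5000, 10000),
}
OVERHEAD_PERCENT = (10, 25, 50)


def level_for(value, thresholds):
    """Return the most severe level whose threshold is reached, or None."""
    debug, info, warn = thresholds
    if value >= warn:
        return "WARN"
    if value >= info:
        return "INFO"
    if value >= debug:
        return "DEBUG"
    return None  # below every threshold: nothing is logged


print(level_for(758, SLOW_GC_MS["young"]))   # a 758ms young collection
print(level_for(3000, SLOW_GC_MS["old"]))    # a 3s old collection
print(level_for(28.7, OVERHEAD_PERCENT))     # 28.7% of the time spent in GC
```

For instance, a 758ms young collection sits between the INFO and WARN thresholds, so it would be reported at INFO level.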

But this sounds a bit abstract… Don’t worry, let’s take two concrete log messages to see what they look like in practice:

"[gc][young][69127][5329] duration [758ms], collections [1]/[1s], total [758ms]/[4.2h], memory [3.2gb]->[3.4gb]/[7.7gb], all_pools {…}"

The message above reports a “Slow GC” problem in the young generation: 69127 is the sequence number of the monitoring round and 5329 is the number of young collections that have run so far. The collection took 758ms to complete, which is above the 700ms threshold, so the message is logged at INFO level. In the current round, GC happened once over the last second (1s). In total, the time spent in young collections is 4.2 hours. The heap usage went from 3.2GB to 3.4GB, and the maximum heap size is 7.7GB. Then, the message continues with some detailed statistics about the memory pools of the JVM.
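If you want to extract these fields from your own logs, a regular expression is usually enough. The Python sketch below is one possible way to do it, based only on the sample message shown above; the field names (`seq`, `total_collections`, `heap_max`, and so on) are labels chosen for this example, and the pattern may need adjusting if your log format differs.

```python
import re

# Pattern derived from the sample message above; group names are illustrative.
_SLOW_GC = re.compile(
    r"\[gc\]\[(?P<generation>\w+)\]\[(?P<seq>\d+)\]\[(?P<total_collections>\d+)\] "
    r"duration \[(?P<duration>[^\]]+)\], "
    r"collections \[(?P<collections>\d+)\]/\[(?P<elapsed>[^\]]+)\], "
    r"total \[(?P<round_time>[^\]]+)\]/\[(?P<total_time>[^\]]+)\], "
    r"memory \[(?P<heap_before>[^\]]+)\]->\[(?P<heap_after>[^\]]+)\]/\[(?P<heap_max>[^\]]+)\]"
)

MESSAGE = (
    "[gc][young][69127][5329] duration [758ms], collections [1]/[1s], "
    "total [758ms]/[4.2h], memory [3.2gb]->[3.4gb]/[7.7gb], all_pools {…}"
)


def parse_slow_gc(message):
    """Return the fields of a slow GC message as a dict, or None if it does not match."""
    match = _SLOW_GC.search(message)
    return match.groupdict() if match else None


fields = parse_slow_gc(MESSAGE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

The values are kept as strings (for example "758ms" or "7.7gb") so that the sketch does not have to guess how each unit should be converted.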

"[gc][1234] overhead, spent [287ms] collecting in the last [1s]"

The message above reports a “GC Overhead” problem. In the monitoring round with sequence number 1234, the Elasticsearch node spent 287ms doing garbage collection during the last 1 second, which represents 28.7% of the time. This is above the 25% threshold, so the message is logged at INFO level.
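The percentage is simply the time spent collecting divided by the length of the observation window. As a small illustration, the Python sketch below extracts both durations from an overhead message and computes the ratio. The pattern is derived from the sample message above and only handles the units ms, s, m and h; the function name `overhead_percent` is invented for this example.

```python
import re

# Duration units expressed in milliseconds (only the units handled by this sketch).
_MILLIS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}

# Pattern derived from the sample overhead message above.
_OVERHEAD = re.compile(
    r"\[gc\]\[(\d+)\] overhead, spent \[([\d.]+)(ms|s|m|h)\] "
    r"collecting in the last \[([\d.]+)(ms|s|m|h)\]"
)


def overhead_percent(message):
    """Return the share of time spent in GC, in percent, or None if no match."""
    match = _OVERHEAD.search(message)
    if match is None:
        return None
    _, spent, spent_unit, window, window_unit = match.groups()
    spent_ms = float(spent) * _MILLIS[spent_unit]
    window_ms = float(window) * _MILLIS[window_unit]
    return 100 * spent_ms / window_ms


print(overhead_percent("[gc][1234] overhead, spent [287ms] collecting in the last [1s]"))
```

For the sample message, this gives 287 / 1000, that is 28.7%.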

Default GC Options

If you want to know the default GC options in Java where your Elasticsearch node is running, you can use the command below to list them:

java -XX:+PrintFlagsFinal -version

For example, you can find the default value of MaxGCPauseMillis as follows:

~ $ java -XX:+PrintFlagsFinal -version | grep MaxGCPauseMillis
    uintx MaxGCPauseMillis                         = 200                                       {product} {default}
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment AdoptOpenJDK (build 14.0.2+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 14.0.2+12, mixed mode, sharing)
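When you need to look at several flags at once, it can be handy to load this output into a dictionary instead of chaining grep commands. The Python sketch below parses lines in the format shown above. It is tested here only against the sample line from this article; `parse_flags` and `SAMPLE` are names invented for this example, and other JDK builds may format the output slightly differently.

```python
import re

# Sample line copied from the output above.
SAMPLE = (
    "    uintx MaxGCPauseMillis                         = 200"
    "                                       {product} {default}"
)

# "<type> <name> = <value> {kind} ..."; ":=" marks a flag whose value was changed.
_FLAG = re.compile(r"^\s*(?P<type>\S+)\s+(?P<name>\S+)\s+:?=\s*(?P<value>\S*)\s+\{")


def parse_flags(text):
    """Map flag names to their values (as strings) from -XX:+PrintFlagsFinal output."""
    flags = {}
    for line in text.splitlines():
        match = _FLAG.match(line)
        if match:
            flags[match.group("name")] = match.group("value")
    return flags


print(parse_flags(SAMPLE))
```

To use it on a real node, you could capture the output of the java command above (for example with Python's subprocess module) and pass the text to `parse_flags`.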

Changing GC Options

If you want to tune the garbage collector, you need to change the GC options. Normally you don’t need to do that, and the problem probably lies elsewhere. Elasticsearch warns you about this in the jvm.options file: “All the (GC) settings below are considered expert settings. Don’t tamper with them unless you understand what you are doing.” But if you really want to do that, you can follow the documentation “Setting JVM options” to change the GC options. Depending on the distribution you use (tar/zip, Debian/RPM package, Docker), there are different ways to change the options. Roughly speaking, you can change them by: 1) overriding JVM options via JVM options files, either config/jvm.options or files under config/jvm.options.d/; 2) setting the JVM options via the ES_JAVA_OPTS environment variable. Please read that document for more detail.

Going Further

How to go further from here?

Conclusion

In this article, we discussed garbage collection (GC) in Elasticsearch: the default GC used by Elasticsearch on different JDK versions, the GC logs written when problems happen (slowness, overhead), the command line to print the GC options, how to change the GC options, and how to go further from here. Interested in knowing more? You can subscribe to the feed of my blog, or follow me on Twitter or GitHub. I hope you enjoyed this article, see you next time!
