Discovery in Elasticsearch - Mincong Huang

Introduction

Discovery is an important topic when running an Elasticsearch cluster in production. The discovery module has two main tasks: discovering the nodes within the cluster and running master elections. The goal of this article is to share the basic concepts of discovery in Elasticsearch 6 so that you can better configure your cluster and handle operations on it.

After reading this article, you will understand:

The different mechanisms of discovery
Key settings about discovery
Fault detection
Logs about discovery
Discovery in Elasticsearch 7
How to go further on this topic?

Note that this article mainly focuses on Elasticsearch 6. If you need to know about discovery in Elasticsearch 7, please jump to the “Discovery in Elasticsearch 7” section or read the official Elasticsearch documentation directly. Now, let’s get started!

Multicast Discovery

Multicast discovery no longer exists. It was only available as a plugin from Elasticsearch 2.0 onwards, and that plugin was removed in Elasticsearch 5.0, according to Clinton Gormley in this comment.

Unicast Discovery

Unicast discovery configures a static list of hosts for use as seed nodes. These hosts can be specified as hostnames or IP addresses; hosts specified as hostnames are resolved to IP addresses during each round of pinging. Here is an example of a unicast configuration inside the Elasticsearch configuration file (elasticsearch.yml):

discovery.zen.ping.unicast.hosts: ["10.0.0.3:9300", "10.0.0.4:9300", "10.0.0.5:9300"]

But where is elasticsearch.yml located? If you’re using the Elasticsearch Docker image, the default absolute path of the configuration file is /usr/share/elasticsearch/config/elasticsearch.yml (reference). You can also check the directory $ES_HOME/config/ or $ES_PATH_CONF, as mentioned in “Configuring Elasticsearch” (6.8).

Not all of the Elasticsearch nodes in the cluster need to be present in the unicast list to discover all the nodes, but enough addresses should be configured for each node to know about an available gossip node.

File-based Discovery

In addition to the hosts provided by the setting discovery.zen.ping.unicast.hosts, you can provide a list of hosts via an external file. Elasticsearch detects changes to this file and reloads it, so the list of seed hosts can be changed dynamically without restarting a node. To enable file-based discovery, configure the file hosts provider as:

discovery.zen.hosts_provider: file

Then, you need to create a new file in the Elasticsearch configuration directory, $ES_PATH_CONF/unicast_hosts.txt, in the format below.

10.0.0.6:9300
10.0.0.7:9300

Combined with the values defined by discovery.zen.ping.unicast.hosts in the previous section, the final list of seed hosts will be 10.0.0.{3,4,5,6,7}:9300 because both the values defined by the unicast setting and those defined by the file unicast_hosts.txt are used. For more detail, you can see the “File-based” section of Zen Discovery (6.8).
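The combination of the two sources can be sketched in Python as follows. This only illustrates the merge using the values from the two examples above; it is not how Elasticsearch implements it. The helper name merge_seed_hosts and the decision to skip blank lines and drop duplicates are assumptions made for the example.

```python
def merge_seed_hosts(configured_hosts, file_content):
    """Combine hosts from elasticsearch.yml with those in unicast_hosts.txt.

    Hypothetical helper for illustration only: keeps the original order,
    skips blank lines, and drops duplicates.
    """
    file_hosts = [line.strip() for line in file_content.splitlines() if line.strip()]
    merged = []
    for host in list(configured_hosts) + file_hosts:
        if host not in merged:
            merged.append(host)
    return merged


# Values of discovery.zen.ping.unicast.hosts from the unicast example
configured = ["10.0.0.3:9300", "10.0.0.4:9300", "10.0.0.5:9300"]
# Content of $ES_PATH_CONF/unicast_hosts.txt from the file-based example
unicast_hosts_txt = "10.0.0.6:9300\n10.0.0.7:9300\n"

seed_hosts = merge_seed_hosts(configured, unicast_hosts_txt)
print(seed_hosts)
```

The resulting list contains five seed hosts, three from the setting and two from the file.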

Discovery Settings

In Elasticsearch 6, there are two important settings for discovery that should be configured before going to production:

discovery.zen.ping.unicast.hosts
discovery.zen.minimum_master_nodes

Let’s go into more detail about these settings.

discovery.zen.ping.unicast.hosts

This setting defines a static list of hosts for use as seed nodes for Zen discovery. These hosts can be specified as hostnames or IP addresses; hosts specified as hostnames are resolved to IP addresses during each round of pinging. Each value should be in the form of host:port or host. In other words, the port of the host is optional. If omitted, the port defaults to the setting transport.profiles.default.port, falling back to transport.tcp.port if not set. If a hostname resolves to multiple IP addresses, all resolved addresses are tried. You can also provide IPv6 addresses. Additionally, the discovery.zen.ping.unicast.resolve_timeout setting configures the amount of time to wait for DNS lookups on each round of pinging. This is specified as a time unit and defaults to 5s.
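To make the host:port or host format more concrete, here is a minimal Python sketch that splits one entry into a host and a port. It is an illustration, not the Elasticsearch parser: the function name parse_seed_host is hypothetical, the fallback port 9300 is an assumption taken from the examples in this article, and the bracketed form for IPv6 addresses is assumed as well.

```python
DEFAULT_TRANSPORT_PORT = 9300  # assumption for this sketch, matches the examples above


def parse_seed_host(entry, default_port=DEFAULT_TRANSPORT_PORT):
    """Split a seed host entry into (host, port); the port is optional.

    Hypothetical helper: IPv6 addresses are assumed to be written in
    brackets, for example [::1]:9300.
    """
    entry = entry.strip()
    if entry.startswith("["):
        # Bracketed IPv6 address, optionally followed by :port
        host, _, rest = entry[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    elif entry.count(":") == 1:
        # hostname:port or IPv4:port
        host, _, port = entry.partition(":")
    else:
        # No port given
        host, port = entry, ""
    return host, int(port) if port else default_port


for entry in ["10.0.0.3:9300", "10.0.0.4", "[::1]:9301"]:
    print(entry, "->", parse_seed_host(entry))
```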

discovery.zen.minimum_master_nodes

This setting defines the minimum number of master-eligible nodes required to form a cluster. It is essential for forming a cluster correctly and preventing data loss. Without this setting, the cluster may suffer from split-brain issues. A split-brain scenario is when a subset of your cluster (one or more nodes) loses communication with the master node and forms a new cluster. It means that two different Elasticsearch clusters are running independently of each other. To prevent this from happening, set the minimum number of master nodes to half of the master-eligible nodes plus one, i.e., the smallest value that is a majority:

(master_eligible_nodes / 2) + 1

For example, if you have 3 master-eligible nodes, set the minimum number of master nodes to 2 because, with integer division, “(3 / 2) + 1 = 2”.

discovery.zen.minimum_master_nodes: 2
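The formula above can be checked with a few lines of Python. This is a small sketch in which the function name minimum_master_nodes is chosen for illustration; the integer division operator (//) matches the rounding used in the example. The last loop checks why a majority prevents split-brain: two disjoint groups of nodes can never both reach the quorum.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Return the majority quorum: (master_eligible_nodes / 2) + 1, rounded down."""
    if master_eligible_nodes < 1:
        raise ValueError("at least one master-eligible node is required")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 4, 5):
    print(f"{n} master-eligible node(s) -> minimum_master_nodes = {minimum_master_nodes(n)}")

# Two separate partitions would need 2 * quorum nodes in total,
# which is always more than the number of master-eligible nodes.
for n in range(1, 10):
    assert 2 * minimum_master_nodes(n) > n
```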

Fault Detection

Fault detection verifies that the master node and the other nodes are connected and healthy; as long as the checks pass, no master election or node removal is needed. On one side, the elected master periodically checks the connectivity and health of each node in the cluster; on the other side, each node in the cluster checks the health of the elected master. These checks are known as “follower checks” and “leader checks”, respectively. Elasticsearch allows these checks to occasionally fail or time out. It considers a node to be faulty only after several consecutive checks have failed. Here are the settings to configure fault detection (fd):

Setting	Description
`discovery.zen.fd.ping_interval`	How often a node gets pinged. Defaults to `1s`.
`discovery.zen.fd.ping_timeout`	How long to wait for a ping response. Defaults to `30s`.
`discovery.zen.fd.ping_retries`	How many ping failures/timeouts cause a node to be considered failed. Defaults to `3`.

For more detail, check the Elasticsearch official document “Cluster fault detection” (7.9).
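To get a feeling for how these three settings interact, the Python sketch below estimates a rough upper bound on the time needed to declare a node failed. It is a simplification under stated assumptions: every ping is assumed to wait for its full timeout, retries are assumed to be sent back to back, and the class and method names are hypothetical. The real scheduling in Elasticsearch may differ.

```python
from dataclasses import dataclass


@dataclass
class FaultDetectionSettings:
    """Hypothetical container for the discovery.zen.fd.* settings, in seconds."""

    ping_interval_s: float = 1.0   # discovery.zen.fd.ping_interval
    ping_timeout_s: float = 30.0   # discovery.zen.fd.ping_timeout
    ping_retries: int = 3          # discovery.zen.fd.ping_retries

    def worst_case_detection_seconds(self):
        # Assumption: wait one interval before the first ping, then every
        # attempt runs into its full timeout before the next one starts.
        return self.ping_interval_s + self.ping_retries * self.ping_timeout_s


defaults = FaultDetectionSettings()
print(defaults.worst_case_detection_seconds())
```

Under these assumptions, the default values give an estimate of about a minute and a half, which matches the “tried [3] times, each with maximum [30s] timeout” wording in the logs.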

Logs

Here are some logs related to Zen Discovery that may help you to identify the problem of your cluster.

Not enough master nodes discovered during pinging

WARN: not enough master nodes discovered during pinging (found […], but need [2]), pinging again

Not enough master-eligible nodes were discovered during pinging: the nodes found, listed in the message as found […], are fewer than the required minimum of 2. This probably happens because one or more master-eligible nodes were disconnected from the network or shut down, so the discovery.zen.minimum_master_nodes setting is not satisfied. A master election cannot happen because there are not enough master-eligible nodes to elect from. This is a critical error that prevents the cluster from being fully operational. According to the documentation “Zen Discovery > no master block (6.8)”, when this happens, write operations are rejected by default, while read operations succeed based on the last known cluster configuration. If the setting discovery.zen.no_master_block is set to all instead of the default write, both read and write operations are rejected.

Source code in Elasticsearch 6.8.

Master left

WARN: master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: …

This is part of fault detection, more precisely the “leader checks”. A node failed to ping the master node 3 times, each with a maximum timeout of 30s, so it considers that the master has left and decides to join another master. Before doing so, it logs the list of current nodes, retrieved from the cluster state, to record the current situation. This can happen during severe network issues or when the master node is disconnected or stopped.

Source code in Elasticsearch 6.8.

Other Logs

I only documented some of the logs here. If you need more, you can find them in ZenDiscovery.java or in the Zen module (org.elasticsearch.discovery.zen) in general. If you have a service collecting logs for you, you can search for the logs emitted by the logger ZenDiscovery.
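If you want to spot the two warnings described above in collected logs, a small Python sketch like the following can help. The regular expressions are written against the two messages quoted in this article only; the function name classify_discovery_log and the returned dictionary layout are assumptions for illustration, and other log formats or prefixes may require adjustments.

```python
import re

# Patterns based on the two messages quoted in this article
NOT_ENOUGH_MASTERS = re.compile(
    r"not enough master nodes discovered during pinging "
    r"\(found \[(?P<found>.*)\], but need \[(?P<need>\d+)\]\)"
)
MASTER_LEFT = re.compile(r"master left \(reason = (?P<reason>.*?)\), current nodes")


def classify_discovery_log(line):
    """Return a small dict describing the discovery warning, or None if unrelated."""
    match = NOT_ENOUGH_MASTERS.search(line)
    if match:
        return {
            "type": "not_enough_master_nodes",
            "found": match.group("found"),
            "need": int(match.group("need")),
        }
    match = MASTER_LEFT.search(line)
    if match:
        return {"type": "master_left", "reason": match.group("reason")}
    return None


samples = [
    "WARN: not enough master nodes discovered during pinging (found […], but need [2]), pinging again",
    "WARN: master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: …",
    "INFO: unrelated message",
]
for sample in samples:
    print(classify_discovery_log(sample))
```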

Discovery in Elasticsearch 7

The implementation of discovery was rewritten in Elasticsearch 7 as Zen2. Philipp Krenn has an excellent video about this topic. You can watch his talk “Reaching Zen in Elasticsearch’s Coordination” (2019) on YouTube.

This talk shows the main improvements of the new implementation: master elections are much faster, the infamous minimum_master_nodes setting has been removed, and growing and shrinking clusters has become safer and easier, which leaves less room to misconfigure the system. If you need more information about how discovery works in Elasticsearch 7, you can also read the official Elasticsearch documentation, “Discovery and cluster formation”.

Plugins

Other plugins exist for discovery:

Plugin	Description
Azure Classic Discovery Plugin (6.8)	(Deprecated in 5.0.0) This plugin uses the Azure Classic API for unicast discovery. The development of its replacement Azure ARM Discovery Plugin was discontinued (Issue #19146).
Google Compute Engine Discovery Plugin (6.8)	This plugin uses the GCE API for unicast discovery.
EC2 Discovery Plugin (6.8)	This plugin uses the AWS API for unicast discovery.

Going Further

How to go further from here?

To better understand Zen Discovery, read the official documentation of Zen Discovery on Elasticsearch 6.8. https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-discovery-zen.html
To better understand the implementation of Zen Discovery, read the source code of ZenDiscovery.java (6.8) on GitHub and other classes in the same package.
To better understand discovery and cluster formation, see the official documentation of “Discovery and cluster formation” on Elasticsearch 7.9. https://www.elastic.co/guide/en/elasticsearch/reference/7.9/modules-discovery.html
To see the full list of log errors related to Elasticsearch discovery, see Opster’s article: Elasticsearch Discovery. https://opster.com/elasticsearch-glossary/elasticsearch-discovery/
To learn more about Elasticsearch, I highly recommend the book “Elasticsearch in Action” written by Radu Gheorghe, Matthew Lee Hinman, and Roy Russo. https://www.manning.com/books/elasticsearch-in-action

Conclusion

In this article, we saw different types of discovery: unicast discovery, file-based discovery, and plugin-based discovery. We saw the two most important settings for unicast discovery: the list of seed hosts and the minimum number of master-eligible nodes. We took a look at fault detection in Elasticsearch using pings, we saw some logs related to Zen discovery, and we also saw the changes to discovery in Elasticsearch 7. Finally, I shared some resources that allow you to go further on this topic. Interested to know more? You can subscribe to the feed of my blog, or follow me on Twitter or GitHub. I hope you enjoyed this article, see you next time!

References

Radu Gheorghe, Matthew Lee Hinman, Roy Russo, “Elasticsearch in Action”, Manning, 2016. [book]
Opster, “Elasticsearch Discovery”, Opster, 2020.
https://opster.com/elasticsearch-glossary/elasticsearch-discovery/
Philipp Krenn, “Reaching Zen in Elasticsearch’s Coordination”, Berlin Buzzwords, 2019.
https://www.youtube.com/watch?v=Ns1Erg4I92U
Elasticsearch, “Install Elasticsearch with Docker”, Elasticsearch, 2020.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
Elasticsearch, “Configuring Elasticsearch”, Elasticsearch, 2020.
https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html
Elasticsearch, “Zen Discovery”, Elasticsearch, 2020.
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-discovery-zen.html
Elasticsearch, “Cluster fault detection”, Elasticsearch, 2020.
https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-fault-detection.html
Clinton Gormley, Answer of Issue “Elasticsearch 5.0 (5.0.0-alpha2) Disable multicast - unknown setting error”, GitHub, 2016.
https://github.com/elastic/elasticsearch/issues/18686
