ElasticSearch has an active community and a very fast release cycle.
Because ElasticSearch depends on many common Java libraries (Lucene, Guice, and Jackson are the best known), the community works to keep them up to date and to fix bugs discovered in them and in the ElasticSearch core.
Where possible, the best practice is to use the latest available release, which is usually the most stable one.
A supported operating system (Linux, Mac OS X, or Windows) with a Java Virtual Machine 1.6 or above installed is required. A web browser is required to download the ElasticSearch binary release.
To download and install an ElasticSearch server, we will perform the following steps:
Download ElasticSearch from the Web.
The latest version is always available at http://www.elasticsearch.org/download/.
There are versions available for different operating systems:
These packages contain everything needed to start ElasticSearch.
At the time of writing this book, the latest stable version of ElasticSearch was 0.90.7. To check whether this is still the latest available version, please visit http://www.elasticsearch.org/download/.
Extract the binary content.
After downloading the correct release for your platform, installation consists of expanding the archive into a working directory.
Choose a working directory that is safe from character-set problems and does not have a long path, to prevent problems when ElasticSearch creates its directories to store the index data.
On Windows, a good directory could be c:\es; on Unix and Mac OS X, /opt/es.
To run ElasticSearch, you need a Java Virtual Machine 1.6 or above installed. For better performance, I suggest using the Sun/Oracle 1.7 version.
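The extraction step can also be scripted. The following is a minimal sketch, assuming the release was downloaded as a tar archive; the function name `extract_release` and both path arguments are hypothetical and chosen only for illustration.

```python
import tarfile
from pathlib import Path


def extract_release(archive_path, target_dir):
    """Expand a downloaded tar archive into target_dir.

    Both arguments are placeholders: pass the archive you actually
    downloaded and a short working directory such as /opt/es.
    Returns the names of the top-level entries that were created.
    """
    target = Path(target_dir)
    # Create the working directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    # tarfile detects gzip compression automatically in the default mode.
    with tarfile.open(archive_path) as archive:
        archive.extractall(target)
    return sorted(entry.name for entry in target.iterdir())
```

Only extract archives obtained from a source you trust, since `extractall` writes whatever paths the archive contains.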
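Before starting the server, it can help to confirm that the installed JVM meets the 1.6 minimum. The sketch below parses the banner printed by `java -version`; the helper names are hypothetical, and it assumes that a `java` executable is on the PATH and that the banner contains a quoted version string.

```python
import re
import subprocess


def parse_java_version(banner):
    """Return (major, minor) from a `java -version` banner, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2) or 0))


def meets_minimum(version, minimum=(1, 6)):
    """True when the parsed version is at least the required one."""
    return version is not None and version >= minimum


def installed_java_version(java="java"):
    """Run the JVM and parse its banner (assumes `java` is on the PATH)."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_java_version(result.stderr)
```

For example, `meets_minimum(installed_java_version())` returns `False` when the banner cannot be parsed or the version is older than 1.6.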
We start ElasticSearch to check that everything is working.
To start your ElasticSearch server, go to the install directory and type:
# bin/elasticsearch -f (for Linux and Mac OS X)
or
# bin\elasticsearch.bat -f (for Windows)
Now your server should start as shown in the following screenshot:
The ElasticSearch package contains three directories:
During ElasticSearch startup a lot of events happen:
There are more events which are fired during ElasticSearch startup. We’ll see them in detail in other recipes.
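Once the node has started, one way to confirm that it is answering is to request its root HTTP endpoint. This is a minimal sketch, not part of the recipe itself: it assumes the node listens on http://localhost:9200 and that the JSON response carries `name` and `version.number` fields. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def node_info(base_url="http://localhost:9200", timeout=5):
    """Fetch the JSON document served at the node's root URL.

    base_url is an assumption; change it if your node binds elsewhere.
    Raises an exception from urllib if the node cannot be reached.
    """
    with urlopen(base_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(info):
    """Build a one-line summary from the parsed root response."""
    version = info.get("version", {}).get("number", "unknown")
    name = info.get("name", "unknown")
    return "node {!r} running version {}".format(name, version)
```

With a node running, `print(summarize(node_info()))` prints the node name and version; if the server is not up, `node_info()` raises a connection error instead.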
Setting up networking correctly is very important for your node and cluster.
As there are a lot of different install scenarios and networking issues, in this recipe we will cover two kinds of networking setups:
You need a working ElasticSearch installation, and you need to know your current networking configuration (that is, your IP address).
To configure networking, we will perform the following steps:
Open the ElasticSearch configuration file with your favorite text editor.
With the standard ElasticSearch configuration file (config/elasticsearch.yml), your node is configured to bind to all of your machine's interfaces and to perform autodiscovery by broadcasting events: it sends "signals" to every machine in the current LAN and waits for a response. If a node responds, the two can join in a cluster.
If another node is available in the same LAN, it joins the cluster.
Only nodes with the same ElasticSearch version and the same cluster name (the cluster.name option in elasticsearch.yml) can join each other.
cluster.name: elasticsearch
node.name: "My wonderful server"
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.2","192.168.0.3[9300-9400]"]
This configuration sets the cluster name to elasticsearch, sets the node name, binds the node to the network address 192.168.0.1, and tells the node to look for other cluster members at the addresses listed in the discovery section.
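To make the unicast host notation concrete, the sketch below expands an entry such as 192.168.0.3[9300-9400] into the individual host and port pairs it stands for. It is an illustration of the notation only, not ElasticSearch's own parser. The function name, the `host:port` form, and the default port of 9300 (the transport port that appears in the startup log) are assumptions.

```python
import re

# host, optionally followed by "[first-last]" or ":port"
HOST_PATTERN = re.compile(
    r"^(?P<host>[^\[\]:]+)"
    r"(?:\[(?P<first>\d+)-(?P<last>\d+)\]|:(?P<port>\d+))?$"
)


def expand_unicast_host(entry, default_port=9300):
    """Return the (host, port) pairs described by one unicast entry."""
    match = HOST_PATTERN.match(entry.strip())
    if match is None:
        raise ValueError("unrecognized host entry: {!r}".format(entry))
    host = match.group("host")
    if match.group("first") is not None:
        # A bracketed range covers both end ports inclusively.
        ports = range(int(match.group("first")), int(match.group("last")) + 1)
    elif match.group("port") is not None:
        ports = [int(match.group("port"))]
    else:
        # No port given: fall back to the assumed default transport port.
        ports = [default_port]
    return [(host, port) for port in ports]
```

Under these assumptions, the second entry in the configuration above covers 101 candidate ports on 192.168.0.3, from 9300 to 9400 inclusive.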
We can check the configuration during node loading.
We can now start the server and check if the network is configured:
[INFO ][node ] [Aparo] version[0.90.3], pid[16792], build[5c38d60/2013-08-06T13:18:31Z]
[INFO ][node ] [Aparo] initializing ...
[INFO ][plugins ] [Aparo] loaded [transport-thrift, river-twitter, mapper-attachments, lang-python, jdbc-river, lang-javascript], sites [bigdesk, head]
[INFO ][node ] [Aparo] initialized
[INFO ][node ] [Aparo] starting ...
[INFO ][transport ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.5:9300]}
[INFO ][cluster.service] [Aparo] new_master [Angela Cairn] [yJcbdaPTSgS7ATQszgpSow][inet[/192.168.1.5:9300]], reason: zen-disco-join (elected_as_master)
[INFO ][discovery ] [Aparo] elasticsearch/yJcbdaPTSgS7ATQszgpSow
[INFO ][http ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.5:9200]}
[INFO ][node ] [Aparo] started
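The bound and published addresses can be pulled out of a startup log like this one with a small script. The following sketch assumes the log lines have exactly the layout shown in this recipe; the pattern and function names are hypothetical, and the sample text is copied from the transport and http lines of the log.

```python
import re

# Matches the transport and http lines that report listening addresses.
ADDRESS_PATTERN = re.compile(
    r"\[(?P<component>transport|http)\s*\].*?"
    r"bound_address \{inet\[(?P<bound>[^\]]*)\]\}, "
    r"publish_address \{inet\[(?P<publish>[^\]]*)\]\}"
)

SAMPLE_LOG = (
    "[INFO ][transport ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, "
    "publish_address {inet[/192.168.1.5:9300]}\n"
    "[INFO ][http ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, "
    "publish_address {inet[/192.168.1.5:9200]}\n"
)


def listening_addresses(log_text):
    """Map each component (transport, http) to its bound/publish address."""
    found = {}
    for match in ADDRESS_PATTERN.finditer(log_text):
        found[match.group("component")] = {
            # Drop the leading slash that the log prints before the address.
            "bound": match.group("bound").lstrip("/"),
            "publish": match.group("publish").lstrip("/"),
        }
    return found
```

Calling `listening_addresses(SAMPLE_LOG)` gives one entry for `transport` and one for `http`, which makes it easy to compare the published address with the network.host value you configured.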
In this case, we have:
It works as follows:
ElasticSearch allows you to customize several parameters in an installation. In this recipe, we'll see the most commonly used ones, which define where to store our data and improve general performance.
You need a working ElasticSearch installation.
The steps required for setting up a simple node are as follows:
path.conf: /opt/data/es/conf
path.data: /opt/data/es/data1,/opt2/data/data2
path.work: /opt/data/work
path.logs: /opt/data/logs
path.plugins: /opt/data/plugins

index.number_of_shards: 5
index.number_of_replicas: 1
The path.conf setting defines the directory that contains your configuration files, mainly elasticsearch.yml and logging.yml. The default location is $ES_HOME/config, where ES_HOME is your install directory.
It's useful to set up the config directory outside your application directory so that you don't need to copy configuration files every time you update the version or change the ElasticSearch installation directory.
The path.data setting is the most important one: it defines one or more directories where index data is stored. When you define more than one directory, they are managed similarly to a RAID 0 configuration (the total space is the sum of all the data directory entry points), favoring locations with the most free space.
The path.work setting is the location where ElasticSearch puts temporary files.
The path.logs setting is where log files are put. How logging is done is controlled in logging.yml.
The path.plugins setting overrides the plugins path (default $ES_HOME/plugins). It's useful for installing "system-wide" plugins.
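The idea of favoring the data location with the most free space can be pictured with a short sketch. This only illustrates the principle described for path.data and is not ElasticSearch's actual allocation logic; both function names and the optional `free_bytes` callback are hypothetical.

```python
import shutil


def split_data_paths(setting):
    """Split a comma-separated path.data value into individual paths."""
    return [part.strip() for part in setting.split(",") if part.strip()]


def path_with_most_free_space(paths, free_bytes=None):
    """Pick the path whose file system reports the most free space.

    free_bytes maps a path to its free byte count; by default it asks
    the operating system, so the paths must exist on this machine.
    """
    if free_bytes is None:
        def free_bytes(path):
            return shutil.disk_usage(path).free
    return max(paths, key=free_bytes)
```

For the example configuration, `split_data_paths("/opt/data/es/data1,/opt2/data/data2")` yields the two directories, and `path_with_most_free_space` returns whichever of them currently has more room.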
The main parameters used to control indices and shards are index.number_of_shards, which sets the default number of shards for a newly created index, and index.number_of_replicas, which sets the initial number of replicas.
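As a quick piece of arithmetic, these two settings together determine how many shard copies a new index creates: each primary shard plus one copy per replica. The helper below is a hypothetical illustration of that calculation, not an ElasticSearch API.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primary shards plus all of their replica copies for one index."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least one shard and no negative replicas")
    # Every primary shard exists once, plus once more per replica.
    return number_of_shards * (1 + number_of_replicas)


# The values used in this recipe: 5 shards with 1 replica each.
print(total_shard_copies(5, 1))  # prints 10
```

With the values from this recipe, a new index is made of 5 primary shards and 5 replica shards.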
There are a lot of other parameters that can be used to customize your ElasticSearch installation and new ones are added with new releases. The most important ones are described in this recipe and in the next one.