
Downloading and installing ElasticSearch

ElasticSearch has an active community and the release cycles are very fast.

Because ElasticSearch depends on many common Java libraries (Lucene, Guice, and Jackson are the most famous ones), the ElasticSearch community tries to keep them updated and fix bugs that are discovered in them and in ElasticSearch core.

Whenever possible, the best practice is to use the latest available release (usually the most stable one).

Getting ready

A supported operating system (Linux, Mac OS X, or Windows) with a Java Virtual Machine (JVM) 1.6 or above installed is required. A web browser is required to download the ElasticSearch binary release.
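If you want to check the JVM requirement from a script, the following minimal Python sketch parses the banner printed by `java -version`. It assumes `java` is on your PATH and that the banner uses the `major.minor` form of 1.x JVMs (for example, `java version "1.7.0_45"`); the helper names `parse_java_version` and `java_is_supported` are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

def parse_java_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a banner such as: java version "1.7.0_45"."""
    match = re.search(r'version "(\d+)\.(\d+)', text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def java_is_supported(text: str, minimum: Tuple[int, int] = (1, 6)) -> bool:
    """Return True if the reported JVM version is at least `minimum`."""
    version = parse_java_version(text)
    return version is not None and version >= minimum

if __name__ == "__main__":
    try:
        # `java -version` prints its banner on stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        print("JVM 1.6 or above:", java_is_supported(result.stderr))
    except FileNotFoundError:
        print("No java executable found on PATH")
```

Comparing tuples keeps the check simple: `(1, 7) >= (1, 6)` is true, while `(1, 5)` fails.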

How to do it...

To download and install an ElasticSearch server, perform the following steps:

Download ElasticSearch from the Web.

The latest version is always downloadable from the web address http://www.elasticsearch.org/download/.

There are versions available for different operating systems:
- elasticsearch-{version-number}.zip: This is for Linux, Mac OS X, and Windows
- elasticsearch-{version-number}.tar.gz: This is for Linux and Mac OS X
- elasticsearch-{version-number}.deb: This is for Debian-based Linux distributions (this also covers the Ubuntu family)
These packages contain everything needed to start ElasticSearch.

At the time of writing this book, the latest and most stable version of ElasticSearch was 0.90.7. To check whether a newer version is available, please visit http://www.elasticsearch.org/download/.
Extract the binary content.

After downloading the correct release for your platform, the installation consists of expanding the archive in a working directory.

Choose a working directory whose path is free of characters that could cause charset problems and is not too long; this prevents problems when ElasticSearch creates its directories to store the index data.

On the Windows platform, a good directory could be c:\es; on Unix and Mac OS X, /opt/es.

To run ElasticSearch, you need a Java Virtual Machine 1.6 or above installed. For better performance, I suggest using the Sun/Oracle 1.7 version.
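Start ElasticSearch to check that everything is working.

To start your ElasticSearch server, go to the install directory and type the following on Linux and Mac OS X:
```
# bin/elasticsearch -f
```
or, on Windows:
```
bin\elasticsearch.bat -f
```
On Linux and Mac OS X, the -f flag keeps ElasticSearch running in the foreground, so its log is printed to the console.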
Now your server should start as shown in the following screenshot:

How it works...

The ElasticSearch package contains three directories:

- bin: This contains scripts to start and manage ElasticSearch. The most important ones are:
  - elasticsearch(.bat): This is the main script to start ElasticSearch
  - plugin(.bat): This is a script to manage plugins
- config: This contains the ElasticSearch configuration files. The most important ones are:
  - elasticsearch.yml: This is the main configuration file for ElasticSearch
  - logging.yml: This is the logging configuration file
- lib: This contains all the libraries required to run ElasticSearch
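If you script your installations, you may want to verify that an extracted archive has this layout. The following Python sketch is one way to do it; the `missing_entries` helper and the `EXPECTED` list are hypothetical, and the list contains only the entries described above, using the Linux/Mac OS X script names (on Windows, the scripts end in .bat).

```python
from pathlib import Path
from typing import List

# Entries this recipe lists as the most important parts of the package.
EXPECTED = [
    "bin/elasticsearch",
    "bin/plugin",
    "config/elasticsearch.yml",
    "config/logging.yml",
    "lib",
]

def missing_entries(es_home: str) -> List[str]:
    """Return the expected entries that are absent from an install directory."""
    root = Path(es_home)
    return [entry for entry in EXPECTED if not (root / entry).exists()]

if __name__ == "__main__":
    # Hypothetical install directory, matching the suggestion for Unix systems.
    print(missing_entries("/opt/es"))
```

For a complete installation the function returns an empty list.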

There's more...

During ElasticSearch startup, a lot of events happen:

- A node name is chosen automatically (that is, Akenaten in the example) if it is not provided in elasticsearch.yml.
- A unique hash is generated to identify the node (that is, whqVp_4zQGCgMvJ1CXhcWQ).
- If there are plugins (internal or site plugins), they are loaded. In the previous example, there are no plugins.
- Unless configured otherwise, ElasticSearch binds two ports on all available addresses:
  - 9300: the internal transport port, used for node-to-node communication and for discovering other nodes
  - 9200: the HTTP REST API port
- After starting, if indices are available, they are checked and brought online so they can be used.

There are more events which are fired during ElasticSearch startup. We'll see them in detail in other recipes.
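Once the node has started, a quick way to confirm that the HTTP REST API is answering on port 9200 is to request its root URL. This minimal Python sketch assumes the server runs on localhost with the default port, and that the JSON returned by GET / reports the release under version.number; the helper names are hypothetical.

```python
import json
import urllib.request

def version_from_root_response(body: str) -> str:
    """Return the version number reported by the root endpoint (GET /)."""
    return json.loads(body)["version"]["number"]

def check_server(url: str = "http://localhost:9200/") -> str:
    """Query the REST API root and return the server version."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return version_from_root_response(response.read().decode("utf-8"))

if __name__ == "__main__":
    try:
        print("ElasticSearch is up, version", check_server())
    except OSError as error:  # URLError and socket timeouts are OSError subclasses
        print("ElasticSearch is not reachable:", error)
```

Note that port 9300 speaks the internal transport protocol, not HTTP, so this check only makes sense against port 9200.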

Networking setup

Setting up networking correctly is very important for your node and cluster.

As there are a lot of different install scenarios and networking issues, in this recipe we will cover two kinds of networking setups:

- Standard installation with a working autodiscovery configuration
- Forced IP configuration, used if it is not possible to use autodiscovery

Getting ready

You need a working ElasticSearch installation, and you need to know your current networking configuration (for example, your IP address).

How to do it...

To configure networking, perform the following steps:

Open the ElasticSearch configuration file with your favorite text editor.

With the standard ElasticSearch configuration file (config/elasticsearch.yml), your node binds on all your machine interfaces and performs autodiscovery by broadcasting events: it sends "signals" to every machine in the current LAN and waits for a response. If another node in the same LAN responds, the two can join in a cluster.

Only nodes with the same ElasticSearch version and same cluster name (cluster.name option in elasticsearch.yml) can join each other.
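The joining rule above can be expressed as a tiny predicate. This Python sketch models only the rule as stated here (same version, same cluster.name); it is not ElasticSearch's discovery code, and `NodeInfo` and `can_join` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeInfo:
    cluster_name: str  # value of cluster.name in elasticsearch.yml
    version: str       # ElasticSearch release, for example "0.90.7"

def can_join(a: NodeInfo, b: NodeInfo) -> bool:
    """Two nodes can form a cluster only with equal version and cluster name."""
    return a.version == b.version and a.cluster_name == b.cluster_name

if __name__ == "__main__":
    print(can_join(NodeInfo("elasticsearch", "0.90.7"), NodeInfo("elasticsearch", "0.90.7")))
```

A common reason a new node does not join is a leftover cluster.name from another environment, which this check makes explicit.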
To customize the network preferences, you need to change some parameters in the elasticsearch.yml file, such as:
```
cluster.name: elasticsearch
node.name: "My wonderful server"
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.2","192.168.0.3:[9300-9400]"]
```
This configuration sets the cluster name to elasticsearch, sets the node name, binds the node to the network address 192.168.0.1, and lists the hosts (with an optional port range) that the node contacts to discover other nodes.

We can check the configuration during node loading.

We can now start the server and check if the network is configured:

```
[INFO ][node ] [Aparo] version[0.90.3], pid[16792], build[5c38d60/2013-08-06T13:18:31Z]
[INFO ][node ] [Aparo] initializing ...
[INFO ][plugins ] [Aparo] loaded [transport-thrift, river-twitter, mapper-attachments, lang-python, jdbc-river, lang-javascript], sites [bigdesk, head]
[INFO ][node ] [Aparo] initialized
[INFO ][node ] [Aparo] starting ...
[INFO ][transport ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.5:9300]}
[INFO ][cluster.service] [Aparo] new_master [Angela Cairn][yJcbdaPTSgS7ATQszgpSow][inet[/192.168.1.5:9300]], reason: zen-disco-join (elected_as_master)
[INFO ][discovery ] [Aparo] elasticsearch/yJcbdaPTSgS7ATQszgpSow
[INFO ][http ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.5:9200]}
[INFO ][node ] [Aparo] started
```
[INFO ][node ] [Aparo] started

In this case, we have:

- The transport layer is bound to 0:0:0:0:0:0:0:0:9300 (all interfaces) and publishes 192.168.1.5:9300
- The REST HTTP interface is bound to 0:0:0:0:0:0:0:0:9200 (all interfaces) and publishes 192.168.1.5:9200
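When you check many nodes, it can help to extract these addresses from the log automatically. This Python sketch uses a regular expression written against the sample log lines above; other ElasticSearch versions may format the lines differently, and `parse_addresses` is a hypothetical helper.

```python
import re
from typing import Dict, Tuple

# Matches lines like:
# [INFO ][transport ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.5:9300]}
ADDRESS_RE = re.compile(
    r"\[(?P<module>transport|http)\s*\].*?"
    r"bound_address \{inet\[/(?P<bound>[^\]]+)\]\}, "
    r"publish_address \{inet\[/(?P<publish>[^\]]+)\]\}"
)

def parse_addresses(log: str) -> Dict[str, Tuple[str, str]]:
    """Map each module (transport, http) to its (bound, publish) address."""
    return {
        m.group("module"): (m.group("bound"), m.group("publish"))
        for m in ADDRESS_RE.finditer(log)
    }
```

The bound address tells you which interfaces accept connections, while the publish address is the one other nodes are told to use.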

How it works...

It works as follows:

- cluster.name: This sets up the name of the cluster (only nodes with the same name can join).
- node.name: This defines a name for the node; if it is not defined, ElasticSearch generates one automatically. If you have a lot of nodes on different machines, it is useful to give each one a meaningful name so you can locate it easily: a meaningful name is easier to remember than a generated one such as whqVp_4zQGCgMvJ1CXhcWQ.
- network.host: This defines the IP address of your machine that the node binds to. If your server is on different LANs or you want to limit binding to only one LAN, you must set this value to your server IP.
- discovery.zen.ping.unicast.hosts: This allows you to define a list of hosts (with ports or a port range) used to discover other nodes to join the cluster. This setting allows using the node in a LAN where broadcasting is not allowed or autodiscovery is not working (for example, behind packet-filtering routers). The referred port is the transport one, usually 9300. The entries of the hosts list can be a mix of:
  - a host name, for example, myhost1
  - an IP address, for example, 192.168.1.2
  - an IP address or host name with the port, for example, myhost1:9300 and 192.168.1.2:9300
  - an IP address or host name with a range of ports, for example, myhost1:[9300-9400] and 192.168.1.2:[9300-9400]
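To make the four entry formats concrete, the following Python sketch expands a single entry into (host, port) pairs. It is an illustration of what the formats mean, not ElasticSearch's parser: it assumes port 9300 when an entry has no port, and it does not handle IPv6 literals. `expand_host` is a hypothetical name.

```python
import re
from typing import List, Tuple

DEFAULT_TRANSPORT_PORT = 9300  # assumption: port used when an entry omits one

ENTRY_RE = re.compile(
    r"^(?P<host>[^:\[\]]+)"                                  # host name or IPv4 address
    r"(?::(?:\[(?P<lo>\d+)-(?P<hi>\d+)\]|(?P<port>\d+)))?$"  # optional :port or :[lo-hi]
)

def expand_host(entry: str) -> List[Tuple[str, int]]:
    """Expand one unicast host entry into (host, port) pairs."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"unsupported host entry: {entry!r}")
    host = match.group("host")
    if match.group("lo") is not None:
        low, high = int(match.group("lo")), int(match.group("hi"))
        return [(host, port) for port in range(low, high + 1)]  # range is inclusive
    if match.group("port") is not None:
        return [(host, int(match.group("port")))]
    return [(host, DEFAULT_TRANSPORT_PORT)]

if __name__ == "__main__":
    for entry in ["myhost1", "192.168.1.2:9300", "myhost1:[9300-9302]"]:
        print(entry, "->", expand_host(entry))
```

A wide range such as [9300-9400] expands to 101 candidate addresses per host, so narrow ranges keep discovery quick.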

Setting up a node

ElasticSearch allows you to customize several parameters in an installation. In this recipe, we'll see the most commonly used ones, which define where to store our data and improve general performance.

Getting ready

You need a working ElasticSearch installation.

How to do it...

The steps required for setting up a simple node are as follows:

Open the config/elasticsearch.yml file with an editor of your choice.

Set up the directories that store your server data:

```
path.conf: /opt/data/es/conf
path.data: /opt/data/es/data1,/opt2/data/data2
path.work: /opt/data/work
path.logs: /opt/data/logs
path.plugins: /opt/data/plugins
```

Set up the parameters that control default index creation. These parameters are:
```
index.number_of_shards: 5
index.number_of_replicas: 1
```

How it works...

The path.conf setting defines the directory that contains your configuration files: mainly elasticsearch.yml and logging.yml. The default location is $ES_HOME/config, where ES_HOME is your install directory.

It's useful to set up the config directory outside your application directory so you don't need to copy configuration files every time you update the version or change the ElasticSearch installation directory.

The path.data setting is the most important one: it allows defining one or more directories where you store index data. When you define more than one directory, they are managed similarly to a RAID 0 configuration (the total space is the sum of the space in all the data directories), favoring locations with the most free space.
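The "sum of all directories, favoring the most free space" idea can be sketched in a few lines of Python. This is a simplified model for intuition only, not ElasticSearch's actual allocation logic; `free_space`, `total_free`, and `pick_data_path` are hypothetical helpers, and `shutil.disk_usage` is the standard-library call used to read free bytes.

```python
import shutil
from typing import Dict, Iterable

def free_space(paths: Iterable[str]) -> Dict[str, int]:
    """Measure free bytes on the file system holding each data path."""
    return {path: shutil.disk_usage(path).free for path in paths}

def total_free(free_bytes: Dict[str, int]) -> int:
    """Like RAID 0, the usable space is the sum over all data paths."""
    return sum(free_bytes.values())

def pick_data_path(free_bytes: Dict[str, int]) -> str:
    """Simplified policy: prefer the path with the most free space."""
    return max(free_bytes, key=free_bytes.get)

if __name__ == "__main__":
    usage = free_space(["."])  # replace with your path.data entries
    print(total_free(usage), pick_data_path(usage))
```

Because directories on the same disk share free space, spreading path.data over separate physical disks is what actually adds capacity.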

The path.work setting is the location where ElasticSearch puts temporary files.

The path.logs setting is where log files are put. How logging is performed is controlled in logging.yml.

The path.plugins setting allows overriding the plugins path (default $ES_HOME/plugins). It's useful for keeping plugins in a "system-wide" location.
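The override-or-default behavior of path.conf and path.plugins can be summarized with a small Python helper. It encodes only the two defaults stated in this recipe ($ES_HOME/config and $ES_HOME/plugins); `effective_path` and `DEFAULTS` are hypothetical names, and other path settings are deliberately left out.

```python
import os
from typing import Dict

# Only the defaults this recipe states; other keys raise KeyError on purpose.
DEFAULTS = {"path.conf": "config", "path.plugins": "plugins"}

def effective_path(settings: Dict[str, str], key: str, es_home: str) -> str:
    """Return the configured path, or the default under ES_HOME."""
    if key in settings:
        return settings[key]
    return os.path.join(es_home, DEFAULTS[key])

if __name__ == "__main__":
    configured = {"path.plugins": "/opt/data/plugins"}
    print(effective_path(configured, "path.conf", "/opt/es"))
    print(effective_path(configured, "path.plugins", "/opt/es"))
```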

The main parameters used to control indices and shards are index.number_of_shards, which controls the default number of shards for a newly created index, and index.number_of_replicas, which controls the initial number of replicas.
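These two values determine how many shard copies a new index has: each primary shard gets number_of_replicas extra copies. A short Python calculation with the values used in this recipe (`total_shard_copies` is a hypothetical helper):

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primary shards plus one full set of copies per replica."""
    return number_of_shards * (1 + number_of_replicas)

print(total_shard_copies(5, 1))  # → 10
```

With index.number_of_shards: 5 and index.number_of_replicas: 1, a new index therefore has 10 shards to allocate. Since a replica is never placed on the same node as its primary, at least two nodes are needed before all of them can be assigned.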

There's more...

There are a lot of other parameters that can be used to customize your ElasticSearch installation and new ones are added with new releases. The most important ones are described in this recipe and in the next one.