
In this article by Alberto Maria Angelo Paro, the author of the book ElasticSearch 5.0 Cookbook – Third Edition, you will learn the following recipes:

  • Creating an index
  • Deleting an index
  • Opening/closing an index
  • Putting a mapping in an index
  • Getting a mapping


Creating an index

Before you start indexing data in Elasticsearch, the first operation to perform is creating an index, the main container of our data.

An index is similar to the concept of a database in SQL: it is a container for types (tables in SQL) and documents (records in SQL).

Getting ready

To execute curl via the command line, you need to install curl for your operating system.

How to do it…

The HTTP method to create an index is PUT (POST also works); the REST URL contains the index name:

http://<server>/<index_name>

For creating an index, we will perform the following steps:

  1. From the command line, we can execute a PUT call:
    curl -XPUT http://127.0.0.1:9200/myindex -d '{
      "settings" : {
        "index" : {
          "number_of_shards" : 2,
          "number_of_replicas" : 1
        }
      }
    }'
  2. The result returned by Elasticsearch should be:
    {"acknowledged":true,"shards_acknowledged":true}
  3. If the index already exists, a 400 error is returned:
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "index_already_exists_exception",
            "reason" : "index [myindex/YJRxuqvkQWOe3VuTaTbu7g] already exists",
            "index_uuid" : "YJRxuqvkQWOe3VuTaTbu7g",
            "index" : "myindex"
          }
        ],
        "type" : "index_already_exists_exception",
        "reason" : "index [myindex/YJRxuqvkQWOe3VuTaTbu7g] already exists",
        "index_uuid" : "YJRxuqvkQWOe3VuTaTbu7g",
        "index" : "myindex"
      },
      "status" : 400
    }
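
The same PUT call can also be prepared from a script. The following is a minimal Python sketch that uses only the standard library and the local server address from the steps above. The helper name build_create_index_request is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_create_index_request(server, index_name, shards=2, replicas=1):
    """Build (but do not send) the PUT request that creates an index."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return urllib.request.Request(
        url="{}/{}".format(server.rstrip("/"), index_name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("http://127.0.0.1:9200", "myindex")
print(request.get_method(), request.full_url)
# To actually send it against a running node:
#   urllib.request.urlopen(request)
```

Keeping the build step separate from the send step makes the request easy to inspect before it touches a cluster.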

How it works…

Because the index name is mapped to a directory on your storage, index names are restricted. The only accepted characters are:

  • Lowercase ASCII letters [a-z]
  • Numbers [0-9]
  • Point ".", minus "-", "&", and "_"
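
A name can be checked against this character list before the create call is sent. The sketch below only mirrors the list given in this recipe; the function name is hypothetical, and Elasticsearch itself may enforce further rules that are not checked here.

```python
import re

# Characters listed in this recipe: a-z, 0-9, ".", "-", "&" and "_".
# Assumption: the pattern mirrors that list only; it is not the server's
# own validation logic.
_INDEX_NAME_PATTERN = re.compile(r"[a-z0-9.\-&_]+")


def uses_accepted_characters(index_name):
    """Return True if the name contains only the characters listed above."""
    return _INDEX_NAME_PATTERN.fullmatch(index_name) is not None


for candidate in ("myindex", "logs-2016.11.01", "MyIndex", "my index"):
    print(candidate, uses_accepted_characters(candidate))
```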

During index creation, sharding and replication can be set with two parameters in the settings/index object:

  • number_of_shards, which controls the number of shards that compose the index (every shard can store up to about 2^31 documents)
  • number_of_replicas, which controls the number of replicas (how many times your data is replicated in the cluster for high availability). A good practice is to set this value to at least 1.
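
As a quick check of what these two settings imply, the total number of shard copies the cluster has to allocate is the number of primary shards multiplied by one plus the number of replicas. A minimal sketch (the function name is only illustrative):

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primary shards plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)


# The settings used in this recipe: 2 shards, 1 replica.
print(total_shard_copies(2, 1))  # 2 primaries + 2 replicas = 4
```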

The API call initializes a new index, which means:

  • The index is created on the master node first, and then its state is propagated to all the nodes of the cluster
  • A default mapping (empty) is created
  • All the shards required by the index are initialized and ready to accept data

The index creation API also allows you to define mappings at creation time. The parameter used to define them is mappings, and it accepts multiple mappings, so a single call can create an index and put the required mappings.

There’s more…

The create index command also accepts a mappings section, which contains the mapping definitions. It is a shortcut to create an index with mappings, without executing an extra PUT mapping call:

curl -XPOST localhost:9200/myindex -d '{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "order" : {
      "properties" : {
        "id" : {"type" : "keyword", "store" : "yes"},
        "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"},
        "customer_id" : {"type" : "keyword", "store" : "yes"},
        "sent" : {"type" : "boolean", "index":"not_analyzed"},
        "name" : {"type" : "text", "index":"analyzed"},
        "quantity" : {"type" : "integer", "index":"not_analyzed"},
        "vat" : {"type" : "double", "index":"no"}
      }
    }
  }
}'
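
When the body is generated by a program rather than typed by hand, the settings and mappings sections can be assembled from plain data. The following sketch is one possible way to do that. The helper name is hypothetical, and it sets only the type of each field, so options such as store or index from the call above would have to be added separately.

```python
import json


def build_index_body(type_name, field_types, shards=2, replicas=1):
    """Combine settings and one type mapping into a create-index body."""
    properties = {
        field: {"type": es_type} for field, es_type in field_types.items()
    }
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {type_name: {"properties": properties}},
    }


body = build_index_body(
    "order",
    {"id": "keyword", "date": "date", "name": "text", "quantity": "integer"},
)
print(json.dumps(body, indent=2))
```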

Deleting an index

The counterpart of creating an index is deleting one.

Deleting an index means deleting its shards, mappings, and data. There are many common scenarios when we need to delete an index, such as:

  • Removing the index to clean unwanted/obsolete data (for example, old Logstash indices).
  • Resetting an index for a scratch restart.
  • Deleting an index that has some missing shards, mainly due to failures, to bring the cluster back to a valid state. If a node dies while it stores the only copy of a shard of an index, the index is missing a shard and the cluster state becomes red. Deleting the index brings the cluster back to a green status, but you lose the data contained in the deleted index.

Getting ready

To execute curl via the command line, you need to install curl for your operating system.

The index created in the previous recipe is also required, because it is the one that will be deleted.

How to do it…

The HTTP method used to delete an index is DELETE.

The following URL contains only the index name:

http://<server>/<index_name>

For deleting an index, we will perform the steps given as follows:

  1. Execute a DELETE call, by writing the following command:
    curl -XDELETE http://127.0.0.1:9200/myindex
  2. We check the result returned by Elasticsearch. If everything is all right, it should be:
    {"acknowledged":true}
  3. If the index doesn’t exist, a 404 error is returned:
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "index_not_found_exception",
            "reason" : "no such index",
            "resource.type" : "index_or_alias",
            "resource.id" : "myindex",
            "index_uuid" : "_na_",
            "index" : "myindex"
          }
        ],
        "type" : "index_not_found_exception",
        "reason" : "no such index",
        "resource.type" : "index_or_alias",
        "resource.id" : "myindex",
        "index_uuid" : "_na_",
        "index" : "myindex"
      },
      "status" : 404
    }
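
A script that deletes indices usually has to tell these two outcomes apart. The sketch below interprets the status code and the JSON bodies shown in the steps above. The function name and the returned messages are hypothetical, and the 404 sample is shortened to the fields the function reads.

```python
import json


def describe_delete_result(status_code, response_text):
    """Turn a delete-index response into a short human-readable message."""
    payload = json.loads(response_text)
    if status_code == 200 and payload.get("acknowledged"):
        return "deleted"
    if status_code == 404:
        error = payload.get("error", {})
        return "not found: {}".format(error.get("index", "unknown"))
    return "unexpected response (HTTP {})".format(status_code)


ok_body = '{"acknowledged":true}'
missing_body = (
    '{"error":{"type":"index_not_found_exception","index":"myindex"},'
    '"status":404}'
)
print(describe_delete_result(200, ok_body))
print(describe_delete_result(404, missing_body))
```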

How it works…

When an index is deleted, all the data related to the index is removed from disk and is lost.

During the delete processing, first the cluster is updated, and then the shards are deleted from the storage. This operation is very fast; in a traditional filesystem it is implemented as a recursive delete.

It’s not possible to restore a deleted index if there is no backup.

Calling the delete API with the special _all index name removes all the indices. In production, it is good practice to disable the deletion of all indices by adding the following line to elasticsearch.yml:

action.destructive_requires_name: true
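
With this setting enabled, delete requests must name their indices explicitly. A script that issues DELETE calls can apply a comparable guard on the client side before sending anything. The sketch below is only illustrative: the function name is hypothetical, and it does not replace the server setting.

```python
def names_explicit_index(index_expression):
    """Return True only for a plain index name (no _all, no wildcard)."""
    return (
        bool(index_expression)
        and index_expression != "_all"
        and "*" not in index_expression
    )


for expression in ("myindex", "_all", "logs-*"):
    print(expression, names_explicit_index(expression))
```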

Opening/closing an index

If you want to keep your data but save resources (memory/CPU), a good alternative to deleting indices is to close them.

Elasticsearch allows you to open/close an index to put it into online/offline mode.

Getting ready

To execute curl via the command line, you need to install curl for your operating system.

How to do it…

For opening/closing an index, we will perform the following steps:

  1. From the command line, we can execute a POST call to close an index using:
    curl -XPOST http://127.0.0.1:9200/myindex/_close
  2. If the call is successful, the result returned by Elasticsearch should be:
    {"acknowledged":true}
  3. To open an index, from the command line, type the following command:
    curl -XPOST http://127.0.0.1:9200/myindex/_open
  4. If the call is successful, the result returned by Elasticsearch should be:
    {"acknowledged":true}
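
Both calls differ only in the last path segment, so a script can build them with one helper. This is a minimal sketch using the Python standard library. The helper name is hypothetical, and the requests are built but not sent.

```python
import urllib.request


def build_index_state_request(server, index_name, action):
    """Build (but do not send) the POST request that opens or closes an index."""
    if action not in ("open", "close"):
        raise ValueError("action must be 'open' or 'close'")
    url = "{}/{}/_{}".format(server.rstrip("/"), index_name, action)
    return urllib.request.Request(url=url, method="POST")


server = "http://127.0.0.1:9200"
close_request = build_index_state_request(server, "myindex", "close")
open_request = build_index_state_request(server, "myindex", "open")
print(close_request.get_method(), close_request.full_url)
print(open_request.get_method(), open_request.full_url)
```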

How it works…

When an index is closed, there is no overhead on the cluster (except for metadata state): the index shards are switched off and they don’t use file descriptors, memory, or threads.

There are many use cases for closing an index:

  • Disabling date-based indices (indices that store their records by date), for example, when you create an index per day, week, or month and you want to keep a fixed number of old indices online (for example, the last two months) and some offline (for example, from two months to six months old).
  • When you search all the active indices of a cluster and don’t want to search in some of them (in this case, using an alias is the best solution, but you can achieve the same result as an alias with closed indices).

An alias cannot have the same name as an index.

When an index is closed, calling the open API restores its state.
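
The date-based use case lends itself to a small scheduling script. The sketch below decides, from the index name alone, whether a daily index should stay open or be closed. It assumes a hypothetical naming scheme of the form logs-YYYY.MM.DD and approximates the two-month and six-month windows as 60 and 180 days.

```python
from datetime import date, datetime


def plan_for_index(index_name, today, online_days=60, offline_days=180):
    """Decide what to do with a daily index named like 'logs-2016.11.01'."""
    # Assumption: the date is the part after the last "-", as YYYY.MM.DD.
    day = datetime.strptime(index_name.rsplit("-", 1)[1], "%Y.%m.%d").date()
    age = (today - day).days
    if age <= online_days:
        return "keep open"
    if age <= offline_days:
        return "close"
    return "older than the offline window"


today = date(2016, 12, 31)
for name in ("logs-2016.12.01", "logs-2016.09.01", "logs-2016.01.01"):
    print(name, "->", plan_for_index(name, today))
```

What happens to indices older than the offline window (for example, a backup followed by deletion) is left open here, because it depends on your retention policy.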

Putting a mapping in an index

We saw how to build a mapping by indexing documents. This recipe shows how to put a type mapping in an index. This kind of operation can be considered the Elasticsearch version of an SQL CREATE TABLE.

Getting ready

To execute curl via the command line, you need to install curl for your operating system.

How to do it…

The HTTP method to put a mapping is PUT (POST also works).

The URL format for putting a mapping is:

http://<server>/<index_name>/<type_name>/_mapping

For putting a mapping in an index, we will perform the steps given as follows:

  1. If we consider the type order, the call will be:
    curl -XPUT 'http://localhost:9200/myindex/order/_mapping' -d '{
      "order" : {
        "properties" : {
          "id" : {"type" : "keyword", "store" : "yes"},
          "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"},
          "customer_id" : {"type" : "keyword", "store" : "yes"},
          "sent" : {"type" : "boolean", "index":"not_analyzed"},
          "name" : {"type" : "text", "index":"analyzed"},
          "quantity" : {"type" : "integer", "index":"not_analyzed"},
          "vat" : {"type" : "double", "index":"no"}
        }
      }
    }'
  2. In case of success, the result returned by Elasticsearch should be:
    {"acknowledged":true}

How it works…

This call checks whether the index exists and then creates one or more type mappings as described in the definition.

During the mapping insert, if there is an existing mapping for this type, it is merged with the new one. If a field is defined with a different type and the type cannot be updated, an exception is raised. Older Elasticsearch releases offered an ignore_conflicts parameter (default false) to ignore such conflicts during the merge phase, but it was removed in Elasticsearch 2.0, so conflicts can no longer be ignored.

The put mapping call allows you to set the type for several indices in one shot: list the indices separated by commas, or apply it to all indices using the _all alias.
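
The URL for one index, a comma-separated list of indices, or _all follows the same pattern, so it can be built with a single helper. A minimal sketch: the function name and the index names orders-2015 and orders-2016 are hypothetical.

```python
def put_mapping_url(server, indices, type_name):
    """Build the put-mapping URL for one index, several indices, or _all."""
    if isinstance(indices, str):
        indices = [indices]
    return "{}/{}/{}/_mapping".format(
        server.rstrip("/"), ",".join(indices), type_name
    )


server = "http://localhost:9200"
print(put_mapping_url(server, "myindex", "order"))
print(put_mapping_url(server, ["orders-2015", "orders-2016"], "order"))
print(put_mapping_url(server, "_all", "order"))
```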

There’s more…

There is no delete operation for mappings: it’s not possible to delete a single mapping from an index. To remove or change a mapping, you need to perform the following steps:

  1. Create a new index with the new/modified mapping
  2. Reindex all the records
  3. Delete the old index with incorrect mapping
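
The three steps can be written down as a list of REST calls before anything is executed. The sketch below assumes that step 2 is done with the _reindex API, which this recipe does not name; the function name and the target index name myindex_v2 are hypothetical.

```python
def remapping_plan(old_index, new_index, new_mappings):
    """List the REST calls for the three steps as (method, path, body)."""
    return [
        # 1. Create a new index with the new/modified mapping.
        ("PUT", "/" + new_index, {"mappings": new_mappings}),
        # 2. Reindex all the records (assumption: via the _reindex API).
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
        # 3. Delete the old index with the incorrect mapping.
        ("DELETE", "/" + old_index, None),
    ]


mappings = {"order": {"properties": {"vat": {"type": "double"}}}}
plan = remapping_plan("myindex", "myindex_v2", mappings)
for method, path, body in plan:
    print(method, path)
```

Run the delete step only after checking that the reindex step has completed successfully, because a deleted index cannot be restored without a backup.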

Getting a mapping

After setting the mappings for our types, we sometimes need to check or analyze a mapping to prevent issues. Getting the mapping for a type helps us understand its structure, or how it has evolved through merges and implicit type guessing.

Getting ready

To execute curl via the command line, you need to install curl for your operating system.

How to do it…

The HTTP method to get a mapping is GET.

The URL formats for getting mappings are:

http://<server>/_mapping

http://<server>/<index_name>/_mapping

http://<server>/<index_name>/<type_name>/_mapping
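
The three formats differ only in how many path segments come before _mapping. A minimal sketch that builds all of them (the function name is hypothetical):

```python
def get_mapping_url(server, index_name=None, type_name=None):
    """Build the cluster-, index-, or type-level get-mapping URL."""
    if type_name and not index_name:
        raise ValueError("a type name requires an index name")
    parts = [server.rstrip("/")]
    if index_name:
        parts.append(index_name)
    if type_name:
        parts.append(type_name)
    parts.append("_mapping")
    return "/".join(parts)


server = "http://localhost:9200"
print(get_mapping_url(server))
print(get_mapping_url(server, "myindex"))
print(get_mapping_url(server, "myindex", "order"))
```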

To get a mapping from the type of an index, we will perform the following steps:

  1. If we consider the type order of the previous recipe, the call will be:
    curl -XGET 'http://localhost:9200/myindex/order/_mapping?pretty=true'

    The pretty argument in the URL is optional, but very handy to pretty print the response output.

  2. The result returned by Elasticsearch should be:
    {
      "myindex" : {
        "mappings" : {
          "order" : {
            "properties" : {
              "customer_id" : {
                "type" : "keyword",
                "store" : true
              },
              … truncated
            }
          }
        }
      }
    }
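
Once the response has been parsed as JSON, the field types can be read from the nested structure shown above. The following sketch does so for a shortened response that contains only the customer_id field. The function name is hypothetical.

```python
import json


def field_types(mapping_response, index_name, type_name):
    """Return {field name: field type} from a get-mapping response."""
    properties = (
        mapping_response[index_name]["mappings"][type_name]["properties"]
    )
    return {name: spec.get("type") for name, spec in properties.items()}


# A shortened response in the shape shown above (one field only).
response = json.loads("""
{"myindex": {"mappings": {"order": {"properties": {
    "customer_id": {"type": "keyword", "store": true}
}}}}}
""")
print(field_types(response, "myindex", "order"))
```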

How it works…

The mapping is stored at the cluster level in Elasticsearch. The call checks both index and type existence and then it returns the stored mapping.

The returned mapping is in a reduced form, which means that the default values for a field are not returned.

Elasticsearch stores only non-default field values to reduce network and memory consumption.

Retrieving a mapping is very useful for several purposes:

  • Debugging template level mapping
  • Checking whether an implicit mapping was derived correctly by guessing fields
  • Retrieving the mapping metadata, which can be used to store type-related information
  • Simply checking if the mapping is correct

If you need to fetch several mappings, it is better to do it at the index level or cluster level to reduce the number of API calls.

Summary

We learned how to manage indices and their mappings. We discussed different operations on indices such as create, delete, update, open, and close. These operations are very important because they allow you to better define the container (index) that will store your documents. The index create/delete actions are similar to the SQL create/delete database commands.
