CRUD (Create, Read, Update, and Delete) Operations with Elasticsearch

This article is an excerpt from a book written by Pranav Shukla and Sharath Kumar M N titled Learning Elastic Stack 6.0. This book is for beginners who want to start performing distributed search analytics and visualization using core functionalities of Elasticsearch, Kibana, and Logstash.

In this tutorial, we will look at how to perform basic CRUD operations using Elasticsearch. Elasticsearch has a well-designed REST API, and its CRUD operations target documents.

To understand how to perform CRUD operations, we will cover the following APIs. These APIs fall under the category of Document APIs that deal with documents:

Index API
Get API
Update API
Delete API

Index API

In Elasticsearch terminology, adding (or creating) a document into a type within an index of Elasticsearch is called an indexing operation. Essentially, it involves adding the document to the index by parsing all fields within the document and building the inverted index. This is why this operation is known as an indexing operation. There are two ways we can index a document:

Indexing a document by providing an ID
Indexing a document without providing an ID

Indexing a document by providing an ID

We have already seen this version of the indexing operation. The user can provide the ID of the document using the PUT method. The format of this request is PUT /<index>/<type>/<id>, with the JSON document as the body of the request:

PUT /catalog/product/1
{
  "sku": "SP000001",
  "title": "Elasticsearch for Hadoop",
  "description": "Elasticsearch for Hadoop",
  "author": "Vishal Shukla",
  "ISBN": "1785288997",
  "price": 26.99
}

Indexing a document without providing an ID

If you don't want to control the ID generation for the documents, you can use the POST method. The format of this request is POST /<index>/<type>, with the JSON document as the body of the request:

POST /catalog/product
{
  "sku": "SP000003",
  "title": "Mastering Elasticsearch",
  "description": "Mastering Elasticsearch",
  "author": "Bharvi Dixit",
  "price": 54.99
}

In this case, Elasticsearch generates the ID. It is an opaque, automatically generated string, returned in the _id field of the response:

{
  "_index": "catalog",
  "_type": "product",
  "_id": "AVrASKqgaBGmnAMj1SBe",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

As per pure REST conventions, POST is used for creating a new resource and PUT is used for updating an existing resource. Here, using PUT is equivalent to saying, "I know the ID that I want to assign, so use this ID while indexing this document." If a document with that ID already exists, it is replaced and its version number is incremented.
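As a rough illustration of the two request formats above, the following Python sketch builds (but does not send) the corresponding HTTP requests using only the standard library. The base URL http://localhost:9200 and the helper name build_index_request are assumptions made for this example; they are not prescribed by Elasticsearch.

```python
import json
import urllib.request

# Assumption: a local node on the default port; adjust for a real cluster.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, document, doc_id=None):
    """Build an index request: PUT when an ID is supplied, POST otherwise."""
    path = f"/{index}/{doc_type}"
    if doc_id is not None:
        # The caller chooses the ID: PUT /<index>/<type>/<id>
        path += f"/{doc_id}"
        method = "PUT"
    else:
        # Elasticsearch generates the ID: POST /<index>/<type>
        method = "POST"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


with_id = build_index_request(
    "catalog", "product", {"sku": "SP000001", "price": 26.99}, doc_id=1
)
without_id = build_index_request(
    "catalog", "product", {"sku": "SP000003", "price": 54.99}
)

print(with_id.get_method(), with_id.full_url)
print(without_id.get_method(), without_id.full_url)
```

Passing either request object to urllib.request.urlopen would actually send it; that step is left out so that the sketch runs without a cluster.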

Get API

The Get API is useful for retrieving a document when you already know the ID of the document. It is essentially a get by primary key operation:

GET /catalog/product/AVrASKqgaBGmnAMj1SBe

The format of this request is GET /<index>/<type>/<id>. The response is as expected:

{
  "_index": "catalog",
  "_type": "product",
  "_id": "AVrASKqgaBGmnAMj1SBe",
  "_version": 1,
  "found": true,
  "_source": {
    "sku": "SP000003",
    "title": "Mastering Elasticsearch",
    "description": "Mastering Elasticsearch",
    "author": "Bharvi Dixit",
    "price": 54.99
  }
}
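To show how a client might consume such a response, here is a minimal Python sketch that pulls the stored document out of a Get API response that has already been parsed into a dictionary. The helper name extract_source is hypothetical, and the shape of the "not found" response is an assumption based on the 6.x response format.

```python
def extract_source(get_response):
    """Return the stored document from a Get API response, or None on a miss."""
    if not get_response.get("found", False):
        return None
    return get_response["_source"]


# Sample response copied from the example above.
response = {
    "_index": "catalog",
    "_type": "product",
    "_id": "AVrASKqgaBGmnAMj1SBe",
    "_version": 1,
    "found": True,
    "_source": {
        "sku": "SP000003",
        "title": "Mastering Elasticsearch",
        "description": "Mastering Elasticsearch",
        "author": "Bharvi Dixit",
        "price": 54.99,
    },
}

product = extract_source(response)
print(product["title"], product["price"])

# Assumed shape of a miss: "found" is false and there is no "_source".
missing = extract_source(
    {"_index": "catalog", "_type": "product", "_id": "42", "found": False}
)
print(missing)
```

The metadata fields (_index, _type, _id, _version) describe the document, while _source holds exactly the JSON body that was indexed.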

Update API

The Update API is useful for updating an existing document by ID. The format of an update request is POST /<index>/<type>/<id>/_update, with a JSON request as the body:

POST /catalog/product/1/_update
{
  "doc": {
    "price": 28.99
  }
}

The properties specified under the "doc" element are merged into the existing document. The previous version of the document with ID 1 had a price of 26.99. This update changes only the price and leaves the other fields of the document unchanged. Here, "doc" supplies a partial document that is merged with the existing one; the Update API supports other kinds of updates as well. The response to the update request is as follows:

{
  "_index": "catalog",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

Internally, Elasticsearch maintains the version of each document. Whenever a document is updated, the version number is incremented.
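The merge-and-increment behavior described above can be modeled in a few lines of Python. This is only an in-memory sketch of the semantics, not how Elasticsearch implements updates; the function names are hypothetical, and the "noop" branch reflects our understanding that, by default, an update that changes nothing leaves the version untouched.

```python
import copy


def merge_partial(existing, partial):
    """Recursively merge `partial` into a copy of `existing`."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays are simply replaced.
            merged[key] = copy.deepcopy(value)
    return merged


def apply_update(stored, partial):
    """Model a "doc" update on a record holding _version and _source."""
    new_source = merge_partial(stored["_source"], partial)
    if new_source == stored["_source"]:
        # Assumption: nothing changed, so the version is not incremented.
        return dict(stored, result="noop")
    return {
        "_version": stored["_version"] + 1,
        "_source": new_source,
        "result": "updated",
    }


stored = {
    "_version": 1,
    "_source": {
        "sku": "SP000001",
        "title": "Elasticsearch for Hadoop",
        "price": 26.99,
    },
}

updated = apply_update(stored, {"price": 28.99})
print(updated["_version"], updated["result"], updated["_source"])

repeated = apply_update(updated, {"price": 28.99})
print(repeated["_version"], repeated["result"])
```

Only the price changes in the merged document; the sku and title fields carry over untouched, which mirrors the partial update shown earlier.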

The partial update shown above works only if the document already exists. If no document with the given ID exists, Elasticsearch returns an error saying that the document is missing. Let us see how to perform an upsert operation using the Update API. The term upsert loosely means update or insert, that is, update the document if it exists, otherwise insert a new document.

When the doc_as_upsert parameter is set to true, Elasticsearch checks whether a document with the given ID already exists and, if so, merges the provided doc into it. If no such document exists, it inserts a new document with the contents of doc.

The following example uses doc_as_upsert to merge into the document with ID 3, or to insert a new document if it doesn't exist:

POST /catalog/product/3/_update
{
  "doc": {
    "author": "Albert Paro",
    "title": "Elasticsearch 5.0 Cookbook",
    "description": "Elasticsearch 5.0 Cookbook Third Edition",
    "price": 54.99
  },
  "doc_as_upsert": true
}
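The difference between a plain partial update and an upsert can be sketched with an in-memory dictionary standing in for the index. The update_doc helper below is hypothetical, it uses a shallow merge for brevity, and it raises a KeyError where Elasticsearch would return a missing-document error.

```python
def update_doc(store, doc_id, doc, doc_as_upsert=False):
    """Model a "doc" update against `store`, a dict mapping IDs to documents."""
    if doc_id in store:
        # Existing document: merge the partial doc into it (shallow merge).
        store[doc_id] = {**store[doc_id], **doc}
        return "updated"
    if doc_as_upsert:
        # No such document: the partial doc becomes the new document.
        store[doc_id] = dict(doc)
        return "created"
    # Without upsert, updating a missing document is an error.
    raise KeyError(f"document {doc_id!r} is missing")


store = {"1": {"sku": "SP000001", "price": 26.99}}
book = {"title": "Elasticsearch 5.0 Cookbook", "price": 54.99}

# Document 3 does not exist yet, so the first call inserts it.
first = update_doc(store, "3", book, doc_as_upsert=True)
# Now it exists, so the second call merges into it.
second = update_doc(store, "3", {"price": 49.99}, doc_as_upsert=True)

print(first, second, store["3"])
```

With doc_as_upsert left at False, the first call would fail instead of creating document 3, which corresponds to the missing-document error described above.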

We can update the value of a field based on the existing value of that field or another field in the document. The following update uses an inline script to increase the price by two for a specific product:

POST /catalog/product/AVrASKqgaBGmnAMj1SBe/_update
{
  "script": {
    "inline": "ctx._source.price += params.increment",
    "lang": "painless",
    "params": {
      "increment": 2
    }
  }
}

Scripting support allows reading the existing value, incrementing it by a variable, and storing it back in a single operation. The inline script used here is written in Painless, Elasticsearch's own scripting language. The syntax for incrementing an existing variable is similar to that of most other programming languages.
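For readers less familiar with scripted updates, the following Python sketch mimics what the script above does to the document. It is a stand-in for illustration only: Elasticsearch runs the Painless script on the server, and the function name run_increment_script is hypothetical.

```python
def run_increment_script(source, params):
    """Python stand-in for: ctx._source.price += params.increment"""
    # ctx._source is the current document; work on a copy to keep the input intact.
    ctx = {"_source": dict(source)}
    ctx["_source"]["price"] += params["increment"]
    return ctx["_source"]


# The same request body as in the example above, expressed as a Python dict.
body = {
    "script": {
        "inline": "ctx._source.price += params.increment",
        "lang": "painless",
        "params": {"increment": 2},
    }
}

before = {"sku": "SP000003", "title": "Mastering Elasticsearch", "price": 54.99}
after = run_increment_script(before, body["script"]["params"])

print(before["price"], round(after["price"], 2))
```

Keeping the increment in params rather than hard-coding it in the script text means the same script can be reused with different values.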

Delete API

The Delete API lets you delete a document by ID:

DELETE /catalog/product/AVrASKqgaBGmnAMj1SBe

The response to the delete operation is as follows:

{
  "found": true,
  "_index": "catalog",
  "_type": "product",
  "_id": "AVrASKqgaBGmnAMj1SBe",
  "_version": 4,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}
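A client will usually want to confirm that the document was actually removed. The short Python sketch below checks the result field of a parsed delete response; the helper name was_deleted is hypothetical, and the "not_found" value for a missing document is an assumption based on the 6.x response format.

```python
def was_deleted(delete_response):
    """Return True if the Delete API reports that a document was removed."""
    return delete_response.get("result") == "deleted"


# Sample response copied from the example above.
response = {
    "found": True,
    "_index": "catalog",
    "_type": "product",
    "_id": "AVrASKqgaBGmnAMj1SBe",
    "_version": 4,
    "result": "deleted",
    "_shards": {"total": 2, "successful": 1, "failed": 0},
}

# Assumed shape of the response when the ID does not exist.
missing = {
    "found": False,
    "_index": "catalog",
    "_type": "product",
    "_id": "42",
    "result": "not_found",
}

print(was_deleted(response), was_deleted(missing))
```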

This is how basic CRUD operations are performed on Elasticsearch documents using the Index, Get, Update, and Delete APIs.

If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 and start building end-to-end real-time data processing solutions for your enterprise analytics applications.