How to execute a search query in Elasticsearch

This post is an excerpt from a book authored by Alberto Paro, titled Elasticsearch 5.x Cookbook. It has over 170 advanced recipes to search, analyze, deploy, manage, and monitor data effectively with Elasticsearch 5.x.

In this article, we see how to execute a search in Elasticsearch and how to read its results. Elasticsearch was born as a search engine: its main purpose is to process queries and return results. We'll also see that a search in Elasticsearch is not limited to matching documents. It can also calculate additional information that improves the search quality.

All the code in this article is available on PacktPub or GitHub, together with the scripts that initialize the required data.

Getting ready

You will need an up-and-running Elasticsearch installation. To execute the commands from the command line, you will also need curl installed for your operating system.

To correctly execute the following commands, you will need an index populated with the chapter_05/populate_query.sh script available in the online code.

The mapping used in all the article queries and searches is the following:

{
  "mappings": {
    "test-type": {
      "properties": {
        "pos": {
          "type": "integer",
          "store": true
        },
        "uuid": {
          "store": true,
          "type": "keyword"
        },
        "parsedtext": {
          "term_vector": "with_positions_offsets",
          "store": true,
          "type": "text"
        },
        "name": {
          "term_vector": "with_positions_offsets",
          "store": true,
          "fielddata": true,
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "title": {
          "term_vector": "with_positions_offsets",
          "store": true,
          "type": "text",
          "fielddata": true,
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    },
    "test-type2": {
      "_parent": {
        "type": "test-type"
      }
    }
  }
}
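You can also create the index from a script instead of running the populate script. The following minimal Python sketch uses only the standard library to build the HTTP PUT request that would create test-index with a mapping body like the one above. It assumes a node at http://127.0.0.1:9200. The helper name build_create_index_request is hypothetical, and nothing is sent until you pass the request to urllib.request.urlopen.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # assumed local node address


def build_create_index_request(index, mapping, base_url=ES_URL):
    """Build (but do not send) a PUT request that creates `index` with `mapping`.

    `mapping` is the whole request body, e.g. {"mappings": {"test-type": {...}}}.
    """
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(mapping).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# A trimmed-down excerpt of the mapping above, for illustration only.
example_mapping = {
    "mappings": {
        "test-type": {
            "properties": {
                "pos": {"type": "integer", "store": True},
                "uuid": {"type": "keyword", "store": True},
            }
        }
    }
}
create_req = build_create_index_request("test-index", example_mapping)

# Sending it requires a running node:
# with urllib.request.urlopen(create_req) as resp:
#     print(json.load(resp))
```

This design keeps request construction separate from sending. You can inspect or log the request before it touches a cluster.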

How to do it

To execute a search and view the results, run the following from the command line:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search' \
  -d '{"query":{"match_all":{}}}'

In this case, we use a match_all query, which returns all the documents.
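The same match_all search can be issued from Python. This sketch uses only the standard library and sends the body with POST rather than GET, because not every HTTP client can attach a body to a GET call. It also sets a JSON Content-Type header. The helper names build_search_request and search are hypothetical, and the node address http://127.0.0.1:9200 is an assumption.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # assumed local node address


def build_search_request(path, body, base_url=ES_URL):
    """Build (but do not send) a POST request for the _search endpoint under `path`."""
    return urllib.request.Request(
        url=f"{base_url}/{path}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def search(path, body, base_url=ES_URL):
    """Send the search and return the decoded JSON response (needs a running node)."""
    with urllib.request.urlopen(build_search_request(path, body, base_url)) as resp:
        return json.load(resp)


match_all = {"query": {"match_all": {}}}
request = build_search_request("test-index/test-type", match_all)
# result = search("test-index/test-type", match_all)  # uncomment with a live node
```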

If everything works, the command will return the following:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {"position": 1, "parsedtext": "Joe Testere nice guy", "name": "Joe Tester", "uuid": "11111"}
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {"position": 2, "parsedtext": "Bill Testere nice guy", "name": "Bill Baloney", "uuid": "22222"}
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {"position": 3, "parsedtext": "Bill is not\n nice guy", "name": "Bill Clinton", "uuid": "33333"}
    } ]
  }
}

These results contain a lot of information:

took is the time, in milliseconds, required to execute the query.
timed_out indicates whether a timeout occurred during the search. This is related to the timeout parameter of the search. If a timeout occurs, you will get partial or no results.
_shards is the status of the shards, divided into:
- total, which is the number of shards.
- successful, which is the number of shards in which the query was successful.
- failed, which is the number of shards in which the query failed because an error or exception occurred during the query.
hits are the results, which are composed of the following:
- total is the number of documents that match the query.
- max_score is the match score of the first document. It is usually 1.0 when no relevance scoring is computed, as with the match_all query above or with pure filtering.
- hits is the list of result documents.
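Before trusting a result set, it helps to check the timeout and shard counts described above. This is a small Python sketch. It assumes the response has already been decoded into a dict with the Elasticsearch 5.x shape, where hits.total is a plain number as in the sample response. The function names are hypothetical.

```python
def check_response(resp):
    """Return warnings about partial results in a decoded _search response."""
    warnings = []
    if resp.get("timed_out"):
        warnings.append("search timed out: results may be partial")
    shards = resp.get("_shards", {})
    if shards.get("failed", 0) > 0:
        warnings.append(f"{shards['failed']} of {shards.get('total')} shards failed")
    return warnings


def summarize(resp):
    """Pick out the top-level numbers: time taken, total matches, max score, page size."""
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        "total": hits["total"],  # a plain integer in 5.x responses
        "max_score": hits["max_score"],
        "returned": len(hits["hits"]),  # at most `size` hits are returned
    }
```

With the sample response above, check_response returns an empty list. summarize reports 3 total matches, and all 3 are returned on the first page.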

The resulting document has a lot of fields that are always available and others that depend on search parameters. The most important fields are as follows:

_index: The index that contains the document
_type: The type of the document
_id: The ID of the document
_source (returned by default, but it can be disabled): The document source
_score: The query score of the document
sort: If the document is sorted, the values used for sorting
highlight: The highlighted segments, if highlighting was requested
fields: Some fields can be retrieved without fetching the whole source object
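To work with the hits programmatically, you can flatten each one into a row. In this Python sketch, the choice of name and uuid as the extracted source fields is only illustrative. _source may be missing if it was disabled, and _score may be null when sorting without track_scores, so both are read defensively. The function name hit_rows is hypothetical.

```python
def hit_rows(resp, source_fields=("name", "uuid")):
    """Flatten each hit into a dict of metadata plus selected _source fields."""
    rows = []
    for hit in resp["hits"]["hits"]:
        source = hit.get("_source", {})  # absent when _source is disabled
        row = {"_index": hit["_index"], "_id": hit["_id"], "_score": hit.get("_score")}
        for field in source_fields:
            row[field] = source.get(field)
        # sort and highlight appear only when sorting/highlighting was requested
        if "sort" in hit:
            row["sort"] = hit["sort"]
        if "highlight" in hit:
            row["highlight"] = hit["highlight"]
        rows.append(row)
    return rows
```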

How it works

The HTTP method to execute a search is GET (although POST also works); the REST endpoints are as follows:

http://<server>/_search

http://<server>/<index_name(s)>/_search

http://<server>/<index_name(s)>/<type_name(s)>/_search

Note: Not all HTTP clients allow you to send data via a GET call, so if you need to send body data, the best practice is to use POST.

Multiple indices and types are comma-separated. If an index or a type is given, the search is limited to it. One or more aliases can be used as index names.
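The endpoint rules above can be captured in a small Python helper that builds the search path from lists of index (or alias) names and type names. Rejecting a type without an index is this sketch's own choice, mirroring the three endpoint forms listed. The helper name search_path is hypothetical.

```python
def search_path(indices=(), types=()):
    """Build a _search path; multiple indices or types are joined with commas."""
    if types and not indices:
        # the endpoint forms above only place a type after an index
        raise ValueError("a type can only be given together with an index")
    parts = []
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)
```

For example, search_path(["test-index", "other-index"], ["test-type"]) yields /test-index,other-index/test-type/_search. Here other-index is a made-up second index.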

The core query is usually contained in the body of the GET/POST call, but a lot of options can also be expressed as URI query parameters, such as the following:

q: This is the query string to do simple string queries, as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?q=uuid:11111'

df: This is the default field to be used within the query, as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?df=uuid&q=11111'

from (the default value is 0): The start index of the hits.
size (the default value is 10): The number of hits to be returned.
analyzer: The default analyzer to be used.
default_operator (the default value is OR): This can be set to AND or OR.
explain: This allows the user to return information about how the score is calculated, as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?q=parsedtext:joe&explain=true'

stored_fields: This allows the user to define which stored fields must be returned, as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?q=parsedtext:joe&stored_fields=name'

sort (the default value is score): This allows the user to change the order of the documents. Sorting is ascending by default; to reverse the order, append :desc to the field name, as follows:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?sort=name.raw:desc'

timeout (not active by default): This defines the timeout for the search. Elasticsearch collects results until the timeout expires; when it fires, the hits accumulated so far are returned.
search_type: This defines the search strategy. A reference is available in the online Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html.
track_scores (the default value is false): If true, this tracks the score and allows it to be returned with the hits. It's used in conjunction with sort, because sorting by default prevents the return of a match score.

pretty (the default value is false): If true, the results will be pretty printed.
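The URI parameters above can also be assembled programmatically rather than typed by hand. This Python sketch uses urllib.parse.urlencode, which percent-escapes values (':' becomes %3A, for example), so the escaped values are decoded again on the Elasticsearch side. Two conventions here are this sketch's own: Python booleans are rendered as lowercase true/false, and because from is a Python keyword, the caller passes from_. The helper name uri_search_url and the node address are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ES_URL = "http://127.0.0.1:9200"  # assumed local node address


def uri_search_url(path, **params):
    """Build a URI-search URL such as <ES_URL>/test-index/test-type/_search?q=..."""
    rendered = {
        # bool must be checked explicitly: True/False become "true"/"false"
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
    }
    if "from_" in rendered:  # "from" is reserved in Python
        rendered["from"] = rendered.pop("from_")
    return f"{ES_URL}{path}/_search?{urlencode(rendered)}"


url = uri_search_url("/test-index/test-type", q="parsedtext:joe", explain=True)
print(url)
print(parse_qs(urlsplit(url).query))  # decoded view of the query string
```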

Generally, the query, contained in the body of the search, is a JSON object. The body of the search is the core of Elasticsearch's search functionality, and the list of search capabilities grows with every release. For the current version (5.x) of Elasticsearch, the available parameters are as follows:

query: This contains the query to be executed. Later in this chapter, we will see how to create different kinds of queries to cover several scenarios.
from and size: These allow the user to control pagination. from defines the start position of the hits to be returned (default 0), and size defines how many hits are returned (default 10).

Note: Pagination is applied to the current search results, so firing the same query can return different results if many records have the same score or if a new document is ingested. If you need to process all the result documents without repetition, you need to execute scan or scroll queries.

sort: This allows the user to change the order of the matched documents.
post_filter: This allows the user to filter out the query results without affecting the aggregation count. It's usually used for filtering by facet values.
_source: This allows the user to control the returned source. It can be disabled (false), partially returned (obj.*), or filtered with multiple exclude/include rules. This functionality can be used instead of stored fields to return values (for complete coverage, take a look at the online Elasticsearch reference at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-source-filtering.html).
fielddata_fields: This allows the user to return a field data representation of the field.
stored_fields: This controls the fields to be returned.

Note: Returning only the required fields reduces the network and memory usage, improving the performance. The suggested way to retrieve custom fields is to use the _source filtering function because it doesn't need to use Elasticsearch's extra resources.

aggregations/aggs: These control the aggregation layer analytics. These will be discussed in the next chapter.
index_boost: This allows the user to define the per-index boost value. It is used to increase/decrease the score of results in boosted indices.
highlight: This allows the user to define the fields and settings to be used for calculating a query abstract (the highlighted snippets).
version (the default value is false): This adds the version of each document to the results.
rescore: This allows the user to define an extra query to be used in the score to improve the quality of the results. The rescore query is executed on the hits that match the first query and filter.
min_score: If this is given, all the result documents that have a score lower than this value are rejected.
explain: This returns information on how the relevance score (BM25 by default in Elasticsearch 5.x) is calculated for a particular document.
script_fields: This defines a script that computes extra fields via scripting to be returned with a hit.
suggest: Given a query and a field, this returns the most significant terms related to the query. This parameter allows the user to implement Google-like "did you mean" functionality.
search_type: This defines how Elasticsearch should process a query.
scroll: This controls the scrolling in scroll/scan queries. The scroll allows the user to have an Elasticsearch equivalent of a DBMS cursor.
_name: This allows the user to name queries, so that every hit reports which of the named queries it matched. It's very useful with a Boolean query when you want to know which of its clauses matched.
search_after: This allows the user to page through results efficiently by resuming after the sort values of the last hit of the previous page.
preference: This allows the user to select which shard/s to use for executing the query.
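Several of the body parameters above can be combined in one request. The following Python sketch assembles such a body, with page-based from/size arithmetic, and derives the next page with search_after. Some points are assumptions of this sketch: the helper names are hypothetical, only a subset of the parameters is covered, and the body shape targets 5.x. search_after relies on the sort values of the last hit and requires from to be 0. In practice, the sort should end with a unique tie-breaker field (for example uuid in this mapping) so that no documents are skipped or repeated.

```python
def build_body(query, page=0, per_page=10, sort=None, source=None,
               highlight_fields=(), min_score=None):
    """Assemble a search body from a few of the parameters listed above."""
    body = {"query": query, "from": page * per_page, "size": per_page}
    if sort is not None:
        body["sort"] = sort
    if source is not None:
        body["_source"] = source  # False, a pattern like "obj.*", a field list, or rules
    if highlight_fields:
        body["highlight"] = {"fields": {field: {} for field in highlight_fields}}
    if min_score is not None:
        body["min_score"] = min_score  # drop hits scoring below this value
    return body


def next_page_with_search_after(body, last_hit):
    """Return a copy of `body` that resumes after `last_hit` using search_after."""
    if "sort" not in body or "sort" not in last_hit:
        raise ValueError("search_after requires a sorted search")
    nxt = dict(body)  # shallow copy; the original body is left unchanged
    nxt["from"] = 0  # search_after cannot be combined with a non-zero from
    nxt["search_after"] = last_hit["sort"]
    return nxt


body = build_body({"match": {"parsedtext": "joe"}}, page=2, per_page=5,
                  sort=[{"name.raw": "asc"}, {"uuid": "asc"}],
                  source=["name", "uuid"], highlight_fields=["parsedtext"])
```

Deep from/size paging makes each shard collect from + size hits. search_after avoids that cost, which is why it is described above as the efficient way to walk through results.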

We saw how to execute a search in Elasticsearch and learned how it works. To learn how to perform other operations in Elasticsearch, check out the book Elasticsearch 5.x Cookbook.