
How to perform Numeric Metric Aggregations with Elasticsearch


Note: This article is an excerpt from the book Learning Elastic Stack 6.0 written by Pranav Shukla and Sharath Kumar M N. The book provides detailed coverage of the fundamentals of each component of the Elastic Stack, making it easy to search, analyze, and visualize data across different sources in real time.

Today, we are going to demonstrate how to run numeric and statistical queries such as summation, average, and count, along with similar metric aggregations, so that the Elastic Stack can serve as a powerful analytics engine for your dataset.

Metric aggregations  


Metric aggregations work with numeric data, computing one or more aggregate metrics within the given context. The context could be a query, filter, or no query to include the whole index/type. Metric aggregations can also be nested inside other bucket aggregations. In this case, these metrics will be computed for each bucket in the bucket aggregations.
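
As a quick preview of such nesting, here is a minimal sketch of a sum aggregation placed inside a terms bucket aggregation, so that the download total is computed separately for every distinct username (bucket aggregations themselves are covered later in the chapter; the choice of username as the bucketing field is purely illustrative):

GET bigginsight/_search
{
  "aggregations": {
    "by_user": {
      "terms": {
        "field": "username"
      },
      "aggregations": {
        "total_per_user": {
          "sum": {
            "field": "downloadTotal"
          }
        }
      }
    }
  },
  "size": 0
}

With this structure, the sum is computed once per bucket, that is, once for each distinct username, instead of once for the whole index.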

We will start with simple metric aggregations without nesting them inside bucket aggregations. When we learn about bucket aggregations later in the chapter, we will also learn how to use metric aggregations inside bucket aggregations.

We will learn about the following metric aggregations:

  • Sum, average, min, and max aggregations
  • Stats and extended stats aggregations
  • Cardinality aggregation

Let us learn about them one by one.

Sum, average, min, and max aggregations

Finding the sum of a field, the minimum value of a field, the maximum value of a field, or an average are very common operations. For people familiar with SQL, the query to find the sum would look like the following:

SELECT sum(downloadTotal) FROM usageReport;

The preceding query will calculate the sum of the downloadTotal field across all records in the table. This requires going through all records of the table, or all records in the given context, and adding up the values of the given field.

In Elasticsearch, a similar query can be written using the sum aggregation. Let us understand the sum aggregation first.

Sum aggregation

Here is how to write a simple sum aggregation:

GET bigginsight/_search
{
  "aggregations": {                1
    "download_sum": {              2
      "sum": {                     3
        "field": "downloadTotal"   4
      }
    }
  },
  "size": 0                        5
}
  • The aggs or aggregations element at the top level should wrap any aggregation.
  • Give a name to the aggregation; here we are doing the sum aggregation on the downloadTotal field and hence the name we chose is download_sum. You can name it anything. This field will be useful while looking up this particular aggregation’s result in the response.
  • We are doing a sum aggregation, hence the sum element.
  • We want the sum to be computed on the downloadTotal field.

Specify "size": 0 to prevent raw search results from being returned; we just want the aggregation results, not the search hits, in this case. Since we haven't specified any top-level query element, the aggregation is computed over all documents in the index.

The response should look like the following:

{
  "took": 92,
  ...
  "hits": {
    "total": 242836,        1
    "max_score": 0,
    "hits": []
  },
  "aggregations": {         2
    "download_sum": {       3
      "value": 2197438700   4
    }
  }
}

Let us understand the key aspects of the response. The key parts are numbered 1, 2, 3, and so on, and are explained in the following points:

  • The hits.total element shows the number of documents that were considered or were in the context of the query. If there was no additional query or filter specified, it will include all documents in the type or index (a query-scoped example is shown after this list).
  • Just like the request, the aggregation result is wrapped inside an aggregations element.
  • The aggregation we requested was named download_sum, hence the result of the sum aggregation appears inside an element with the same name.
  • The actual value after applying the sum aggregation.
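
To make the query context explicit, here is a minimal sketch of the same sum aggregation restricted by a query, so that only the matching documents are considered (the username value used here is purely illustrative):

GET bigginsight/_search
{
  "query": {
    "term": {
      "username": "user1"
    }
  },
  "aggregations": {
    "download_sum": {
      "sum": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

With such a query in place, hits.total and the computed sum reflect only the documents matching the query rather than the whole index.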

The average, min, and max aggregations are very similar. Let’s look at them briefly.

Average aggregation

The average aggregation finds the average across all documents in the query context:

GET bigginsight/_search
{
  "aggregations": {
    "download_average": {   1
      "avg": {              2
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The only notable differences from the sum aggregation are as follows:

  • We chose a different name, download_average, to make it apparent that the aggregation is trying to compute the average.
  • The type of aggregation that we are doing is avg instead of the sum aggregation that we were doing earlier.

The response structure is identical, but the value field will now represent the average of the requested field. The min and max aggregations work in exactly the same way.

Min aggregation

Here is how we will find the minimum value of the downloadTotal field in the entire index/type:

GET bigginsight/_search
{
  "aggregations": {
    "download_min": {
      "min": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

Finally, let's look at the max aggregation.

Max aggregation

Here is how we will find the maximum value of the downloadTotal field in the entire index/type:

GET bigginsight/_search
{
  "aggregations": {
    "download_max": {
      "max": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

These aggregations were really simple. Now let's look at the stats and extended stats aggregations, which are slightly more advanced but just as easy to use.

Stats and extended stats aggregations

These aggregations compute some common statistics in a single request without having to issue multiple requests. This saves resources on the Elasticsearch side as well because the statistics are computed in a single pass rather than being requested multiple times. The client code also becomes simpler if you are interested in more than one of these statistics.

Let’s look at the stats aggregation first.

Stats aggregation

The stats aggregation computes the sum, average, min, max, and count of documents in a single pass:

GET bigginsight/_search
{
  "aggregations": {
    "download_stats": {
      "stats": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The structure of the stats request is the same as the other metric aggregations we have seen so far, so nothing special is going on here.

The response should look like the following:

{
  "took": 4,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "download_stats": {
      "count": 242835,
      "min": 0,
      "max": 241213,
      "avg": 9049.102065188297,
      "sum": 2197438700
    }
  }
}

As you can see, the response with the download_stats element contains count, min, max, average, and sum; everything is included in the same response. This is very handy as it reduces the overhead of multiple requests and also simplifies the client code.

Let us look at the extended stats aggregation.

Extended stats aggregation

The extended stats aggregation returns a few more statistics in addition to the ones returned by the stats aggregation:

GET bigginsight/_search
{
  "aggregations": {
    "download_estats": {
      "extended_stats": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The response looks like the following:

{
  "took": 15,
  "timed_out": false,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "download_estats": {
      "count": 242835,
      "min": 0,
      "max": 241213,
      "avg": 9049.102065188297,
      "sum": 2197438700,
      "sum_of_squares": 133545882701698,
      "variance": 468058704.9782911,
      "std_deviation": 21634.664429528162,
      "std_deviation_bounds": {
        "upper": 52318.43092424462,
        "lower": -34220.22679386803
      }
    }
  }
}

It also returns the sum of squares, variance, standard deviation, and standard deviation bounds.
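
The std_deviation_bounds values are derived from the other statistics: by default they are computed as avg ± 2 × std_deviation (the multiplier can be changed with the sigma parameter of the extended_stats aggregation). Using the numbers above, 9049.10 + 2 × 21634.66 ≈ 52318.43 and 9049.10 − 2 × 21634.66 ≈ −34220.23, which match the upper and lower bounds in the response.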

Cardinality aggregation

Finding the count of unique elements can be done with the cardinality aggregation. It is similar to finding the result of a query such as the following:

select count(*) from (select distinct username from usageReport) u;

Finding the cardinality, or the number of unique values, of a specific field is a very common requirement. If you have click-stream data from the different visitors on your website, you may want to find out how many unique visitors you got in a given day, week, or month.

Let us understand how we find out the count of unique users for which we have network traffic data:

GET bigginsight/_search
{
  "aggregations": {
    "unique_visitors": {
      "cardinality": {
        "field": "username"
      }
    }
  },
  "size": 0
}

The cardinality aggregation response is just like the other metric aggregations:

{
  "took": 110,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "unique_visitors": {
      "value": 79
    }
  }
}
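
To count unique visitors for a single day, as mentioned earlier, the cardinality aggregation can be combined with a query; the following sketch assumes the index has a date field called time (the field name and the date range are illustrative):

GET bigginsight/_search
{
  "query": {
    "range": {
      "time": {
        "gte": "2017-09-25T00:00:00.000Z",
        "lt": "2017-09-26T00:00:00.000Z"
      }
    }
  },
  "aggregations": {
    "unique_visitors": {
      "cardinality": {
        "field": "username"
      }
    }
  },
  "size": 0
}

Note that the cardinality aggregation computes an approximate count for high-cardinality fields; its precision_threshold setting can be used to trade memory for accuracy.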

To summarize, we learned how to perform various metric aggregations on numeric data and how Elasticsearch can easily serve as a powerful analytics engine for your dataset.

If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics and more.

Learning Elastic Stack 6.0
