A geo point refers to the latitude and longitude of a point on Earth. Every location on Earth has its own unique latitude and longitude pair. Elasticsearch is aware of geo points and allows you to perform various operations on them. Many applications also need a geo location component. For example, you might need to search for all the nearby restaurants that serve Chinese food, or find the nearest cab that is free. In other situations, you might need to find which state a particular geo point belongs to, in order to understand where you are currently standing.

This article by Vineeth Mohan, author of the book Elasticsearch Blueprints, models all of its examples on a real-life scenario, restaurant search, for better understanding. Here, we take the example of sorting restaurants based on geographical preferences. The article covers a number of cases, ranging from the simple, such as finding the nearest restaurant, to the more complex, such as categorizing restaurants based on distance.

What makes Elasticsearch unique and powerful is the fact that you can combine geo operations with any other search query, so the results reflect both the location data and the query data.

Restaurant search

Let’s consider creating a search portal for restaurants. The following are its requirements:

  • To find the nearest restaurant with Chinese cuisine, which has the word ChingYang in its name.
  • To decrease the importance of all restaurants outside city limits.
  • To find the distance between the restaurant and current point for each of the preceding restaurant matches.
  • To find whether the person is in a particular city’s limit or not.
  • To aggregate all restaurants within a distance of 10 km. That is, for a radius of the first 10 km, we have to compute the number of restaurants. For the next 10 km, we need to compute the number of restaurants and so on.

Data modeling for restaurants

Firstly, we need to see the aspects of data and model it around a JSON document for Elasticsearch to make sense of the data. A restaurant has a name, its location information, and rating. To store the location information, Elasticsearch has a provision to understand the latitude and longitude information and has features to conduct searches based on it. Hence, it would be best to use this feature.

Let’s see how we can do this.

First, let’s see what our document should look like:

{
"name" : "Tamarind restaurant",
"location" : {
     "lat" : 1.10,
     "lon" : 1.54
}
}

Now, let’s define the schema for the same:

curl -X PUT "http://$hostname:9200/restaurants" -d '{
   "index": {
       "number_of_shards": 1,
       "number_of_replicas": 1
 },
   "analysis":{    
       "analyzer":{        
           "flat" : {
               "type" : "custom",
               "tokenizer" : "keyword",
               "filter" : "lowercase"
           }
       }
   }
}'
 
echo
curl -X PUT "http://$hostname:9200/restaurants/restaurant/_mapping" -d '{
   "restaurant" : {
   "properties" : {
       "name" : { "type" : "string" },
       "location" : { "type" : "geo_point", "accuracy" : "1km" }
   }}
 
}'

Let’s now index some documents in the index. An example of this would be the Tamarind restaurant data shown in the previous section. We can index the data as follows:

curl -XPOST 'http://localhost:9200/restaurants/restaurant' -d '{
   "name": "Tamarind restaurant",
   "location": {
       "lat": 1.1,
       "lon": 1.54
   }
}'

Likewise, we can index any number of documents. For the sake of convenience, we have indexed only a total of five restaurants for this article.

In this article, we supply the location as an object with lat and lon keys. Elasticsearch also accepts other formats, such as a geohash, a "lat,lon" string, and a [lon, lat] array, but let’s stick to this one. As we have mapped the field location to the type geo_point, Elasticsearch is aware of what this information means and how to act upon it.
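The difference between these formats is easy to get wrong, because the array form puts the longitude first while the object and string forms put the latitude first. The following Python sketch normalizes the three non-geohash formats to a (lat, lon) pair. The helper name to_lat_lon is hypothetical and is not part of any Elasticsearch client.

```python
def to_lat_lon(point):
    """Normalize a geo point to a (lat, lon) tuple.

    Handles the object form, the "lat,lon" string form, and the
    [lon, lat] array form. Geohash strings are not handled here.
    """
    if isinstance(point, dict):
        return float(point["lat"]), float(point["lon"])
    if isinstance(point, str):
        lat, lon = (float(part) for part in point.split(","))
        return lat, lon
    if isinstance(point, (list, tuple)):
        lon, lat = point  # the array form is longitude first
        return float(lat), float(lon)
    raise TypeError("unsupported geo point format")


# The same Tamarind restaurant location in three formats
print(to_lat_lon({"lat": 1.1, "lon": 1.54}))
print(to_lat_lon("1.1, 1.54"))
print(to_lat_lon([1.54, 1.1]))
```

All three calls describe the same point, so each one prints (1.1, 1.54).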

The nearest restaurant problem

Let’s assume that we are at a particular point where the latitude is 1.012 and the longitude is 1.231. We need to find the nearest restaurants to this location.

For this purpose, the function_score query is a good fit. We can use its Gaussian decay function (gauss) to achieve this. Note that the origin array in the following query is in [lon, lat] order:

curl -XPOST 'http://localhost:9200/restaurants/_search' -d '{
"query": {
   "function_score": {
     "functions": [
       {
         "gauss": {
           "location": {
             "scale": "1km",
              "origin": [
               1.231,
               1.012
             ]
           }
         }
       }
     ]
   }
}
}'

Here, we tell Elasticsearch to give a higher score to the restaurants that are near the reference point we gave it. The closer a restaurant is, the higher its score.
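To get a feel for how quickly the score falls off, here is a minimal Python sketch of the Gaussian decay curve as the function_score documentation describes it, assuming the default decay of 0.5 and offset of 0. The function name gauss_decay is hypothetical; this only illustrates the shape of the curve and is not how Elasticsearch implements it internally.

```python
import math


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Score multiplier for a document at `distance` from the origin.

    `distance`, `scale`, and `offset` must use the same unit.
    At distance == offset + scale the result equals `decay`.
    """
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    effective = max(0.0, distance - offset)
    return math.exp(-effective ** 2 / (2.0 * sigma_sq))


# With scale = 1 km, as in the query above
for km in (0.0, 0.5, 1.0, 2.0):
    print(km, round(gauss_decay(km, scale=1.0), 4))
```

With a scale of 1 km, a restaurant exactly 1 km away gets half the score of one at the origin, and a restaurant 2 km away gets about 6 percent, so a small scale makes the ranking very sensitive to distance.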

Maximum distance covered

Now, let’s move on to another example: finding restaurants that are within 10 km of my current position. Those that are beyond 10 km are of no interest to me. So, the search area forms a circle with a radius of 10 km around my current position, as shown in the following map:

[Figure: map showing a circle with a 10 km radius around the current position]

Our best bet here is using a geo distance filter. It can be used as follows:

curl -XPOST 'http://localhost:9200/restaurants/_search' -d '{
"query": {
   "filtered": {
     "filter": {
       "geo_distance": {
         "distance": "10km",
         "location": {
           "lat": 1.232,
           "lon": 1.112
         }
       }
     }
   }
}
}'
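Conceptually, this filter keeps every document whose location lies within the given radius of the center point. The following Python sketch shows the same idea on the client side using the haversine formula. The Earth radius constant, the helper names, and the second restaurant are assumptions made for illustration; Elasticsearch’s own distance calculation may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(restaurants, lat, lon, radius_km):
    """Keep only the restaurants at most radius_km away from (lat, lon)."""
    return [r for r in restaurants
            if haversine_km(lat, lon,
                            r["location"]["lat"],
                            r["location"]["lon"]) <= radius_km]


restaurants = [
    {"name": "Tamarind restaurant", "location": {"lat": 1.1, "lon": 1.54}},
    # Hypothetical second document, used only to show a non-match
    {"name": "Far away restaurant", "location": {"lat": 5.0, "lon": 5.0}},
]

for radius in (10, 100):
    names = [r["name"] for r in within_radius(restaurants, 1.232, 1.112, radius)]
    print(radius, names)
```

The Tamarind restaurant is roughly 50 km from the center point, so it is excluded by a 10 km radius and included by a 100 km radius.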

Inside city limits

Next, I need to consider only those restaurants that are inside a particular city limit; the rest are of no interest to me. As the city shown in the following map is rectangular, this makes my job easier:

[Figure: map showing the rectangular city limits]

Now, to see whether a geo point is inside a rectangle, we can use the bounding box filter. A rectangle is defined by its top-left and bottom-right points.

Let’s assume that the city is within the following rectangle, with the top-left point at latitude 2 and longitude 0, and the bottom-right point at latitude 0 and longitude 2:

curl -XPOST 'http://localhost:9200/restaurants/_search' -d '{
"query": {
   "filtered": {
     "query": {
       "match_all": {}
     },
     "filter": {
       "geo_bounding_box": {
         "location": {
           "top_left": {
             "lat": 2,
             "lon": 0
           },
           "bottom_right": {
             "lat": 0,
             "lon": 2
           }
         }
       }
     }
   }
}
}'
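The test behind a bounding box is a pair of range checks: the latitude must lie between the bottom and top edges, and the longitude between the left and right edges. A minimal Python sketch of that check follows, assuming the box does not cross the 180-degree meridian. The function name in_bounding_box is hypothetical.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Return True if (lat, lon) lies inside the rectangle.

    Assumes the box does not cross the antimeridian, so the left
    longitude is smaller than the right longitude.
    """
    return (bottom_right["lat"] <= lat <= top_left["lat"]
            and top_left["lon"] <= lon <= bottom_right["lon"])


city = {
    "top_left": {"lat": 2, "lon": 0},
    "bottom_right": {"lat": 0, "lon": 2},
}

# The Tamarind restaurant from the earlier example is inside the city
print(in_bounding_box(1.1, 1.54, **city))
# A point north of the top edge is outside
print(in_bounding_box(3.0, 1.0, **city))
```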

Distance values between the current point and each restaurant

Now, consider the scenario where you need to find the distance between the user location and each restaurant. How can we achieve this requirement? We can use a script field: the current geo coordinates are passed to the script, which computes the distance to each restaurant matched by the query, as in the following code. Here, the current location is given as (1, 2), that is, latitude 1 and longitude 2:

curl -XPOST 'http://localhost:9200/restaurants/_search?pretty' -d '{
"script_fields": {
   "distance": {
     "script": "doc['"'"'location'"'"'].arcDistanceInKm(1, 2)"
   }
},
"fields": [
   "name"
],
"query": {
   "match": {
     "name": "chinese"
   }
}
}'

We have used the function called arcDistanceInKm in the preceding query, which accepts the geo coordinates and then returns the distance between that point and the location of each document matched by the query. Note that the distance is calculated in kilometers (km). You might have noticed a long run of single and double quotes before and after location in the script. This is shell quoting: the request body is wrapped in single quotes, so each single quote inside the script has to be written as '"'"' (close the single-quoted string, add a single quote wrapped in double quotes, and reopen the string). Without it, the shell would strip the quotes around location and the script would fail when Elasticsearch tries to run it.

The distances are calculated from the current point to the matched restaurants and are returned in the distance field of the response, as shown in the following code:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
   "total" : 1,
   "successful" : 1,
   "failed" : 0
},
"hits" : {
   "total" : 2,
   "max_score" : 0.7554128,
   "hits" : [ {
     "_index" : "restaurants",
     "_type" : "restaurant",
     "_id" : "AU08uZX6QQuJvMORdWRK",
     "_score" : 0.7554128,
     "fields" : {
       "distance" : [ 112.92927483176413 ],
       "name" : [ "Great chinese restaurant" ]
     }
   }, {
     "_index" : "restaurants",
     "_type" : "restaurant",
     "_id" : "AU08uZaZQQuJvMORdWRM",
     "_score" : 0.7554128,
     "fields" : {
       "distance" : [ 137.61635969665923 ],
       "name" : [ "Great chinese restaurant" ]
     }
   } ]
}
}

Note that the distances measured from the current point to the restaurants are straight-line distances and not road distances.
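On the client side, each script field arrives as a single-element list under fields. The following Python sketch pulls the name and distance out of a trimmed copy of the response above and sorts the restaurants by distance. The function name name_and_distance is hypothetical.

```python
# Trimmed copy of the response shown above
response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "AU08uZX6QQuJvMORdWRK",
             "fields": {"distance": [112.92927483176413],
                        "name": ["Great chinese restaurant"]}},
            {"_id": "AU08uZaZQQuJvMORdWRM",
             "fields": {"distance": [137.61635969665923],
                        "name": ["Great chinese restaurant"]}},
        ],
    }
}


def name_and_distance(response):
    """Return (name, distance_km) pairs sorted by ascending distance."""
    rows = []
    for hit in response["hits"]["hits"]:
        fields = hit["fields"]
        # Field values are returned as lists, even for single values
        rows.append((fields["name"][0], fields["distance"][0]))
    return sorted(rows, key=lambda row: row[1])


for name, km in name_and_distance(response):
    print(f"{name}: {km:.1f} km")
```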

Restaurant out of city limits

One of my friends called me and asked me to join him on his journey to the next city. As we were leaving the city, he insisted on eating at a restaurant outside our city limits, but before the next city. This requirement translates to any restaurant that is a minimum of 15 km and a maximum of 100 km from the center of the city. Hence, we have something like a donut in which we have to conduct our search, as shown in the following map:

[Figure: map showing the donut-shaped search area around the city center]

The area inside the donut is a match, but the area outside is not. For this donut area calculation, we have the geo_distance_range filter to our rescue. Here, we can apply the minimum distance and maximum distance in the fields from and to to populate the results, as shown in the following code:

curl -XPOST 'http://localhost:9200/restaurants/_search' -d '{
"query": {
   "filtered": {
     "query": {
       "match_all": {}
     },
     "filter": {
       "geo_distance_range": {
         "from": "15km",
         "to": "100km",
         "location": {
           "lat": 1.232,
           "lon": 1.112
         }
       }
     }
   }
}
}'
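The donut test itself is a simple check on a distance that has already been computed: it must be at least the inner radius and at most the outer radius. The following Python sketch applies it to a few hypothetical distances from the city center. Treating both bounds as inclusive is an assumption of this sketch.

```python
def in_distance_range(distance_km, from_km=15.0, to_km=100.0):
    """Return True if the distance falls inside the donut.

    Both bounds are treated as inclusive here, which is an assumption.
    """
    return from_km <= distance_km <= to_km


# Hypothetical distances (in km) from the city center
distances = {
    "Downtown diner": 4.0,
    "Highway grill": 42.5,
    "Next city bistro": 130.0,
}

matches = [name for name, km in distances.items() if in_distance_range(km)]
print(matches)
```

Only the restaurant between the two circles is kept; the one inside the city and the one beyond 100 km are both dropped.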

Restaurant categorization based on distance

In an e-commerce solution for searching restaurants, it helps to give the user more ways to explore the results. If we can give a snapshot of the results beyond the top 10 hits, the search becomes more useful. For example, if we are able to show how many restaurants serve Indian, Thai, or other cuisines, it helps the user get a better idea of the result set.

In a similar manner, if we can tell users whether a restaurant is near, at a medium distance, or far away, we can really strike a chord in the restaurant search user experience, as shown in the following map:

[Figure: map showing near, medium-distance, and far-away zones around the current position]

Implementing this is not hard, as we have the geo distance aggregation. In this aggregation type, we can handcraft the distance ranges we are interested in and create a bucket for each of them. We can also define the key name for each bucket, as shown in the following code:

curl -XPOST 'http://localhost:9200/restaurants/_search' -d '{
"aggs": {
   "distanceRanges": {
     "geo_distance": {
       "field": "location",
       "origin": "1.231, 1.012",
       "unit": "meters",
       "ranges": [
         {
           "key": "Near by Locations",
           "to": 200
         },
         {
           "key": "Medium distance Locations",
           "from": 200,
           "to": 2000
         },
         {
           "key": "Far Away Locations",
           "from": 2000
         }
       ]
     }
   }
}
}'

In the preceding code, we categorized the restaurants under three distance ranges: the nearby restaurants (less than 200 meters), the medium-distance restaurants (from 200 meters to 2,000 meters), and the far-away ones (2,000 meters or more). Running this query gave us the following results:

{
"took": 44,
"timed_out": false,
"_shards": {
   "total": 1,
   "successful": 1,
   "failed": 0
},
"hits": {
   "total": 5,
   "max_score": 0,
   "hits": [
    
   ]
},
"aggregations": {
   "distanceRanges": {
     "buckets": [
       {
         "key": "Near by Locations",
         "from": 0,
         "to": 200,
         "doc_count": 1
       },
       {
         "key": "Medium distance Locations",
         "from": 200,
         "to": 2000,
         "doc_count": 0
       },
       {
         "key": "Far Away Locations",
         "from": 2000,
         "doc_count": 4
       }
     ]
   }
}
}

In the results, the doc_count field of each bucket tells us how many restaurants fall within that distance range.
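The bucketing logic can be sketched in a few lines of Python. The ranges mirror the ones in the query, while the list of distances is hypothetical and chosen only to reproduce the same bucket counts. The sketch assumes that from is inclusive and to is exclusive, as in Elasticsearch’s range-style aggregations.

```python
def bucket_by_distance(distances_m, ranges):
    """Count how many distances (in meters) fall into each named range.

    A missing "from" means 0 and a missing "to" means no upper limit.
    "from" is inclusive and "to" is exclusive.
    """
    counts = {r["key"]: 0 for r in ranges}
    for distance in distances_m:
        for r in ranges:
            lower = r.get("from", 0)
            upper = r.get("to", float("inf"))
            if lower <= distance < upper:
                counts[r["key"]] += 1
    return counts


ranges = [
    {"key": "Near by Locations", "to": 200},
    {"key": "Medium distance Locations", "from": 200, "to": 2000},
    {"key": "Far Away Locations", "from": 2000},
]

# Hypothetical distances for five restaurants, in meters
distances_m = [150, 5000, 12000, 30000, 48000]
print(bucket_by_distance(distances_m, ranges))
```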

Aggregating restaurants based on their nearness

In the previous example, we saw the aggregation of restaurants based on their distance from the current point to three different categories. Now, we can consider another scenario in which we classify the restaurants on the basis of the geohash grids that they belong to. This kind of classification can be advantageous if the user would like to get a geographical picture of how the restaurants are distributed.

Here is the code for a geohash-based aggregation of restaurants:

curl -XPOST 'http://localhost:9200/restaurants/_search?pretty' -d '{
"size": 0,
"aggs": {
   "DifferentGrids": {
     "geohash_grid": {
       "field": "location",
       "precision": 4
     },
     "aggs": {
       "restaurants": {
         "top_hits": {}
       }
     }
   }
}
}'

You can see from the preceding code that we used the geohash_grid aggregation, which we named DifferentGrids, with the precision set to 4. The precision value can range from 1 to 12, with 1 giving the coarsest grid cells and 12 the finest. A precision of 4 yields four-character geohash keys, which is what the result that follows contains.

Also, we used another aggregation named restaurants inside the DifferentGrids aggregation. The restaurants aggregation uses the top_hits aggregation to fetch the top matching documents in each grid bucket; without it, the DifferentGrids aggregation would return only the key and doc_count values.

So, running the preceding code gives us the following result:

{
   "took":5,
   "timed_out":false,
   "_shards":{
     "total":1,
     "successful":1,
     "failed":0
   },
   "hits":{
     "total":5,
     "max_score":0,
     "hits":[
 
     ]
   },
   "aggregations":{
     "DifferentGrids":{
         "buckets":[
           {
               "key":"s009",
              "doc_count":2,
               "restaurants":{... }
           },
           {
               "key":"s01n",
               "doc_count":1,
               "restaurants":{... }
           },
           {
               "key":"s00x",
               "doc_count":1,
               "restaurants":{... }
           },
           {
               "key":"s00p",
               "doc_count":1,
               "restaurants":{... }
           }
         ]
     }
   }
}

As we can see from the response, there are four buckets with the key values s009, s01n, s00x, and s00p. These key values represent the different geohash grid cells that the restaurants belong to. From the preceding result, we can see that the s009 grid cell contains two restaurants and all the other grid cells contain one each.
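A geohash key such as s009 is produced by repeatedly halving the longitude and latitude ranges and packing the resulting bits into base-32 characters, five bits per character. Each additional character therefore narrows the cell, and a four-character key corresponds to a precision of 4. The following Python sketch is a minimal implementation of the standard geohash encoding, written for illustration; it is not Elasticsearch’s own code.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=4):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


# Grid cell of the Tamarind restaurant at precision 4
print(geohash_encode(1.1, 1.54, precision=4))
```

Points that are close together usually share a prefix, which is why all the bucket keys in the response start with s0.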

A pictorial representation of the previous aggregation would be like the one shown on the following map:

[Figure: map showing the restaurants grouped into geohash grid cells]

Summary

We found that Elasticsearch can handle geo points and various geo-specific operations. The operations that we covered in this article were ranking restaurants by nearness, searching for nearby restaurants (restaurants inside a circle), searching for restaurants within a distance range (restaurants in the ring between two concentric circles), searching for restaurants inside a city (restaurants inside a rectangle), computing the distance to each restaurant, and categorizing restaurants by proximity and by geohash grid. Apart from these, we can use Kibana, a flexible and powerful visualization tool from the makers of Elasticsearch, for geo-based visualizations.
