In this article by Arun Chinnachamy, the author of Redis Applied Design Patterns, we are going to see how to use Redis to build a basic autocomplete or autosuggest server. Also, we will see how to build a faceting engine using Redis. To build such a system, we will use sorted sets and operations involving ranges and intersections. To summarize, we will focus on the following topics in this article:

(For more resources related to this topic, see here.)

Autocompletion for words

Multiword autosuggestion using a sorted set

Faceted search using sets and operations such as union and intersection

Autosuggest systems

These days autosuggest is seen in virtually all e-commerce stores in addition to a host of others. Almost all websites are utilizing this functionality in one way or another from a basic website search to programming IDEs. The ease of use afforded by autosuggest has led every major website from Google and Amazon to Wikipedia to use this feature to make it easier for users to navigate to where they want to go. The primary metric for any autosuggest system is how fast we can respond with suggestions to a user's query. Usability research studies have found that the response time should be under a second to ensure that a user's attention and flow of thought are preserved. Redis is ideally suited for this task as it is one of the fastest data stores in the market right now.

Let's see how to design such a structure and use Redis to build an autosuggest engine. We can tweak Redis to suit individual use case scenarios, ranging from the simple to the complex. For instance, if we want only to autocomplete a word, we can enable this functionality by using a sorted set. Let's see how to perform single word completion and then we will move on to more complex scenarios, such as phrase completion.

Word completion in Redis

In this section, we want to provide a simple word completion feature through Redis. We will use a sorted set for this exercise. The reason behind using a sorted set is that it always guarantees O(log(N)) operations. While it is commonly known that in a sorted set, elements are arranged based on the score, what is not widely acknowledged is that elements with the same scores are arranged lexicographically. This is going to form the basis for our word completion feature. Let's look at a scenario in which we have the words to autocomplete: jack, smith, scott, jacob, and jackeline.

In order to complete a word, we need to use n-gram. Every word needs to be written as a contiguous sequence.

n-gram is a contiguous sequence of n items from a given sequence of text or speech. To find out more, check http://en.wikipedia.org/wiki/N-gram.

For example, n-gram of jack is as follows:

j
ja
jac
jack$

In order to signify the completed word, we can use a delimiter such as * or $. To add the word into a sorted set, we will be using ZADD in the following way:

> zadd autocomplete 0 j
> zadd autocomplete 0 ja
> zadd autocomplete 0 jac
> zadd autocomplete 0 jack$

Likewise, we need to add all the words we want to index for autocompletion. Once we are done, our sorted set will look as follows:

> zrange autocomplete 0 -1
1) "j"
2) "ja"
3) "jac"
4) "jack$"
5) "jacke"
6) "jackel"
7) "jackeli"
8) "jackelin"
9) "jackeline$"
10) "jaco"
11) "jacob$"
12) "s"
13) "sc"
14) "sco"
15) "scot"
16) "scott$"
17) "sm"
18) "smi"
19) "smit"
20) "smith$"

Now, we will use ZRANK and ZRANGE operations over the sorted set to achieve our desired functionality. To autocomplete for ja, we have to execute the following commands:

> zrank autocomplete jac
2
zrange autocomplete 3 50
1) "jack$"
2) "jacke"
3) "jackel"
4) "jackeli"
5) "jackelin"
6) "jackeline$"
7) "jaco"
8) "jacob$"
9) "s"
10) "sc"
11) "sco"
12) "scot"
13) "scott$"
14) "sm"
15) "smi"
16) "smit"
17) "smith$"

Another example on completing smi is as follows:

zrank autocomplete smi
17
zrange autocomplete 18 50
1) "smit"
2) "smith$"

Now, in our program, we have to do the following tasks:

Iterate through the results set.

Check if the word starts with the query and only use the words with $ as the last character.

Though it looks like a lot of operations are performed, both ZRANGE and ZRANK are O(log(N)) operations. Therefore, there should be virtually no problem in handling a huge list of words. When it comes to memory usage, we will have n+1 elements for every word, where n is the number of characters in the word. For M words, we will have M(avg(n) + 1) records where avg(n) is the average characters in a word. The more the collision of characters in our universe, the less the memory usage.

In order to conserve memory, we can use the EXPIRE command to expire unused long tail autocomplete terms.

Multiword phrase completion

In the previous section, we have seen how to use the autocomplete for a single word. However, in most real world scenarios, we will have to deal with multiword phrases. This is much more difficult to achieve as there are a few inherent challenges involved:

Suggesting a phrase for all matching words. For instance, the same manufacturer has a lot of models available. We have to ensure that we list all models if a user decides to search for a manufacturer by name.

Order the results based on overall popularity and relevance of the match instead of ordering lexicographically. The following screenshot shows the typical autosuggest box, which you find in popular e-commerce portals. This feature improves the user experience and also reduces the spell errors:

Unlock access to the largest independent learning library in Tech for FREE!

Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.

Renews at $19.99/month. Cancel anytime

For this case, we will use a sorted set along with hashes. We will use a sorted set to store the n-gram of the indexed data followed by getting the complete title from hashes. Instead of storing the n-grams into the same sorted set, we will store them in different sorted sets.

Let's look at the following scenario in which we have model names of mobile phones along with their popularity:

redis-autosuggest-img-1

For this set, we will create multiple sorted sets. Let's take Apple iPhone 5S:

ZADD a 9 apple_iphone_5s
ZADD ap 9 apple_iphone_5s
ZADD app 9 apple_iphone_5s
ZADD apple 9 apple_iphone_5s
ZADD i 9 apple_iphone_5s
ZADD ip 9 apple_iphone_5s
ZADD iph 9 apple_iphone_5s
ZADD ipho 9 apple_iphone_5s
ZADD iphon 9 apple_iphone_5s
ZADD iphone 9 apple_iphone_5s
ZADD 5 9 apple_iphone_5s
ZADD 5s 9 apple_iphone_5s
HSET titles apple_iphone_5s "Apple iPhone 5S"

In the preceding scenario, we have added every n-gram value as a sorted set and created a hash that holds the original title. Likewise, we have to add all the titles into our index.

Searching in the index

Now that we have indexed the titles, we are ready to perform a search. Consider a situation where a user is querying with the term apple. We want to show the user the five best suggestions based on the popularity of the product. Here's how we can achieve this:

> zrevrange apple 0 4 withscores
1) "apple_iphone_5s"
2) 9.0
3) "apple_iphone_5c"
4) 6.0

As the elements inside the sorted set are ordered by the element score, we get the matches ordered by the popularity which we inserted. To get the original title, type the following command:

> hmget titles apple_iphone_5s
1) "Apple iPhone 5S"

In the preceding scenario case, the query was a single word. Now imagine if the user has multiple words such as Samsung nex, and we have to suggest the autocomplete as Samsung Galaxy Nexus. To achieve this, we will use ZINTERSTORE as follows:

> zinterstore samsung_nex 2 samsung nex aggregate max

ZINTERSTORE destination numkeys key [key ...] [WEIGHTS weight [weight ...]] [AGGREGATE SUM|MIN|MAX]

This computes the intersection of sorted sets given by the specified keys and stores the result in a destination. It is mandatory to provide the number of input keys before passing the input keys and other (optional) arguments. For more information about ZINTERSTORE, visit http://redis.io/commands/ZINTERSTORE.

The previous command, which is zinterstore samsung_nex 2 samsung nex aggregate max, will compute the intersection of two sorted sets, samsung and nex, and stores it in another sorted set, samsung_nex. To see the result, type the following commands:

> zrevrange samsung_nex 0 4 withscores
1) samsung_galaxy_nexus
2) 7
> hmget titles samsung_galaxy_nexus
1) Samsung Galaxy Nexus

If you want to cache the result for multiword queries and remove it automatically, use an EXPIRE command and set expiry for temporary keys.

Summary

In this article, we have seen how to perform autosuggest and faceted searches using Redis. We have also understood how sorted sets and sets work. We have also seen how Redis can be used as a backend system for simple faceting and autosuggest system and make the system ultrafast.

Further resources on this subject: