Redis Cluster Client Development

In this post, I'd like to go a little deeper into how to write a client for a Redis Cluster. I assume that readers have a working knowledge of the Redis protocol, so we will focus on how to deal with data sharding and resharding in a Redis Cluster.

As is known, each Redis instance in a cluster serves a number of non-overlapping data slots. This means two basic things:

* There's a way to find out how the cluster shards its dataset: the CLUSTER NODES command.
* If you send a data command such as GETSET to a Redis instance that doesn't serve the slot of the command's key, you will get a MOVED error, so you need to keep the sharding information up to date.

Sending a CLUSTER NODES command to any Redis instance in the cluster returns equivalent sharding and replication information, which tells you who serves what. For example, the result could look like this:

33e5b1010d56d328370a993cf8420d70b3052d2c 127.0.0.1:7004 slave 71568798b4f1edeb0577614e072b492173b63808 0 1444704850437 2 connected
71568798b4f1edeb0577614e072b492173b63808 127.0.0.1:7002 master - 0 1444704848935 2 connected 3000-8499
f34b1ee569f3b856a80633a0b68892a58bfc87d2 127.0.0.1:7001 slave 1141f524b3ac82bb47bb101e0f8ae75ed56cdf54 0 1444704850938 1 connected
1141f524b3ac82bb47bb101e0f8ae75ed56cdf54 127.0.0.1:7003 master - 0 1444704849436 1 connected 0-2999 8500-10499
026f1e9b8c205de0a3645fd8a5eae3fbd0ca8639 127.0.0.1:7005 master - 0 1444704850838 3 connected 10500-16383
79ef081c431bbc7edc305a9611278be0df8fcd04 127.0.0.1:7000 myself,slave 026f1e9b8c205de0a3645fd8a5eae3fbd0ca8639 0 0 1 connected

If you split each line on the space character, you get at least eight items per line. The first four are the node ID, address, flags, and master node ID. A node with the myself flag is the one that received the CLUSTER NODES command. If a node has the slave flag, its master node ID, which won't be a dash, identifies the master it replicates. A node with a fail, fail?, or handshake flag is not reachable from other nodes and will probably fail to respond, so you can just ignore it. The items after the first eight indicate the slots held by the node. Slot ranges are closed intervals; for example, 3000-8499 means 3000, 3001, ..., 8498, 8499, so the node at 127.0.0.1:7002 serves 5,500 slots. A node can also hold several separate ranges: 127.0.0.1:7003 holds 0-2999 and 8500-10499.
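The parsing rules above can be sketched in Python as follows. This is only a minimal illustration, not a complete client: the names Node, parse_cluster_nodes, and build_slot_map are made up for this example, and the only extra assumption beyond the description above is that bracketed items such as [0->-...], which Redis prints for slots that are being migrated or imported, are skipped.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    node_id: str
    address: str
    flags: List[str]
    master_id: Optional[str]      # None when the field is "-" (a master)
    slots: List[Tuple[int, int]]  # closed ranges, e.g. (3000, 8499)


def parse_cluster_nodes(text: str) -> List[Node]:
    """Parse the text reply of CLUSTER NODES into a list of usable nodes."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # blank or malformed line
        flags = fields[2].split(",")
        # Nodes with these flags will probably not respond, so ignore them.
        if any(flag in ("fail", "fail?", "handshake") for flag in flags):
            continue
        slots = []
        for item in fields[8:]:
            if item.startswith("["):
                continue  # slot in migrating/importing state, not a range
            start, _, end = item.partition("-")
            slots.append((int(start), int(end or start)))  # "5" means 5-5
        master_id = None if fields[3] == "-" else fields[3]
        nodes.append(Node(fields[0], fields[1], flags, master_id, slots))
    return nodes


def build_slot_map(nodes: List[Node]) -> Dict[int, str]:
    """Map every served slot number to the address of the node holding it."""
    slot_map = {}
    for node in nodes:
        for start, end in node.slots:
            for slot in range(start, end + 1):
                slot_map[slot] = node.address
    return slot_map
```

Fed the sample output above, build_slot_map(parse_cluster_nodes(reply)) would map slot 3000 to 127.0.0.1:7002 and slot 8500 to 127.0.0.1:7003. A flat dictionary with 16384 entries is small enough that a more compact range structure is rarely worth the trouble.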

After parsing the cluster information, you know which master node a data command should go to. If you don't mind getting possibly stale data, you can increase the QPS by sending read commands to a slave. Note that you need to send a READONLY command on the connection before you can read data from a slave.
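To route a command, the client first has to turn the command's key into a slot number. According to the Redis Cluster specification, the slot is the CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty {...} hash tag, only the text inside the first tag is hashed. A minimal Python sketch, assuming that binascii.crc_hqx with an initial value of 0 yields that CRC16 variant (the function name key_slot is made up for this example):

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: bytes) -> int:
    """Return the cluster slot number that a key hashes to."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag such as {user1000} replaces the hashed text.
        if end > start + 1:
            key = key[start + 1:end]
    # crc_hqx uses the CRC-CCITT polynomial 0x1021; starting from 0 is
    # assumed here to match the XMODEM variant that Redis Cluster uses.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


if __name__ == "__main__":
    print(key_slot(b"foo"))
    # Keys that share a hash tag always land in the same slot.
    print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

Looking up the returned slot in the parsed slot mapping then gives the address of the master to send the command to.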

Being able to parse static sharding information is a good start for your client library. But since a Redis Cluster may reshard, you also need to know what happens when Redis instances join or leave the cluster. For example, suppose a Redis instance bound to 127.0.0.1:8000 has joined and slot #0 is migrating to it from 127.0.0.1:7003. Your client probably has no idea about that and sends a command such as GET h-893 (h-893 is in slot #0) to 127.0.0.1:7003 as usual. The result could then be one of the following:

1. A normal response, if the slot is in the migrating state and the key still exists at 127.0.0.1:7003.
2. An -ASK [SLOT#] [ADDRESS] error, if the slot is in the migrating state and the key is not at 127.0.0.1:7003 (non-existent or already migrated).
3. A -MOVED [SLOT#] [ADDRESS] error, if the slot has been completely migrated to the new instance.

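In case #2, you need to retry the command at the address given in the ASK error, preceded by an ASKING command, and you should not update the sharding information yet because the migration is still in progress. In case #3, you need to update the sharding information: you can either simply remap the slot to the new address as the MOVED error tells you, or refresh the complete slot mapping. Case #1 needs no special handling.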
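Telling the two redirections apart comes down to parsing the error line. Both errors carry a slot number and an address, for example MOVED 0 127.0.0.1:8000. A minimal Python sketch, where Redirect and parse_redirect are names made up for this example and the error is assumed to arrive as a plain string with or without the leading dash:

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Return the redirection described by an error reply, or None."""
    parts = error.lstrip("-").split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None  # some other error, e.g. CLUSTERDOWN or ERR
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


if __name__ == "__main__":
    redirect = parse_redirect("-MOVED 0 127.0.0.1:8000")
    if redirect is not None and redirect.kind == "MOVED":
        # The slot has a new owner: remember it for later commands.
        print("remap slot", redirect.slot, "to", redirect.host, redirect.port)
    elif redirect is not None:
        # ASK: send ASKING and then the command to that node, once,
        # leaving the slot mapping unchanged.
        print("one-off retry at", redirect.host, redirect.port)
```

Splitting the address on the last colon keeps the sketch working if the host part itself contains colons.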

When a Redis instance leaves the cluster and becomes a free node, it won't reset your TCP connections, but it will always reply with a -CLUSTERDOWN error to any data command. In this case, don't panic; try updating the sharding information from the other Redis instances (this is why your client has to remember all the instances).

But if you think it over, you may realize that the Redis instance could join another cluster after it leaves the old one and happen to serve the same slots it used to. Your client would then appear to work normally while the data actually goes somewhere else. So your client ought to provide a way to proactively trigger a sharding update.

Now that we have discussed the normal cases, what happens when a Redis instance is killed? This isn't so complicated as long as the slaves fail over for their disconnected masters and the cluster state eventually returns to OK (otherwise, ask your Redis administrator to fix it immediately). Your client will encounter an I/O exception, such as a TCP connection reset, when the Redis server is down, or an error when writing to the socket. Either way, it's time to update the sharding information.
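Whether the trigger is a CLUSTERDOWN reply, a connection reset, or an explicit request from the application, the update itself can follow the same pattern: walk through the remembered instances until one of them returns usable sharding information. A minimal Python sketch under stated assumptions: refresh_slot_map is a name made up for this example, and fetch is a hypothetical callable supplied by the client that sends CLUSTER NODES to one address and returns a slot-to-address mapping, raising OSError or ValueError on failure.

```python
from typing import Callable, Dict, Iterable

SlotMap = Dict[int, str]


def refresh_slot_map(known_nodes: Iterable[str],
                     fetch: Callable[[str], SlotMap]) -> SlotMap:
    """Ask each remembered node in turn until one returns a slot mapping."""
    last_error = None
    for address in known_nodes:
        try:
            slot_map = fetch(address)
        except (OSError, ValueError) as exc:
            # Connection refused or reset, timeout, or an unparsable reply.
            last_error = exc
            continue
        if slot_map:
            return slot_map
        # An empty mapping suggests a free node that serves no slots.
    raise RuntimeError("no known node returned sharding information") from last_error
```

Passing fetch in as a parameter keeps the retry loop independent of the socket code, which also makes it easy to exercise with a fake in tests.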

These are the things that a client for a Redis Cluster should cover. Hopefully, this helps you if you decide to write a cluster client, or provides you with valuable information if you want to learn more details about your client lib (provided you are already using a Redis Cluster).

About the author

Zhe Lin is a system engineer at the IT Developing Center at Hunantv.com. He is in charge of developing and maintaining Redis-related toolkits and of in-company training. His major works include:

* [redis-trib.py] : Redis Cluster creating and resharding toolkit in Python2
* [redis-cerberus] : Redis Cluster proxy
* [redis-ctl] : Redis management tool with web UI