(For more resources related to this topic, see here.)

Preparing the environment

Before jumping into configuring and setting up the cluster network, we have to check some parameters and prepare the environment.

To enable sharding for a database or collection, we have to configure some configuration servers that hold the cluster network metadata and shards information. Other parts of the cluster network use these configuration servers to get information about other shards.

In production, it's recommended to have exactly three configuration servers in different machines. The reason for establishing each shard on a different server is to improve the safety of data and nodes. If one of the machines crashes, the whole cluster won't be unavailable.

For the testing and developing environment, you can host all the configuration servers on a single server.Besides, we have two more parts for our cluster network, shards and mongos, or query routers. Query routers are the interface for all clients. All read/write requests are routed to this module, and the query router or mongos instance, using configuration servers, route the request to the corresponding shard.

The following diagram shows the cluster network, modules, and the relation between them:

sharding-action-img-0

It's important that all modules and parts have network access and are able to connect to each other. If you have any firewall, you should configure it correctly and give proper access to all cluster modules.

Each configuration server has an address that routes to the target machine. We have exactly three configuration servers in our example, and the following list shows the hostnames:

cfg1.sharding.com

cfg2.sharding.com

cfg3.sharding.com

In our example, because we are going to set up a demo of sharding feature, we deploy all configuration servers on a single machine with different ports. This means all configuration servers addresses point to the same server, but we use different ports to establish the configuration server.

For production use, all things will be the same, except you need to host the configuration servers on separate machines.

In the next section, we will implement all parts and finally connect all of them together to start the sharding server and run the cluster network.

Implementing configuration servers

Now it's time to start the first part of our sharding. Establishing a configuration server is as easy as running a mongod instance using the --configsvr parameter.

The following scheme shows the structure of the command:

mongod --configsvr --dbpath <path> --port <port>

If you don't pass the dbpath or port parameters, the configuration server uses /data/configdb as the path to store data and port 27019 to execute the instance. However, you can override the default values using the preceding command.

If this is the first time that you have run the configuration server, you might be faced with some issues due to the existence of dbpath. Before running the configuration server, make sure that you have created the path; otherwise, you will see an error as shown in the following screenshot:

sharding-action-img-1

You can simply create the directory using the mkdir command as shown in the following line of command:

mkdir /data/configdb

Also, make sure that you are executing the instance with sufficient permission level; otherwise, you will get an error as shown in the following screenshot:

sharding-action-img-2

The problem is that the mongod instance can't create the lock file because of the lack of permission. To address this issue, you should simply execute the command using a root or administrator permission level.

After executing the command using the proper permission level, you should see a result like the following screenshot:

sharding-action-img-3

As you can see now, we have a configuration server for the hostname cfg1.sharding.com with port 27019 and with dbpath as /data/configdb.

Also, there is a web console to watch and control the configuration server running on port 28019. By pointing the web browser to the address http://localhost:28019/, you can see the console.

The following screenshot shows a part of this web console:

sharding-action-img-4

Now, we have the first configuration server up and running. With the same method, you can launch other instances, that is, using /data/configdb2 with port 27020 for the second configuration server, and /data/configdb3 with port 27021 for the third configuration server.

Configuring mongos instance

After configuring the configuration servers, we should bind them to the core module of clustering. The mongos instance is responsible to bind all modules and parts together to make a complete sharding core.

This module is simple and lightweight, and we can host it on the same machine that hosts other modules, such as configuration servers. It doesn't need a separate directory to store data. The mongos process uses port 27017 by default, but you can change the port using the configuration parameters.

To define the configuration servers, you can use the configuration file or command-line parameters. Create a new file using your text editor in the /etc/ directory and add the following configuring settings:

configdb = cfg1.sharding.com:27019, cfg2.sharding.com:27020
cfg3.sharding.com:27021

To execute and run the mongos instance, you can simply use the following command:

mongos -f /etc/mongos.conf

After executing the command, you should see an output like the following screenshot:

sharding-action-img-5

Please note that if you have a configuration server that has been already used in a different sharding network, you can't use the existing data directory. You should create a new and empty data directory for the configuration server.

Currently, we have mongos and all configuration servers that work together pretty well. In the next part, we will add shards to the mongos instance to complete the whole network.

Managing mongos instance

Now it's time to add shards and split whole dataset into smaller pieces. For production use, each shard should be a replica set network, but for the development and testing environment, you can simply add a single mongod instances to the cluster.

To control and manage the mongos instance, we can simply use the mongo shell to connect to the mongos and execute commands. To connect to the mongos, you use the following command:

mongo --host <mongos hostname> --port <mongos port>

For instance, our mongos address is mongos1.sharding.com and the port is 27017. This is depicted in the following screenshot:

sharding-action-img-6

After connecting to the mongos instance, we have a command environment, and we can use it to add, remove, or modify shards, or even get the status of the entire sharding network.

Using the following command, you can get the status of the sharding network:

sh.status()

The following screenshot illustrates the output of this command:

sharding-action-img-7

Because we haven't added any shards to sharding, you see an error that says there are no shards in the sharding network.

Using the sh.help() command, you can see all commands as shown in the following screenshot:

sharding-action-img-8

Using the sh.addShard() function, you can add shards to the network.

Adding shards to mongos

After connecting to the mongos, you can add shards to sharding. Basically, you can add two types of endpoints to the mongos as a shard; replica set or a standalone mongod instance.

MongoDB has a sh namespace and a function called addShard(), which is used to add a new shard to an existing sharding network. Here is the example of a command to add a new shard. This is shown in the following screenshot:

sharding-action-img-9

To add a replica set to mongos you should follow this scheme:

setname/server:port

For instance, if you have a replica set with the name of rs1, hostname mongod1.replicaset.com, and port number 27017, the command will be as follows:

sh.addShard("rs1/mongod1.replicaset.com:27017")

Using the same function, we can add standalone mongod instances. So, if we have a mongod instance with the hostname mongod1.sharding.com listening on port 27017, the command will be as follows:

sh.addShard("mongod1.sharding.com:27017")

You can use a secondary or primary hostname to add the replica set as a shard to the sharding network. MongoDB will detect the primary and use the primary node to interact with sharding.

Now, we add the replica set network using the following command:

sh.addShard("rs1/mongod1.replicaset.com:27017")

If everything goes well, you won't see any output from the console, which means the adding process was successful. This is shown in the following screenshot:

sharding-action-img-10

To see the status of sharding, you can use the sh.status() command. This is demonstrated in the following screenshot:

sharding-action-img-11

Next, we will establish another standalone mongod instance and add it to sharding. The port number of mongod is 27016 and the hostname is mongod1.sharding.com.

The following screenshot shows the output after starting the new mongod instance:

sharding-action-img-12

Using the same approach, we will add the preceding node to sharding. This is shown in the following screenshot:

sharding-action-img-13

It's time to see the sharding status using the sh.status() command:

sharding-action-img-14

As you can see in the preceding screenshot, now we have two shards. The first one is a replica set with the name rs1, and the second shard is a standalone mongod instance on port 27016.

If you create a new database on each shard, MongoDB syncs this new database with the mongos instance. Using the show dbs command, you can see all databases from all shards as shown in the following screenshot:

sharding-action-img-15

The configuration database is an internal database that MongoDB uses to configure and manage the sharding network.

Currently, we have all sharding modules working together. The last and final step is to enable sharding for a database and collection.

Summary

In this article, we prepared the environment for sharding of a database. We also learned about the implementation of a configuration server. Next, after configuring the configuration servers, we saw how to bind them to the core module of clustering.