The key strategy
If we look closely for the purpose of load balancing, we will see three components in an ownCloud instance, which are as follows:
- User data storage (until now, we have been using the local system disk)
- A web server, for example, Apache or IIS
- A database; MySQL is a good choice for demonstration
The user data storage
Whenever a user creates a file or directory in ownCloud or uploads something, the data gets stored in the data directory. If we want to ensure that our ownCloud instance is always able to store data, we have to make this storage redundant. Luckily for us, ownCloud supports many options out of the box other than local disk storage. We can use a Samba backend, an FTP backend, an OpenStack Swift backend, Amazon S3, WebDAV, and a lot more.
Configuring WebDAV
Web Distributed Authoring and Versioning (WebDAV) is an extension of HTTP. It is described by the IETF in RFC 4918 at http://tools.ietf.org/html/rfc4918. It provides the functionality of editing and managing documents over the web. It essentially makes the web readable and writable.
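To make the "extension of HTTP" point concrete, here is a minimal Python sketch that builds (but does not send) a PROPFIND request, one of the methods RFC 4918 adds to HTTP. The server URL is a hypothetical placeholder, not something from our setup.

```python
import urllib.request

# Hypothetical WebDAV endpoint (an assumption for illustration only).
WEBDAV_URL = "http://webdav.example.com/owncloud-data/"


def build_propfind(url, depth="1"):
    """Build, without sending, a PROPFIND request that lists a collection."""
    body = (
        b'<?xml version="1.0" encoding="utf-8"?>'
        b'<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
    )
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",  # a WebDAV method, not part of plain HTTP
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


request = build_propfind(WEBDAV_URL)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would actually send it; we stop short of that here because the endpoint is made up.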
To enable custom backend support, we first have to go to the familiar Apps section and enable the External Storage Support app. After this app is enabled, when we open the ownCloud admin panel, we will see an External Storage section on the page. Just choose WebDAV from the drop-down menu and fill in the credentials. Choose mount point as 0 and put the root as $user/. We are doing this so that for each user, a directory will be created on the WebDAV server with their username, and whenever users log in, they will be sent to this directory. Just to verify, check out the config/mount.php file for ownCloud.
The web server
Assuming that we have taken care of backend storage, let’s now handle the frontend web server. A very obvious way is to do DNS-level load balancing by round robin or geographical distribution. In a round-robin DNS scheme, the resolution of a name returns a list of IP addresses instead of a single IP. These IP addresses may be returned in round-robin fashion, which means that the list is permuted on every query. This helps in distributing the traffic, since clients usually use the first IP. Another way to give out the list is to match the IP address of the client to the closest IP in the list, and then make that the first IP in the response to the DNS query. The biggest advantage of DNS-based load distribution is that it is application agnostic. It does not care if the request is for an Apache server running PHP or an IIS server running ASP. It just rotates the IPs, and the server is responsible for handling the request appropriately.
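The rotation described above can be sketched in a few lines of Python. This is only a toy model of a round-robin record, not a real DNS server; the class name is ours, and the two addresses are simply reused from the Apache example later in this article.

```python
from collections import deque


class RoundRobinRecord:
    """Toy model of a DNS name that maps to several IP addresses."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        # Return the current ordering, then rotate so that the next
        # query sees a different address in first position.
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


record = RoundRobinRecord(["192.168.10.10", "192.168.10.11"])

# Clients usually connect to the first address in the answer.
first_choices = [record.resolve()[0] for _ in range(4)]
print(first_choices)
```

Because each answer starts with a different address, successive clients end up on alternating servers.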
So far, it all sounds good, but then why don’t we use it all the time? Is it sufficient to balance the entire load? Well, this strategy is great for load distribution, but what will happen in case one of the servers fails? We will run into a major problem then, because DNS servers usually do not do health checks. So in case one of our servers fails, we have to either fix it very fast, which is not always easy, or remove that IP from the DNS. However, DNS answers are cached by several intermediate caching-only DNS servers. They will continue to serve the stale IPs, and our clients will continue visiting the bad server.
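The stale-answer problem can be illustrated with a toy caching resolver in Python. This is a simplified sketch: the class, the 300-second TTL, and the explicit now argument are assumptions made for the demonstration, and real resolvers are considerably more involved.

```python
class CachingResolver:
    """Toy caching-only resolver that keeps an answer until its TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self._authoritative = authoritative  # dict: name -> list of IPs
        self._ttl = ttl_seconds
        self._cache = {}  # name -> (expiry time, cached addresses)

    def resolve(self, name, now):
        cached = self._cache.get(name)
        if cached and now < cached[0]:
            return cached[1]  # still fresh as far as the cache knows
        addresses = list(self._authoritative[name])
        self._cache[name] = (now + self._ttl, addresses)
        return addresses


zone = {"owncloudbook.com": ["192.168.10.10", "192.168.10.11"]}
resolver = CachingResolver(zone, ttl_seconds=300)

before = resolver.resolve("owncloudbook.com", now=0)
zone["owncloudbook.com"].remove("192.168.10.11")      # server failed, IP removed
stale = resolver.resolve("owncloudbook.com", now=60)   # cache still has it
fresh = resolver.resolve("owncloudbook.com", now=301)  # TTL expired, refreshed
print(stale, fresh)
```

Until the cached entry expires, clients behind this resolver keep receiving the address of the failed server.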
Another way is to move the IP from the bad server to the good server. So now this good server will have two IP addresses. That means that it has to handle twice the load, since DNS will keep on sending traffic after permuting the IPs in round-robin fashion.
Due to these and several other problems with DNS level load balancing, we generally either avoid using it or use it along with other load-balancing mechanisms.
Load balancing Apache
For the sake of this example, let’s assume that we have ownCloud served by two Apache web servers at 192.168.10.10 and 192.168.10.11. Starting with Apache 2.1, a module known as mod_proxy_balancer was introduced. On CentOS, the default Apache package ships with this module, so installation is not a problem. If we have Apache running from the yum repo, then we already have this module with us. Now, mod_proxy_balancer supports three algorithms for load distribution, which are as follows:
Request Counting
With this algorithm, incoming requests are distributed among backend workers in such a way that each backend gets a proportional number of requests defined in the configuration by the loadfactor variable. For example, consider this Apache config snippet:
<Proxy balancer://ownCloud>
# Balancer member 1
BalancerMember http://192.168.10.11/ loadfactor=1
# Balancer member 2
BalancerMember http://192.168.10.10/ loadfactor=3
ProxySet lbmethod=byrequests
</Proxy>
In this example, one request out of every four will be sent to 192.168.10.11, and three will be sent to 192.168.10.10. This might be an appropriate configuration for a site with two servers, one of which is more powerful than the other.
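To see how a loadfactor turns into a 1:3 split, here is a small Python simulation modeled on the request-counting scheme described in the Apache documentation: every worker's status grows by its loadfactor on each request, the worker with the highest status is chosen, and the sum of all loadfactors is subtracted from its status. The class and function names are ours, and this is a sketch rather than Apache's actual code.

```python
from collections import Counter


class Worker:
    def __init__(self, url, loadfactor=1):
        self.url = url
        self.loadfactor = loadfactor
        self.status = 0  # running "urgency" score for this worker


def pick_by_requests(workers):
    """Choose the worker for one request using weighted request counting."""
    total = 0
    candidate = None
    for worker in workers:
        worker.status += worker.loadfactor
        total += worker.loadfactor
        if candidate is None or worker.status > candidate.status:
            candidate = worker
    candidate.status -= total
    return candidate


workers = [
    Worker("http://192.168.10.11/", loadfactor=1),
    Worker("http://192.168.10.10/", loadfactor=3),
]
counts = Counter(pick_by_requests(workers).url for _ in range(400))
print(counts)
```

Over 400 simulated requests, the worker with loadfactor=3 receives three times as many requests as the other one.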
Weighted Traffic Counting
The Weighted Traffic Counting algorithm is similar to the Request Counting algorithm, with one difference: it considers the number of bytes instead of the number of requests. In the following configuration example, the number of bytes processed by 192.168.10.10 will be three times that of 192.168.10.11:
<Proxy balancer://ownCloud>
# Balancer member 1
BalancerMember http://192.168.10.11/ loadfactor=1
# Balancer member 2
BalancerMember http://192.168.10.10/ loadfactor=3
ProxySet lbmethod=bytraffic
</Proxy>
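A simplified Python model of traffic-based balancing follows. It assumes the balancer picks the worker with the lowest bytes-to-loadfactor ratio, which is our simplification for illustration and not a copy of Apache's implementation; the names and the fixed 100-byte request size are likewise made up.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    url: str
    loadfactor: int
    bytes_handled: int = 0


def pick_by_traffic(workers):
    """Choose the worker whose traffic is lowest relative to its loadfactor."""
    return min(workers, key=lambda w: w.bytes_handled / w.loadfactor)


workers = [
    Worker("http://192.168.10.11/", loadfactor=1),
    Worker("http://192.168.10.10/", loadfactor=3),
]

# Simulate eight requests that each transfer 100 bytes.
for _ in range(8):
    chosen = pick_by_traffic(workers)
    chosen.bytes_handled += 100

traffic = {w.url: w.bytes_handled for w in workers}
print(traffic)
```

With equal-sized requests, the result matches Request Counting; the two algorithms diverge when request sizes vary, because a worker that served one large download is then skipped for a while.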
Pending Request Counting
The Pending Request Counting algorithm is the latest and the most sophisticated algorithm provided by Apache for load balancing. It is available from Apache 2.2.10 onward.
In this algorithm, the scheduler keeps track of the number of requests that are assigned to each backend worker at any given time. Each new incoming request will be sent to the backend that has the least number of pending requests, in other words, to the backend worker that is relatively least loaded. This helps in keeping the request queues even among the backend workers, and each request generally goes to the worker that can process it the fastest.
If two workers are equally lightly loaded, the scheduler uses the Request Counting algorithm to break the tie. A sample configuration is as follows:
<Proxy balancer://ownCloud>
# Balancer member 1
BalancerMember http://192.168.10.11/
# Balancer member 2
BalancerMember http://192.168.10.10/
ProxySet lbmethod=bybusyness
</Proxy>
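The idea of sending each request to the least busy worker can be sketched in Python as follows. The tie-break here (fewest requests served so far) is a simplified stand-in for the Request Counting tie-break, and all names are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    url: str
    busy: int = 0    # requests currently being processed
    served: int = 0  # total requests assigned so far


def dispatch(workers):
    """Assign one request to the worker with the fewest pending requests."""
    chosen = min(workers, key=lambda w: (w.busy, w.served))
    chosen.busy += 1
    chosen.served += 1
    return chosen


def finish(worker):
    """Mark one of the worker's pending requests as completed."""
    worker.busy -= 1


a = Worker("http://192.168.10.11/")
b = Worker("http://192.168.10.10/")
workers = [a, b]

order = [dispatch(workers).url for _ in range(3)]  # nothing has finished yet
finish(b)                                          # b completes its request
next_url = dispatch(workers).url                   # b is now the least busy
print(order, next_url)
```

A worker that is stuck on slow requests keeps a high busy count, so new requests flow to its peers until it catches up.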
Enable the Balancer Manager
Sometimes, we may need to change our load balancing configuration, but that may not be easy to do without affecting the running servers. For such situations, the Balancer Manager provides a web interface to change the status of backend workers on the fly. We can use Balancer Manager to put a worker in offline mode or change its loadfactor, but we must have mod_status installed in order to use Balancer Manager. A sample config, which should be defined in /etc/httpd/conf/httpd.conf, might look similar to the following code:
<Location /balancer-manager>
SetHandler balancer-manager
Order Deny,Allow
Deny from all
Allow from .owncloudbook.com
</Location>
Once we add directives similar to the preceding ones to httpd.conf, and then restart Apache, we can open the Balancer Manager by pointing a browser at http://owncloudbook.com/balancer-manager.
Load balancing IIS
Load balancing IIS is quite easy using the Windows GUI. Windows Server editions come with a nifty tool for this known as Network Load Balancing (NLB). It balances the load by distributing incoming requests among a cluster of servers. Each server in a cluster emits a heartbeat, a kind of “I am operational” message. NLB ensures that no request goes to a server that is not sending this heartbeat, thereby ensuring that all the requests are processed by operational servers.
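The heartbeat idea can be illustrated with a short Python sketch. This is not how NLB is implemented internally; the class, the 5-second timeout, and the host addresses are all assumptions chosen for the demonstration.

```python
class HeartbeatMonitor:
    """Toy tracker that treats a host as operational only if it has
    sent a heartbeat within the last `timeout_seconds`."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # host -> time of the most recent heartbeat

    def heartbeat(self, host, now):
        self.last_seen[host] = now

    def operational_hosts(self, now):
        return sorted(
            host
            for host, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=5)
monitor.heartbeat("192.168.10.10", now=0)
monitor.heartbeat("192.168.10.11", now=0)
monitor.heartbeat("192.168.10.10", now=4)  # only the first host keeps reporting

alive = monitor.operational_hosts(now=8)
print(alive)
```

Requests would then be distributed only among the hosts returned by operational_hosts(), so a silent server stops receiving traffic once its last heartbeat is too old.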
Let’s now configure the NLB by performing the following steps:
- We need to turn it on first. We can do so by following the given steps:
  - Go to Server Manager.
  - Click on the Features section in the left-side bar.
  - Then click on Add Features.
  - Select Network Load Balancing from the list.
- Once we have chosen Network Load Balancing, we will click on Next >, and then click on Install to get this feature on the servers. Once we are done here, we will open Network Load Balancing Manager from the Administrative Tools section in the Start menu. In the manager window, we need to right-click on the Network Load Balancing Clusters option to create a new cluster, as shown in the following screenshot:
- Now we need to give the address of the server that is actually running the web server, and then connect to it, as shown in the following screenshot:
- Choose the appropriate interface (in this example, we have only one), and then click on the Next > button. In the next window, we will be shown the host parameters, where we have to assign a priority to this host, as shown in the following screenshot:
- Now click on the Add button, and a dialog will open where we have to assign an IP, which will be shared by all the hosts, as shown in the following screenshot. (Network Load Balancing Manager will configure this IP on all the machines.)
- In the next dialog, choose a cluster IP, as shown in the following screenshot. This will be the IP that users will use to log in to ownCloud.
- Now that we have given it an IP, we will define cluster parameters to use unicast. Multicasts and broadcasts can be used, but they are not supported by all vendors and require more effort.
Now everything is done. We are ready to use the Network Load Balancing feature.
These steps are to be repeated on all the machines which are going to be a part of this cluster.
So there! We have also load balanced IIS.
The MySQL database
MySQL Cluster is a separate component of MySQL, which is not shipped with the standard MySQL server but can be downloaded freely from http://dev.mysql.com/downloads/cluster/. MySQL Cluster helps in achieving better scalability and ensuring high uptime. It is write scalable and ACID compliant, and has no single point of failure because of the way it is designed, with multiple masters and highly distributed data. This is perfect for our requirements, so let’s start with its installation.
Basic terminologies
- Management node: This node performs the basic management functions. It starts and stops other nodes and performs backup. It is always a good idea to start this node before starting anything else in the cluster.
- Data node: This node stores the cluster data. There should always be more than one Data node to provide redundancy.
- SQL node: This node accesses the cluster data. It uses the NDBCLUSTER storage engine.
The default MySQL server does not ship with the NDBCLUSTER storage engine and other required features. So it is mandatory to download a server binary that supports the MySQL Cluster feature. We have to download the appropriate MySQL Cluster package for the host OS (Linux or Windows) from http://dev.mysql.com/downloads/cluster/.
For the purpose of this demonstration, we will assume one Management node, one SQL node, and two Data nodes. We will also make a note that node is a logical word here. It need not be a physical machine. In fact, they can reside on the same machine as separate processes, but then the whole purpose of high availability will be defeated.
Let’s start by installing the MySQL cluster nodes.
Data node
Setting up a Data node is fairly simple. Just copy the ndbd and ndbmtd binaries from the bin directory of the archive to /usr/local/bin/ and make them executable as follows:
cp bin/ndbd /usr/local/bin/ndbd
cp bin/ndbmtd /usr/local/bin/ndbmtd
chmod +x /usr/local/bin/ndbd
chmod +x /usr/local/bin/ndbmtd
Management node
The Management node needs only two binaries, ndb_mgmd and ndb_mgm, which we copy and make executable as follows:
cp bin/ndb_mgm* /usr/local/bin
chmod +x /usr/local/bin/ndb_mgm*
SQL node
First of all, we need to create a user for MySQL as follows:
useradd mysql
Now extract the tar.gz archive file downloaded before. Conventionally, MySQL documentation uses /usr/local/ directory to unpack the archive, but it can be done anywhere. We’ll follow MySQL conventions here and also create a symbolic link to ease the access and better manageability as follows:
tar -C /usr/local -xzvf mysql-cluster-gpl-7.2.12-linux2.6.tar.gz
ln -s /usr/local/mysql-cluster-gpl-7.2.12-linux2.6-i686 /usr/local/mysql
We need to set write permissions for the mysql user, which we created before, as follows:
chown -R root /usr/local/mysql
chown -R mysql /usr/local/mysql/data
chgrp -R mysql /usr/local/mysql
The preceding commands will ensure that the permission to start and stop the MySQL instance remains with the root user, but the mysql user can write data to the data directory.
Now, change to the /usr/local/mysql directory and create the system databases as follows:
cd /usr/local/mysql
scripts/mysql_install_db --user=mysql
Configuring the Data node and SQL node
We can configure the Data node and SQL node as follows:
vim /etc/my.cnf
[mysqld]
# Options for mysqld process:
ndbcluster # run NDB storage engine
[mysql_cluster]
# Options for MySQL Cluster processes:
ndb-connectstring=192.168.20.10 # location of management server
Configuring the Management node
We can configure the Management node as follows:
vim /var/lib/mysql-cluster/config.ini
[ndbd default]
# Options affecting ndbd processes on all data nodes:
NoOfReplicas=2 # Number of replicas
DataMemory=200M # How much memory to allocate for data storage
IndexMemory=50M # How much memory to allocate for index storage
# For DataMemory and IndexMemory, we have used the
# default values. Since the "world" database takes up
# only about 500KB, this should be more than enough for
# this example Cluster setup.
[tcp default]
# TCP/IP options:
portnumber=2202
[ndb_mgmd]
# Management process options:
hostname=192.168.20.10 # Hostname or IP address of MGM node
datadir=/var/lib/mysql-cluster # Directory for MGM node log files
[ndbd]
# Options for data node "A":
# (one [ndbd] section per data node)
hostname=192.168.20.12 # Hostname or IP address
datadir=/usr/local/mysql/data # Directory for this data node's data files
[ndbd]
# Options for data node "B":
hostname=192.168.0.40 # Hostname or IP address
datadir=/usr/local/mysql/data # Directory for this data node's data files
[mysqld]
# SQL node options:
hostname=192.168.20.11 # Hostname or IP address
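As a quick sanity check of a layout like this, the following Python sketch parses a trimmed copy of the config.ini above and verifies that the number of Data nodes divides evenly by NoOfReplicas, since data nodes are organized into groups of NoOfReplicas members. The parser is a minimal helper written for this illustration (the standard configparser module is avoided because of the repeated [ndbd] sections); it is not part of MySQL Cluster.

```python
# Trimmed copy of the management node configuration shown above.
CONFIG_INI = """\
[ndbd default]
NoOfReplicas=2    # Number of replicas
[ndb_mgmd]
hostname=192.168.20.10
[ndbd]
hostname=192.168.20.12
[ndbd]
hostname=192.168.0.40
[mysqld]
hostname=192.168.20.11
"""


def parse_sections(text):
    """Return a list of (section name, options) pairs, keeping duplicates."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].lower(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip().lower()] = value.strip()
    return sections


sections = parse_sections(CONFIG_INI)
defaults = next(opts for name, opts in sections if name == "ndbd default")
replicas = int(defaults["noofreplicas"])
data_nodes = [opts["hostname"] for name, opts in sections if name == "ndbd"]

node_groups, remainder = divmod(len(data_nodes), replicas)
print(f"{len(data_nodes)} data nodes, {replicas} replicas, "
      f"{node_groups} node group(s), remainder {remainder}")
```

A non-zero remainder would indicate that a Data node is missing or that NoOfReplicas does not match the number of [ndbd] sections.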
Summary
Now we have an idea of how to ensure high availability of the ownCloud server components. We have seen load balancing for the backend data store, the frontend web server, and the database. Having covered some common approaches, we can now provide a reliable ownCloud service to our users.