|Read more about this book|
(For more resources on this subject, see here.)
Replication is an interesting feature of MySQL that can be used for a variety of purposes. It can help to balance server load across multiple machines, ease backups, provide a workaround for the lack of fulltext search capabilities in InnoDB, and much more.
The basic idea behind replication is to reflect the contents of one database server (this can include all databases, only some of them, or even just a few tables) to more than one instance. Usually, those instances will be running on separate machines, even though this is not technically necessary.
Traditionally, MySQL replication is based on the surprisingly simple idea of repeating the execution of all statements issued that can modify data—not SELECT—against a single master machine on other machines as well. Provided all secondary slave machines had identical data contents when the replication process began, they should automatically remain in sync. This is called Statement Based Replication (SBR).
With MySQL 5.1, Row Based Replication (RBR) was added as an alternative method for replication, targeting some of the deficiencies SBR brings with it. While at first glance it may seem superior (and more reliable), it is not a silver bullet—the pain points of RBR are simply different from those of SBR.
Even though there are certain use cases for RBR, all recipes in this chapter will be using Statement Based Replication.
While MySQL makes replication generally easy to use, it is still important to understand what happens internally to be able to know the limitations and consequences of the actions and decisions you will have to make. We assume you already have a basic understanding of replication in general, but we will still go into a few important details.
Statement Based Replication
SBR is based on a simple but effective principle: if two or more machines have the same set of data to begin with, they will remain identical if all of them execute the exact same SQL statements in the same order.
Executing all statements manually on multiple machines would be extremely tedious and impractical. SBR automates this process. In simple terms, it takes care of sending all the SQL statements that change data on one server (the master) to any number of additional instances (the slaves) over the network.
The slaves receiving this stream of modification statements execute them automatically, thereby effectively reproducing the changes the master machine made to its data originally. That way they will keep their local data files in sync with the master’s.
One thing worth noting here is that the network connection between the master and its slave(s) need not be permanent. In case the link between a slave and its master fails, the slave will remember up to which point it had read the data last time and will continue from there once the network becomes available again.
In order to minimize the dependency on the network link, the slaves will retrieve the binary logs (binlogs) from the master as quickly as they can, storing them on their local disk in files called relay logs. This way, the connection, which might be some sort of dial-up link, can be terminated much sooner while executing the statements from the local relay-log asynchronously. The relay log is just a copy of the master’s binlog.
The following image shows the overall architecture:
In the image you can see that each slave may have its individual configuration on whether it executes all the statements coming in from the master, or just a selection of those. This can be helpful when you have some slaves dedicated to special tasks, where they might not need all the information from the master.
All of the binary logs have to be sent to each slave, even though it might then decide to throw away most of them. Depending on the size of the binlogs, the number of slaves and the bandwidth of the connections in between, this can be a heavy burden on the network, especially if you are replicating via wide area networks.
Even though the general idea of transferring SQL statements over the wire is rather simple, there are lots of things that can go wrong, especially because MySQL offers some configuration options that are quite counter-intuitive and lead to hard-to-find problems.
For us, this has become a best practice:
“Only use qualified statements and replicate-*-table configuration options for intuitively predictable replication!”
What this means is that the only filtering rules that produce intuitive results are those based on the replicate-do-table and replicate-ignore-table configuration options. This includes those variants with wildcards, but specifically excludes the all-database options like replicate-do-db and replicate-ignore-db. These directives are applied on the slave side on all incoming relay logs.
The master-side binlog-do-* and binlog-ignore-* configuration directives influence which statements are sent to the binlog and which are not. We strongly recommend against using them, because apart from hard-to-predict results they will make the binlogs undesirable for server backup and restore. They are often of limited use anyway as they do not allow individual configurations per slave but apply to all of them.
Setting up automatically updated slaves of a server based on a SQL dump
In this recipe, we will show you how to prepare a dump file of a MySQL master server and use it to set up one or more replication slaves. These will automatically be updated with changes made on the master server over the network.
You will need a running MySQL master database server that will act as the replication master and at least one more server to act as a replication slave. This needs to be a separate MySQL instance with its own data directory and configuration. It can reside on the same machine if you just want to try this out. In practice, a second machine is recommended because this technique’s very goal is to distribute data across multiple pieces of hardware, not place an even higher burden on a single one.
For production systems you should pick a time to do this when there is a lighter load on the master machine, often during the night when there are less users accessing the system. Taking the SQL dump uses some extra resources, but unless your server is maxed out already, the performance impact usually is not a serious problem. Exactly how long the dump will take depends mostly on the amount of data and speed of the I/O subsystem.
You will need an administrative operating system account on the master and the slave servers to edit the MySQL server configuration files on both of them. Moreover, an administrative MySQL database user is required to set up replication.
We will just replicate a single database called sakila in this example.
Replicating more than one database
In case you want to replicate more than one schema, just add their names to the commands shown below. To replicate all of them, just leave out any database name from the command line.
How to do it…
- At the operating system level, connect to the master machine and open the MySQL configuration file with a text editor. Usually it is called my.ini on Windows and my.cnf on other operating systems.
- On the master machine, make sure the following entries are present and add them to the [mysqld] section if not already there:
If one or both entries already exist, do not change them but simply note their values. The log-bin setting need not have a value, but can stand alone as well.
- Restart the master server if you need to modify the configuration.
- Create a user account on the master that can be used by the slaves to connect:
master> grant replication slave on *.* to 'repl'@'%' identified by
- Using the mysqldump tool included in the default MySQL install, create the initial copy to set up the slave(s):
$ mysqldump -uUSER -pPASS --master-data --single-transaction
sakila > sakila_master.sql
- Transfer the sakila_master.sql dump file to each slave you want to set up, for example, by using an external drive or network copy.
- On the slave, make sure the following entries are present and add them to the [mysqld] section if not present:
- Restart the slave server.
- Connect to the slave server and issue the following commands (assuming the data dump was stored in the /tmp directory):
slave> create database sakila;
slave> use sakila;
slave> source /tmp/sakila_master.sql;
slave> CHANGE MASTER TO master_host='master.example.com',
master_port=3306, master_ user='repl',
slave> START SLAVE;
- Verify the slave is running with:
slave> SHOW SLAVE STATUSG
************************** 1. row ***************************
When adding more than one slave, make sure the server-id setting is unique among master and all clients.
How it works…
Some of the instructions discussed in the previous section are to make sure that both master and slave are configured with different server-id settings. This is of paramount importance for a successful replication setup. If you fail to provide unique server-id values to all your server instances, you might see strange replication errors that are hard to debug.
Moreover, the master must be configured to write binlogs—a record of all statements manipulating data (this is what the slaves will receive).
Before taking a full content dump of the sakila demo database, we create a user account for the slaves to use. This needs the REPLICATION SLAVE privilege.
Then a data dump is created with the mysqldump command line tool. Notice the provided parameters –master-data and –single-transaction. The former is needed to have mysqldump include information about the precise moment the dump was created in the resulting output. The latter parameter is important when using InnoDB tables, because only then will the dump be created based on a transactional snapshot of the data. Without it, statements changing data while the tool was running could lead to an inconsistent dump.
The output of the command is redirected to the /tmp/sakila_master.sql file. As the sakila database is not very big, you should not see any problems. However, if you apply this recipe to larger databases, make sure you send the data to a volume with sufficient free disk space—the SQL dump can become quite large. To save space here, you may optionally pipe the output through gzip or bzip2 at the cost of a higher CPU load on both the master and the slaves, because they will need to unpack the dump before they can load it, of course.
If you open the uncompressed dump file with an editor, you will see a line with a CHANGE MASTER TO statement. This is what –master-data is for. Once the file is imported on a slave, it will know at which point in time (well, rather at which binlog position) this dump was taken. Everything that happened on the master after that needs to be replicated.
Finally, we configure that slave to use the credentials set up on the master before to connect and then start the replication. Notice that the CHANGE MASTER TO statement used for that does not include the information about the log positions or file names because that was already taken from the dump file just read in.
From here on the slave will go ahead and record all SQL statements sent from the master, store them in its relay logs, and then execute them against the local data set.
This recipe is very important because the following recipes are based on this! So in case you have not fully understood the above steps yet, we recommend you go through them again, before trying out more complicated setups.