This article focuses on optimization for MySQL 8 database servers and clients. We start with optimizing the server, followed by optimizing MySQL 8 client-side entities. It is most relevant to database administrators who need to ensure performance and scalability across multiple servers. It should also help developers who prepare scripts (including those that set up the database) and users who run MySQL for development and testing to maximize their productivity.
Optimizing disk I/O
There are quite a few ways to configure storage devices to devote more and faster storage hardware to the database server. A major performance bottleneck is disk seeking (finding the correct place on the disk to read or write content). When the amount of data grows large enough to make caching impossible, the problem with disk seeks becomes apparent. In large databases where data access is more or less random, we need at least one disk seek to read and several disk seeks to write. We should minimize seek times by choosing disks with low seek times. To address the disk seek bottleneck, we can increase the number of available disk spindles, symlink files to different disks, or stripe the disks. The following are the details:
- Using symbolic links: We can create Unix symbolic links for index and data files. For MyISAM tables, the symlink points from the default location in the data directory to another disk, which may also be striped. This improves both seek and read times, assuming that the disk is not used concurrently for other purposes. Symbolic links are not supported for InnoDB tables. However, we can place InnoDB data and log files on different physical disks.
- Striping: In striping, we have many disks. We put the first block on the first disk, the second block on the second disk, and so on. In general, block N goes on disk (N % number_of_disks). If the normal data size is less than the stripe size, or is perfectly aligned with it, performance improves. Striping depends on the stripe size and the operating system, so ideally we would benchmark the application with different stripe sizes. The speed difference also depends on the number of disks, and we have to choose whether to optimize for random access or sequential access. To gain reliability, we may decide to set up striping together with mirroring (RAID 0+1). RAID stands for Redundant Array of Independent Disks. This approach needs 2 x N drives to hold N drives of data. With good volume management software, we can manage this setup efficiently.
- Another approach is to vary the RAID level according to how critical the data is. For example, we can store really important data, such as host information and logs, on a RAID 0+1 or RAID N disk, and store semi-important data on a RAID 0 disk. Parity-based RAID levels (RAID N) use parity bits to ensure the integrity of the data stored on each drive. As a result, RAID N becomes a problem under write-heavy workloads, because updating the parity bits on every write takes time.
- If it is not important to know when a file was last accessed, we can mount the file system with the -o noatime option. This option skips updates to the last access time stored in the file system's inodes, which avoids some disk seeks. We can also have the file system updated asynchronously: if the file system supports it, we can mount it with the -o async option.
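The round-robin placement described for striping can be made concrete with a little arithmetic. The following is a minimal sketch, not a description of how any particular RAID controller lays out data: it assumes a hypothetical array of 4 disks and 8 blocks, both numbered from 0, and uses a MySQL 8 recursive common table expression purely as a calculator for N % number_of_disks.

```sql
-- Illustration only: map block numbers to disks in a striped array.
-- Assumptions (not from the article): 4 disks, 8 blocks, numbered from 0.
WITH RECURSIVE blocks (block_no) AS (
    SELECT 0
    UNION ALL
    SELECT block_no + 1 FROM blocks WHERE block_no < 7
)
SELECT block_no,
       block_no % 4 AS disk_no   -- block N lands on disk (N % number_of_disks)
FROM blocks
ORDER BY block_no;
```

Blocks 0 through 7 map to disks 0, 1, 2, 3, 0, 1, 2, 3, so consecutive blocks sit on different disks and a sequential read can be spread across all of them.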
Using Network File System (NFS) with MySQL
While using a Network File System (NFS), various issues may occur, depending on the operating system and the NFS version. The following are the details:
- Data inconsistency is one issue with NFS. It may occur because messages are received out of order or network traffic is lost. We can use TCP with the hard and intr mount options to avoid these issues.
- MySQL data and log files may get locked and become unavailable for use if placed on NFS drives. If multiple instances of MySQL access the same data directory, locking issues may result. An improper shutdown of MySQL or a power outage can also cause file system locking issues. NFS version 4 supports advisory and lease-based locking, which helps to address these issues. Still, sharing a data directory among multiple MySQL instances is not recommended.
- Maximum file size limitations must be understood to avoid any issues. With NFS 2, only the lower 2 GB of a file is accessible by clients. NFS 3 clients support larger files. The maximum file size depends on the local file system of the NFS server.
Optimizing the use of memory
To improve the performance of database operations, MySQL allocates buffers and caches. The default configuration is designed to let the MySQL server start on a virtual machine (VM) with approximately 512 MB of RAM. We can increase the cache and buffer settings to improve performance, or modify the default configuration so that MySQL runs on systems with limited memory.
The following list describes the ways to optimize MySQL memory:
- The memory area that holds cached InnoDB data for tables, indexes, and other auxiliary buffers is known as the InnoDB buffer pool. The buffer pool is divided into pages, and each page can hold multiple rows. The buffer pool is implemented as a linked list of pages for efficient cache management. Rarely used data is aged out of the cache using a variation of the least recently used (LRU) algorithm. Buffer pool size is an important factor for system performance. The innodb_buffer_pool_size system variable defines the buffer pool size. InnoDB allocates memory for the entire buffer pool at server startup. Typically, 50 to 75 percent of system memory is recommended for the buffer pool size.
- With MyISAM, all threads share the key buffer. The key_buffer_size system variable defines the size of the key buffer. The index file is opened once for each MyISAM table opened by the server. The data file is opened once for each concurrent thread that accesses the table. For each concurrent thread, a table structure, column structures for each column, and a buffer of size 3 x N are allocated, where N is the maximum row length, not counting BLOB columns. The MyISAM storage engine also maintains one extra row buffer for internal use.
- For scans that the optimizer estimates will read multiple rows, the storage engine interface enables the optimizer to tell the storage engine how large a record buffer to use. The size of the buffer can vary depending on the size of the estimate. InnoDB uses this variable-size buffering capability to take advantage of row prefetching, which reduces the overhead of latching and B-tree navigation.
- Memory mapping can be enabled for all MyISAM tables by setting the myisam_use_mmap system variable to 1.
- The size of an in-memory temporary table can be defined by the tmp_table_size system variable. The maximum size of the heap table can be defined using the max_heap_table_size system variable. If the in-memory table becomes too large, MySQL automatically converts the table from in-memory to on-disk. The storage engine for an on-disk temporary table is defined by the internal_tmp_disk_storage_engine system variable.
- MySQL comes with the MySQL performance schema. It is a feature to monitor MySQL execution at low levels. The performance schema dynamically allocates memory by scaling its memory use to the actual server load, instead of allocating memory upon server startup. The memory, once allocated, is not freed until the server is restarted.
- Thread-specific space is required for each thread that the server uses to manage client connections. The stack size is governed by the thread_stack system variable. The connection buffer and the result buffer are both governed by the net_buffer_length system variable: each starts at net_buffer_length bytes and is enlarged up to max_allowed_packet bytes, as needed.
- All threads share the same base memory.
- All joins are executed in a single pass, and most joins can be executed without a temporary table. Most temporary tables are memory-based hash tables. Temporary tables that contain BLOB data or have large row lengths are stored on disk.
- A read buffer is allocated for each request that performs a sequential scan on a table. The size of the read buffer is determined by the read_buffer_size system variable.
- When the FLUSH TABLES statement or the mysqladmin flush-tables command is executed, MySQL closes all tables that are not in use at once and marks all in-use tables to be closed when the currently executing thread finishes. This frees most in-use memory. FLUSH TABLES returns only after all tables have been closed.
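The system variables named in the list above can be inspected from any client session. The following is a minimal sketch rather than a tuning recommendation: the 2 GB buffer pool value is an arbitrary placeholder, and changing global variables requires the appropriate administrative privileges.

```sql
-- Current values of the memory-related variables discussed above.
SELECT @@innodb_buffer_pool_size AS innodb_buffer_pool_size,
       @@key_buffer_size         AS key_buffer_size,
       @@tmp_table_size          AS tmp_table_size,
       @@max_heap_table_size     AS max_heap_table_size,
       @@thread_stack            AS thread_stack,
       @@net_buffer_length       AS net_buffer_length,
       @@max_allowed_packet      AS max_allowed_packet,
       @@read_buffer_size        AS read_buffer_size;

-- Resize the buffer pool at runtime. 2 GB is a placeholder, not advice;
-- InnoDB rounds the value to a multiple of
-- innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.
SET GLOBAL innodb_buffer_pool_size = 2147483648;

-- Compare how many internal temporary tables were created in total
-- with how many of them ended up on disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';
```

A Created_tmp_disk_tables count that is high relative to Created_tmp_tables suggests that in-memory temporary tables are frequently overflowing to disk.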
We can use the MySQL performance schema and the sys schema to monitor memory usage. Before we can run queries for this, we have to enable the memory instruments in the performance schema. This can be done by updating the ENABLED column of the performance schema setup_instruments table. The following is the query to view available memory instruments in MySQL:
mysql> SELECT * FROM performance_schema.setup_instruments WHERE NAME LIKE '%memory%';
This query will return hundreds of memory instruments. We can narrow it down by specifying a code area. The following is an example to limit results to InnoDB memory instruments:
mysql> SELECT * FROM performance_schema.setup_instruments WHERE NAME LIKE '%memory/innodb%';
The following is the configuration to enable memory instruments:
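One way to do this at runtime, sketched here from the ENABLED column described above, is an UPDATE against setup_instruments. This change does not survive a restart; for a persistent setting, the server also accepts a performance-schema-instrument='memory/%=COUNTED' line in its option file. Memory allocated before an instrument was enabled may not be counted, so the startup option tends to give more complete figures.

```sql
-- Enable every memory instrument for the running server (lost on restart).
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES'
WHERE NAME LIKE 'memory/%';

-- Verify: list any memory instruments that are still disabled.
SELECT NAME, ENABLED
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'memory/%' AND ENABLED = 'NO';
```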
The following is an example to query memory instrument data in the memory_summary_global_by_event_name table in the performance schema:
mysql> SELECT * FROM performance_schema.memory_summary_global_by_event_name WHERE EVENT_NAME LIKE 'memory/innodb/buf_buf_pool'\G
EVENT_NAME: memory/innodb/buf_buf_pool
COUNT_ALLOC: 1
COUNT_FREE: 0
SUM_NUMBER_OF_BYTES_ALLOC: 137428992
SUM_NUMBER_OF_BYTES_FREE: 0
LOW_COUNT_USED: 0
CURRENT_COUNT_USED: 1
HIGH_COUNT_USED: 1
LOW_NUMBER_OF_BYTES_USED: 0
CURRENT_NUMBER_OF_BYTES_USED: 137428992
HIGH_NUMBER_OF_BYTES_USED: 137428992
This table summarizes memory usage data by EVENT_NAME.
The following is an example of querying the sys schema to aggregate currently allocated memory by code area:
mysql> SELECT SUBSTRING_INDEX(event_name,'/',2) AS code_area, sys.format_bytes(SUM(current_alloc)) AS current_alloc FROM sys.x$memory_global_by_current_bytes GROUP BY SUBSTRING_INDEX(event_name,'/',2) ORDER BY SUM(current_alloc) DESC;
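For a quicker overview, the sys schema also exposes a server-wide total and a per-thread breakdown. The following sketch assumes the memory instruments are already enabled; the LIMIT of 5 is an arbitrary choice.

```sql
-- Total memory currently tracked by the performance schema.
SELECT * FROM sys.memory_global_total;

-- The five threads currently holding the most allocated memory
-- (the view is already sorted by current allocation, largest first).
SELECT thread_id, user, current_allocated
FROM sys.memory_by_thread_by_current_bytes
LIMIT 5;
```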
We must consider the following factors when measuring performance:
- When measuring the speed of a single operation or a set of operations, it is important to benchmark under a simulated heavy database workload
- In different environments, the test results may be different
- Depending on the workload, certain MySQL features may not help with performance
MySQL 8 supports measuring the performance of individual statements. If we want to measure the speed of any SQL expression or function, the BENCHMARK() function is used. The following is the syntax for the function:
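In its general form, the function takes a repeat count followed by a scalar expression. The sketch below shows that form as a comment and then one concrete call; the choice of MD5('hello') and the count of 1,000,000 are arbitrary and only for illustration.

```sql
-- General form: BENCHMARK(loop_count, expr)
--   loop_count: how many times to evaluate expr
--   expr:       a scalar expression (a subquery must return a single value)
SELECT BENCHMARK(1000000, MD5('hello'));
```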
The output of the BENCHMARK function is always zero. The elapsed time is taken from the status line that the mysql client prints after the result. The following is an example:
mysql> SELECT BENCHMARK(1000000, 1+1);
In this run, the mysql client reported that evaluating 1+1 1,000,000 times took 0.15 seconds.
Other aspects of optimizing MySQL servers and clients include optimizing locking operations, examining thread information, and more. To learn about these techniques, check out the book MySQL 8 Administrator’s Guide.