In this example we will add indexes to two tables that are almost identical. The only difference will be the definition of their primary keys. You will see the difference in space consumption for secondary indexes between a regular full column primary key and a prefix primary key. The sample table structure and data are designed to demonstrate the effect very evidently. In real-world scenarios the effect will most certainly be less severe.
Connect to the database server with your administrative account.
ALTER TABLE LongCharKey10 ADD INDEX IDX_PAY_10(Payload(10));
Name | Rows | Data Length | Index Length | Index/Data Ratio |
LongCharKey | 50045 | 30392320 | 28868608 | 94.99% |
LongCharKey10 | 50045 | 29949952 | 3178496 | 10.61% |
With the second index added, the difference in index length becomes even clearer.
Executing the downloaded script will set up two tables with the following structures:
CREATE TABLE `LongCharKey` (
`LongChar` char(255) NOT NULL,
`Payload` char(255) DEFAULT NULL,
PRIMARY KEY (`LongChar`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `LongCharKey10` (
`LongChar` char(255) NOT NULL,
`Payload` char(255) DEFAULT NULL,
PRIMARY KEY (`LongChar`(10))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The two tables are almost identical, except for the primary key definition. They are pre-filled with 50,000 records of sample data.
The tables are populated with exactly the same 50,000 records of pseudo-random data. The Payload column is filled with sequences of 255 random letters each. The LongChar column is filled with a sequential number in the first 10 characters and then filled up to use all remaining 245 character with the same sort of random data.
SELECT LEFT(LongChar,20), LEFT(Payload, 20) from LongCharKey LIMIT 5;
LEFT(LongChar,20) | LEFT(Payload, 20) |
0000000000KEAFAYVEJD | RHSKMEJITOVBPOVAGOGM |
0000000001WSSGKGMIJR | VARLGOYEONSLEJVTVYRP |
0000000002RMNCFBJSTL | OVWGTTSHEQHJHTHMFEXV |
0000000003SAQVOQSINQ | AHDYUXTAEWRSHCLJYSMW |
0000000004ALHYUDSRBH | DPLPXJVERYHUOYGGUFOS |
While the LongKeyChar table simply marks the whole LongChar column as a primary key with its entire 255 characters length, the LongCharKey10 table limits the primary key to the first 10 characters of that column. This is perfectly fine for this table, because the test data was crafted to be unique in this range.
Neither one of the two tables has any secondary indexes defined. Looking at some relevant table data shows they are equally big (some columns left out for brevity):
SHOW TABLE STATUS LIKE ‘LongCharKey%’;
Name | Rows | Data Length | Index Length |
LongCharKey | 50045 | 30392320 | 0 |
LongCharKey10 | 50045 | 29949952 | 0 |
With each index added, the Index Length for the first table will increase significantly, while for the second one its growth is much slower.
In case of the LongCharKey table, each secondary index record will carry around with it a complete copy of the LongChar column because it is the primary key without limitation. Assuming a single byte character encoding, this means every secondary index record is blown up in size by 255 bytes on top of the 10 bytes needed for the actual index entry. This means a whole kilobyte is spent just for the primary key reference for every 4 records!
In contrast to that, the primary key definition of the LongCharKey10 table only includes the leading 10 characters of the LongChar column, making the secondary index entry 245 bytes shorter and thereby explaining the much slower growth upon adding further indexes.
In the article introduction we promised to shed some light on how to choose your InnoDB primary key columns sensibly. Be advised that choosing good primary key columns is not an exact science—there are multiple aspects that influence this decision. Depending on your needs and preconditions you will want to prioritize them differently from one table to the next. Consider the following as general advice rather than hard rules that must be obeyed unconditionally.
In order to make reasonable decisions on primary key columns, it is important to have a very clear understanding of what the data looks like in the table at hand. If you already have existing data that is to be stored in an InnoDB table—for example in MyISAM format—it can be helpful to compare it with the criteria below.
If you are planning a new schema, you might have to guess about some characteristics of the future data. As is often the case, the quality of your choices is directly proportional to how good those guesses are.
This recipe is less strict step-by-step instructions that must be followed from top to bottom and should be considered a list of properties a good primary key should have, even though you might decide some of them do not apply to your actual environment. As a rule of thumb, however, a column that fulfills all or most of the attributes described below is most probably a sensible choice for a primary key. See the How it works… section for details on the individual items.
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…