Using prefix primary keys

In this example we will add indexes to two tables that are almost identical; the only difference is the definition of their primary keys. You will see how much more space secondary indexes consume with a regular full-column primary key than with a prefix primary key. The sample table structure and data are designed to make the effect clearly visible; in real-world scenarios it will usually be less pronounced.

Getting ready…

Connect to the database server with your administrative account.

How to do it…

  1. Download the sample script for this article from the book’s website and save it to your local disk. In the example below, we will assume it is stored in /tmp/idxsizeestimate_sample.sql.
  2. Create a new database and make it the default database:
    CREATE DATABASE pktests; USE pktests;
  3. Import the sample data from the downloaded file. When done, you will be presented with some statistics about the two tables loaded. Note that both tables have an Index Length of 0.
    SOURCE /tmp/idxsizeestimate_sample.sql;
  4. Now with the sample tables present, add an index to each of them:
    ALTER TABLE LongCharKey ADD INDEX IDX_PAY_10(Payload(10));
    ALTER TABLE LongCharKey10 ADD INDEX IDX_PAY_10(Payload(10));
  5. Display the data and index sizes of the tables now:
    SHOW TABLE STATUS LIKE 'LongCharKey%';
  6. Add another index to each table to make the difference even more evident:
    ALTER TABLE LongCharKey ADD INDEX IDX2_PAY_10(Payload(10));
    ALTER TABLE LongCharKey10 ADD INDEX IDX2_PAY_10(Payload(10));
  7. Display the data and index sizes of the tables again and compare with the previous values:
    SHOW TABLE STATUS LIKE 'LongCharKey%';


    Name           Rows   Data Length  Index Length  Index/Data Ratio
    LongCharKey    50045  30392320     28868608      94.99%
    LongCharKey10  50045  29949952     3178496       10.61%

    With the second index added, the difference in index length becomes even clearer.

How it works…

Executing the downloaded script will set up two tables with the following structures:

CREATE TABLE `LongCharKey` (
`LongChar` char(255) NOT NULL,
`Payload` char(255) DEFAULT NULL,
PRIMARY KEY (`LongChar`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `LongCharKey10` (
`LongChar` char(255) NOT NULL,
`Payload` char(255) DEFAULT NULL,
PRIMARY KEY (`LongChar`(10))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The two tables are almost identical, except for the primary key definition. Both are pre-filled with the same 50,000 records of pseudo-random sample data. The Payload column contains sequences of 255 random letters each. The LongChar column starts with a sequential number in the first 10 characters and is then padded to the full 255 characters with the same sort of random data.
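The way the script fills the tables can be sketched in Python. This is a hypothetical stand-in for the downloaded SQL script, not its actual contents; the uppercase alphabet and the exact source of randomness are assumptions.

```python
import random
import string

def make_row(seq, width=255, prefix_len=10):
    """Build one (LongChar, Payload) pair following the scheme described above."""
    # First 10 characters: zero-padded sequential number (guarantees uniqueness).
    prefix = str(seq).zfill(prefix_len)
    # Remaining 245 characters: pseudo-random uppercase letters.
    filler = "".join(random.choices(string.ascii_uppercase, k=width - prefix_len))
    payload = "".join(random.choices(string.ascii_uppercase, k=width))
    return prefix + filler, payload

rows = [make_row(i) for i in range(50000)]
# The 10-character prefixes are unique, so a prefix primary key is safe here.
assert len({lc[:10] for lc, _ in rows}) == len(rows)
```

Because the sequential number occupies the leading characters, the first 10 bytes alone already distinguish every row, which is exactly what the prefix primary key in LongCharKey10 relies on.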
SELECT LEFT(LongChar,20), LEFT(Payload, 20) FROM LongCharKey LIMIT 5;

LEFT(LongChar,20)     LEFT(Payload, 20)
0000000000KEAFAYVEJD  RHSKMEJITOVBPOVAGOGM
0000000001WSSGKGMIJR  VARLGOYEONSLEJVTVYRP
0000000002RMNCFBJSTL  OVWGTTSHEQHJHTHMFEXV
0000000003SAQVOQSINQ  AHDYUXTAEWRSHCLJYSMW
0000000004ALHYUDSRBH  DPLPXJVERYHUOYGGUFOS

While the LongCharKey table simply marks the whole LongChar column, with its entire 255-character length, as the primary key, the LongCharKey10 table limits the primary key to the first 10 characters of that column. This is perfectly fine for this table, because the test data was crafted to be unique in this range.

Neither of the two tables has any secondary indexes defined. Looking at some relevant table data shows that they are equally big (some columns left out for brevity):

SHOW TABLE STATUS LIKE 'LongCharKey%';

Name           Rows   Data Length  Index Length
LongCharKey    50045  30392320     0
LongCharKey10  50045  29949952     0

With each index added, the Index Length of the first table will increase significantly, while that of the second grows much more slowly.

In the case of the LongCharKey table, each secondary index record carries with it a complete copy of the LongChar column, because that is the primary key, stored without limitation. Assuming a single-byte character encoding, every secondary index record is inflated by 255 bytes on top of the 10 bytes needed for the actual index entry. That means a whole kilobyte is spent on primary key references for every 4 records!

In contrast, the primary key definition of the LongCharKey10 table only includes the leading 10 characters of the LongChar column, making each secondary index entry 245 bytes shorter and thereby explaining the much slower growth as further indexes are added.
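The back-of-the-envelope arithmetic above can be checked with a few lines of Python. The byte counts assume a single-byte character set and deliberately ignore InnoDB's per-record overhead, so these are rough estimates rather than the exact Index Length values reported by SHOW TABLE STATUS.

```python
ROWS = 50000
INDEXED_PREFIX = 10  # bytes per IDX_PAY_10 entry, i.e. Payload(10)

def secondary_index_bytes(pk_bytes, rows=ROWS):
    # Each secondary index record stores the indexed column prefix
    # plus a full copy of the primary key value.
    return rows * (INDEXED_PREFIX + pk_bytes)

full_key = secondary_index_bytes(255)   # LongCharKey: PRIMARY KEY (LongChar)
prefix_key = secondary_index_bytes(10)  # LongCharKey10: PRIMARY KEY (LongChar(10))

print(full_key, prefix_key)       # 13250000 1000000
print(full_key / prefix_key)      # 13.25 -- over thirteen times larger
```

The ratio of roughly 13:1 per secondary index is in the same ballpark as the measured Index Length ratio of about 9:1; the gap comes from the page and record overhead the estimate leaves out.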

Choosing InnoDB primary key columns

In the article introduction we promised to shed some light on how to choose your InnoDB primary key columns sensibly. Be advised that choosing good primary key columns is not an exact science—there are multiple aspects that influence this decision. Depending on your needs and preconditions you will want to prioritize them differently from one table to the next. Consider the following as general advice rather than hard rules that must be obeyed unconditionally.

Getting ready

In order to make reasonable decisions on primary key columns, it is important to have a very clear understanding of what the data looks like in the table at hand. If you already have existing data that is to be stored in an InnoDB table—for example in MyISAM format—it can be helpful to compare it with the criteria below.

If you are planning a new schema, you might have to guess about some characteristics of the future data. As is often the case, the quality of your choices is directly proportional to how good those guesses are.

This recipe is less a set of strict step-by-step instructions that must be followed from top to bottom, and more a list of properties a good primary key should have, even though you might decide that some of them do not apply to your environment. As a rule of thumb, however, a column that fulfills all or most of the attributes described below is most probably a sensible choice for a primary key. See the How it works… section for details on the individual items.

How to do it…

  1. Identify unique attributes: This is an absolute (technical) requirement for primary keys in general. Any data attribute that is not strictly guaranteed to be free of duplicates cannot be used alone as a primary key.
  2. Identify immutable attributes: While not absolutely necessary, a good primary key is never changed once it has been assigned. For all intents and purposes, you should avoid columns that have even a small chance of being changed for existing records.
  3. Use reasonably short keys: This is the “softest” criterion of all. In general, longer keys have negative impacts on overall database performance—the longer the worse. Also, consider a prefix primary key. See Using prefix primary keys earlier in this article for more information.
  4. Prefer single-column keys: Even though nothing prevents you from choosing a composite primary key (a combination of columns that together form the uniqueness), this can easily become a hassle to work with, especially when handling foreign keys.
  5. Consider the clustered index nature of the primary key: As InnoDB’s primary key is also clustered, you should take this special nature into account as well. It can speed up read access considerably if you often query for key ranges, because disk seek times are minimized.
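The clustered-index point from the last step can be illustrated with a toy model in pure Python. This is an analogy for why key-range reads are cheap, not a model of InnoDB internals: because rows are stored physically sorted by primary key, a range query touches one contiguous region instead of scattered locations.

```python
import bisect

# Toy clustered "table": rows kept physically sorted by primary key.
rows = sorted((pk, "payload-%d" % pk) for pk in range(0, 1000, 7))
keys = [pk for pk, _ in rows]

def range_query(lo, hi):
    # A primary key range scan is one contiguous slice:
    # seek to the start once, then read sequentially.
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_right(keys, hi)
    return rows[start:stop]

result = range_query(100, 140)
# Returns the rows with primary keys 105, 112, 119, 126, 133, 140.
```

A secondary index, by contrast, would yield matching primary key values in index order and then require one lookup into the clustered data per hit, which is where the scattered seeks come from.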
