(For more resources related to this topic, see here.)
As a database administrator (DBA) or developer, one of your most important goals is to ensure that the query times are consistent with the service-level agreement (SLA) and are meeting user expectations. Along with other performance enhancement techniques, creating indexes for your queries on underlying tables is one of the most effective and common ways to achieve this objective.
The indexes of underlying relational tables are very similar in purpose to an index section at the back of a book. For example, instead of flipping through each page of the book, you use the index section at the back of the book to quickly find the particular information or topic within the book. In the same way, instead of scanning each individual row on the data page, SQL Server uses indexes to quickly find the data for the qualifying query. Therefore, by indexing an underlying relational table, you can significantly enhance the performance of your database.
Indexing affects the processing speed for both OLTP and OLAP and helps you achieve optimum query performance and response time.
The cost associated with indexes
SQL Server uses indexes to optimize overall query performance. However, there is also a cost associated with indexes; that is, indexes slow down insert, update, and delete operations. Therefore, it is important to consider the cost and benefits associated with indexes when you plan your indexing strategy.
How SQL Server uses indexes
A table that doesn’t have a clustered index is stored in a set of data pages called a heap. Initially, the data in the heaps is stored in the order in which the rows are inserted into the table. However, SQL Server Database Engine moves the data around the heap to store the rows efficiently. Therefore, you cannot predict the order of the rows for heaps because data pages are not sequenced in any particular order. The only way to guarantee the order of the rows from a heap is to use the SELECT statement with the ORDER BY clause.
Access without an index
When you access the data, SQL Server first determines whether there is a suitable index available for the submitted SELECT statement. If no suitable index is found for the submitted SELECT statement, SQL Server retrieves the data by scanning the entire table. The database engine begins scanning at the physical beginning of the table and scans through the full table page by page and row by row to look for qualifying data that is specified in the submitted SELECT statement. Then, it extracts and returns the rows that meet the criteria in the format specified in the submitted SELECT statement.
Access with an index
The process is improved when indexes are present. If the appropriate index is available, SQL Server uses it to locate the data. An index improves the search process by sorting data on key columns. The database engine begins scanning from the first page of the index and only scans those pages that potentially contain qualifying data based on the index structure and key columns. Finally, it retrieves the data rows or pointers that contain the locations of the data rows to allow direct row retrieval.
The structure of indexes
In SQL Server, all indexes—except full-text, XML, in-memory optimized, and columnstore indexes—are organized as a balanced tree (B-tree). This is because full-text indexes use their own engine to manage and query full-text catalogs, XML indexes are stored as internal SQL Server tables, in-memory optimized indexes use the Bw-tree structure, and columnstore indexes utilize SQL Server in-memory technology.
In the B-tree structure, each page is called a node. The top page of the B-tree structure is called the root node. Non-leaf nodes, also referred to as intermediate levels, are hierarchical tree nodes that comprise the index sort order. Non-leaf nodes point to other non-leaf nodes that are one step below in the B-tree hierarchy, until reaching the leaf nodes. Leaf nodes are at the bottom of the B-tree hierarchy. The following diagram illustrates the typical B-tree structure:
In SQL Server 2014, you can create several types of indexes. They are explored in the next sections.
A clustered index sorts table or view rows in the order based on clustered index key column values. In short, a leaf node of a clustered index contains data pages, and scanning them will return the actual data rows. Therefore, a table can have only one clustered index. Unless explicitly specified as nonclustered, SQL Server automatically creates the clustered index when you define a PRIMARY KEY constraint on a table.
When should you have a clustered index on a table?
Although it is not mandatory to have a clustered index per table, according to the TechNet article, Clustered Index Design Guidelines, with few exceptions, every table should have a clustered index defined on the column or columns that used as follows:
- The table is large and does not have a nonclustered index. The presence of a clustered index improves performance because without it, all rows of the table will have to be read if any row needs to be found.
- A column or columns are frequently queried, and data is returned in a sorted order. The presence of a clustered index on the sorting column or columns prevents the sorting operation from being started and returns the data in the sorted order.
- A column or columns are frequently queried, and data is grouped together. As data must be sorted before it is grouped, the presence of a clustered index on the sorting column or columns prevents the sorting operation from being started.
- A column or columns data that are frequently used in queries to search data ranges from the table. The presence of clustered indexes on the range column will help avoid the sorting of the entire table data.
Nonclustered indexes do not sort or store the data of the underlying table. This is because the leaf nodes of the nonclustered indexes are index pages that contain pointers to data rows. SQL Server automatically creates nonclustered indexes when you define a UNIQUE KEY constraint on a table. A table can have up to 999 nonclustered indexes.
You can use the CREATE INDEX statement to create clustered and nonclustered indexes. A detailed discussion on the CREATE INDEX statement and its parameters is beyond the scope of this article. For help with this, refer to the CREATE INDEX (Transact-SQL) article at http://msdn.microsoft.com/en-us/library/ms188783.aspx.
SQL Server 2014 also supports new inline index creation syntax for standard, disk-based database tables, temp tables, and table variables. For more information, refer to the CREATE TABLE (SQL Server) article at http://msdn.microsoft.com/en-us/library/ms174979.aspx.
As the name implies, single-column indexes are based on a single-key column. You can define it as either clustered or nonclustered. You cannot drop the index key column or change the data type of the underlying table column without dropping the index first. Single-column indexes are useful for queries that search data based on a single column value.
Composite indexes include two or more columns from the same table. You can define composite indexes as either clustered or nonclustered. You can use composite indexes when you have two or more columns that need to be searched together. You typically place the most unique key (the key with the highest degree of selectivity) first in the key list.
For example, examine the following query that returns a list of account numbers and names from the Purchasing.Vendor table, where the name and account number starts with the character A:
USE [AdventureWorks2012]; SELECT [AccountNumber] , [Name] FROM [Purchasing].[Vendor] WHERE [AccountNumber] LIKE 'A%' AND [Name] LIKE 'A%'; GO
If you look at the execution plan of this query without modifying the existing indexes of the table, you will notice that the SQL Server query optimizer uses the table’s clustered index to retrieve the query result, as shown in the following screenshot:
As our search is based on the Name and AccountNumber columns, the presence of the following composite index will improve the query execution time significantly:
USE [AdventureWorks2012]; GO CREATE NONCLUSTERED INDEX [AK_Vendor _ AccountNumber_Name] ON [Purchasing].[Vendor] ([AccountNumber] ASC, [Name] ASC) ON [PRIMARY]; GO
Now, examine the query execution plan of this query once again, after creating the previous composite index on the Purchasing.Vendor table, as shown in the following screenshot:
As you can see, SQL Server performs a seek operation on this composite index to retrieve the qualifying data.
Thus we have learned what indexes are, how SQL Server uses indexes, structure of indexes, and some of the types of indexes.
Resources for Article:
- Easily Writing SQL Queries with Spring Python [article]
- Manage SQL Azure Databases with the Web Interface ‘Houston’ [article]
- VB.NET Application with SQL Anywhere 10 database: Part 1 [article]