SQL Server Optimization
Summary: This article describes different ways of improving the performance of SQL Server queries, with a focus on index optimization.
Contents
Introduction
The purpose of this document is to describe different ways of improving the performance of SQL Server queries. Most of this document will describe index optimization, with occasional references to particular code snippets. In other words, this document will describe how to achieve the best performance, given the tables and queries to run against.
Database design issues and entity-relationship modeling techniques are out of scope of this document, even though flawed design can severely impact the performance in many ways.
Data and Index Organization
Index and data organization will be discussed together for two reasons. First, there is rarely a table without an index. Even a small lookup table has a primary key that allows other table to refer to it, and a primary key is always based on the index. Second, data organization depends on the indexes that exist for the table; a table with a clustered index is organized differently than a table without one.
Non-Clustered Indexes/Heap-Based Tables

A heap table, by definition, is a table that doesn't have any clustered indexes. Different pages of the heap-based table occupy different non-contiguous areas on a disk, and they are not linked together in any way. Each non-clustered index defined on the table will have a corresponding entry in a sysindexes table with an indid between 2 and 254, pointing to the first IAM (Index Allocation Map) page. IAM pages represent the linked list of all pages used by the database object (table or index) and used by SQL Server for allocating and deallocating storage space. It is important to remember that IAM pages are not used to search through data or index pages, but only to allocate and deallocate them.
An observant reader might ask at this point: What if there is no index defined for a table at all? Well, the address of the first IAM page of the heap table itself is stored in the sysindexes table with indid = 0 (see Figure 2). In that respect, the abbreviation IAM is a little misleading; it would be better called SAM (Storage Allocation Map or Space Allocation Map).

All the indexes (clustered and non-clustered) are B-Trees, where B stands for "balanced" (not "binary," as is sometimes thought), which basically means that it will take about the same number of jumps to get to the leaf node for any value of the key. The number of jumps required to get to the leaf node has a direct impact on the total time required to read the data using that index, so let's explore this a little bit more.
All pages, including non-leaf index pages, have a size of 8192 bytes, including 96 bytes occupied by the header—this leaves 8096 bytes for the index data. Let's calculate how many jumps will be required for the index built on the single field of the integer type. Each pointer to the index page of the next level or to the data page in the heap takes 6 bytes, so the total size of one index entry is 10 bytes. This means that each index page can contain up to 800 index entries. Let's assume that the average number of entries on each index page is 400. This means that for a table holding up to 160,000 rows of data, it will take only three jumps to get to the data page. The first jump is to the root index page, the second jump is to the leaf-node-level index page, and the third jump is to the data page itself. For a table holding 160,000 to 64,000,000 rows of data, it will take four jumps.
This calculation demonstrates that the width of the key directly affects the number of required jumps. If we have the key defined on the char (200) column, it will take only three jumps for a table holding 400 rows or less; four jumps for a table holding up to 8,000 rows; and five jumps for a table holding up to 160,000 rows. This leads to the conclusion that indexes built on wider keys perform searches slower (in addition to taking more space).
Covering Indexes
The term covering index does not mean a separate kind of index having a different internal structure. Rather, this term is used to describe a certain technique that is used to improve performance. It is discussed here because, as will be shown later, for most practical purposes, a clustering index can be thought of as a covering index.
Let's consider a simple but typical example. Suppose that we need to join two tables—Master, defined as (MasterId int primary key, Data int), and Detail, defined as (DetailId int primary key, Data int, MasterId int)—and select columns (MasterID, Detail.Data) where Master.Data satisfies a certain condition. The condition can be anything that is built directly on the value of Master.Data—for example, Data = @data, Data between @start and @end, or Data in (1, 2, 3). Let's also assume that, besides primary key indexes, the Master table has an index idxMaster built on (Data, MasterID), and the Detail table has an index idxDetail built on (MasterID). The query plan will look as shown in Figure 3.

What does that mean, exactly? Let's analyze it step by step: 
- The query engine searches idxMaster,      to find the entries that satisfy our condition for Master.Data, and      it builds a list of corresponding Master.MasterID values. This is      fast, because: 
- Index pages are sorted by Master.Data,       so the engine can jump directly to leaf index pages that satisfy our       condition. 
- Those leaf index pages already       contain values of Master.MasterId, so there is no need to jump       somewhere else to read them. 
- The engine selects entries from      idxDetail having values of Detail.MasterID that match the      list of Master.MasterID built during Step 1. This can be done by      simply looping through the list of Master.MasterID and looking up      each corresponding entry in idxDetail, or by using more complicated      hash or merge algorithms. The query engine determines      which algorithm to use in each particular case, based on the relative cost      of each algorithm. This, in turn, depends on the following criteria: 
- Selectivity of the index 
- Estimated number of lookups 
- Estimated number of rows to       find. 
For the low relative values of the second and third criteria, the loop algorithm is preferred, whereas for higher values, the hash/merge algorithms become more efficient. The output of this step is a list of (MasterID, Detail:FID:PN:RN) values (see Figure 1). 
- A list of heap pointers (FID:PN:RN),      also known as bookmarks, is used to read randomly located heap data      pages of the Detail table, and to retrieve values of the Data      field in order to compile the requested result set (MasterID,      Detail.Data). This step is called bookmark lookup, and, in our      case, it is the most expensive one. The cost of bookmark lookup can be      anywhere from 90 to 99 percent of the total query cost. This means that      eliminating this step can improve the performance ten-fold or even      one-hundred-fold. The cost of bookmark lookup depends on the fragmentation      of data pages, the number of data pages to read, that availability of      read-ahead optimization, the IO-to-CPU cost ratio, and other factors. 
Please note that, in this example, bookmark lookup has only been performed on the Detail table, not on the Master table. This is because the required output field MasterID was a part of the index. This leads us to the definition of covering index: A covering index is the index that contains all output fields required by the operation performed on that index. Please note that if the Master table contained more fields, and at least one of these fields was present in the SELECT clause, the query engine would have to perform bookmark lookup on the Master table, and therefore idxMaster could not serve as a covering index anymore. That is why it is not possible to tell whether the index is covering without taking into consideration the query it is used in, and even the particular execution plan.
It is very easy now to devise a way to improve the performance of our query. Adding Detail.Data as a second field to idxDetail eliminates the need for bookmark lookup and considerably improves the performance.
Good candidates for covering indexes are junction tables (tables that exist to resolve many-to-many relationships). In the simplest case of a many-to-many relationship between T1 and T2, it is enough to have two indexes, (T1_Id, T2_Id) and (T2_Id, T1_Id), in order to save the query engine from ever doing bookmark lookups on this table.
It is rarely feasible to use covering indexes for the fact tables, since there are usually many fields requested from these. Putting all required fields into the index, in order to make it covering, would in a way make it a very ineffective clustered index, which brings us to the next topic.
Clustered Indexes/Index-Based Tables
As you can see, this diagram is very similar to the diagram of a non-clustered index (Figure 1), except that leaf nodes contain data, instead of pointers to data. Therefore, the easiest way to understand a clustered index is to treat it as a covering index consisting of all the fields in the table, with three additional advantages: 
- Data stored in leaf nodes is      not duplicated. On the other hand, in the case of a covering index on a      heap table, data will be stored both in the index and in the heap. 
- Table scan (called clustered      index scan in the case of an index-based table) is faster than table      scan of a heap table, because leaf nodes of the clustered index are stored      together. The internal implementation of modern hard drives allows storing      all the data read from one logical cylinder of the hard drive in its      internal cache; in the case of an index-based table, it is more likely      that this data will contain more data pages of the table being scanned. 
- If hash or merge algorithms are      selected by the query engine for the join, data retrieved from the table      should be sorted by the join key. If a clustered index is built on the      join key, retrieved data is already sorted. 
Several special considerations have to be kept in mind about index-based tables. The most important one is that all other (non-clustered) indexes of the index table contain values of the clustering key, instead of heap pointers. In the available sources, it is never mentioned what exactly was the reason for this design decision, so we can only guess that the additional cost of searching through the clustering index is outweighed by the cost saved during the reallocation of the clustered index pages. Indeed, nothing would prevent non-clustered indexes of the index-based table from pointing directly to the row in the leaf node that contains the data, but this pointer should be updated each time this row is moved to another location. Also, obviously, only one clustered index can exist for the table. So out of all possible indexes that could be clustered, it is necessary to select one that will give maximum performance gain.
Good candidates for clustered indexes are: 
- Primary keys of the      lookup/reference/dimension/master tables 
- Foreign keys of the fact/detail      tables 
- Datetime fields of the tables queried      by the date range 
Optimization Rules of Thumb
- Always look at the query plan      first. It will show you the optimal current execution plan from the query      engine's point of view. Find the most expensive part of the execution plan      and start optimizing from there. However, even before that, make sure that      the statistics on all tables in your query are up to date, by running the update      statistics <TableName> command on all tables in your query. 
- If you see table scan,      optimize. Table scan is the slowest possible way of execution. Table scan      means not only that no index is used, but that there is no clustered index      for this table at all. Even if you can only replace table scan with      clustered index scan, it is still worth it. 
- If you see clustered index scan,      find out whether it can be replaced with index seek. For that, find what      conditions are applied to this table. Usually, conditions exist for two or      three fields of the table. Find out the most selective condition (that is,      the condition that would produce the smallest number of records if applied      alone), and see whether an index on this field exists. Any index that      lists this field first will qualify. If there is no such index, create it      and see whether the query engine picks it up. 
- If the query engine is not      picking up the existing index (that is, if it is still doing a clustered      index scan), check the output list. It is possible that seek on your index      is faster than clustered index scan, but involves bookmark lookup that      makes the combined cost greater than use of a clustered index. Clustered      index operations (scan or seek) never need bookmark lookup, since a      clustered index already contains all the data. If the output list is not      big, add those fields to the index, and see whether the query engine picks      it up. Please remember that the combined size is more important than the      number of fields. Adding three integer fields to the index is less      expensive than adding one varchar field with an average data length of 20.      
Summarizing this rule, try to make your index covering, and see whether it works better than clustered index scan. Please note that it is not always possible to make the query engine pick up your index automatically. A small table or a low-selectivity index will produce clustered index scan, even if your index is covering. 
- If you see bookmark lookup, it      means that your index is not covering. Try to make it covering if it makes      sense (see the preceding guidelines). 
- The execution plan selected by      the query engine may be not the best one. The query engine makes certain      assumptions about disk subsystem and CPU cost versus IO cost. These      assumptions sometimes can be incorrect. If you don't believe that the      query engine's selection is the best one, run a query in the loop for 10      to 15 minutes with automatic selection, change the query to use your index      (you will have to use index hint to force it), and then run it for 10 to      15 minutes again. Compare the results to see which one works better. 
- Avoid any operations on the      fields, where possible. Some operations will prevent the use of the index      on this field even if it exists—for example, the infamous ltrim(rtrim(FieldName));      other operations will degrade the performance. For example, instead of using      the condition cast(DateField as varchar(20)) = @dateString, try to      convert @dateString to an expression of datetime type first, and      then compare it to DateField. 
- Please note that the query      engine cost estimate does not include the cost of embedded procedure or      function calls. If you compare between plain join and select from      table-value functions, the latter would seem to have smaller cost,      but it usually does not. In such a situation, use your own metrics to find      out which query performs better. 
- When it is not possible to      avoid operation on the field, use an index built on that expression. This      can be done in two ways: 
- Create a calculated field       based on your expression. 
- Create a view, and build an       index on it. 
Note    SQL Server requires certain conditions to be met in order to allow the use of calculated fields and indexed views (set quoted_identifier on, set arithabort on, and so on).
- Indexed views are a good way to      further speed up the query if you are not satisfied with the results.      Indexed view is a clustered index built over the view's select list. You      can also define additional indexes for the indexed view, just as you can      for any regular table. Indexed views take disk space and involve some      maintenance overhead (every time underlying tables change, the indexed      view also has to change), but they usually provide a good boost in      performance, even after all other optimization techniques are exhausted. 
Twelve Tips For Optimizing Sql Server 2005 Query Performance
1. Turn on the execution plan, and statistics
2. Use Clustered Indexes
3. Use Indexed Views
4. Use Covering Indexes
5. Keep your clustered index small.
6. Avoid cursors
7. Archive old data
8. Partition your data correctly
9. Remove user-defined inline scalar functions
10. Use APPLY
11. Use computed columns
12. Use the correct transaction isolation level
What is a query plan and what is the value from a performance tuning perspective?
A query plan is the physical break down of the code being passed to the SQL Server optimizer.
The value from a performance tuning perspective is that each component of the query can be understood and the percentage of resource utilization can be determined at a micro level. As query tuning is being conducted, the detailed metrics can be reviewed to compare the individual coding techniques to determine the best alternative.
1. Turn on the execution plan, and statistics
2. Use Clustered Indexes
3. Use Indexed Views
4. Use Covering Indexes
5. Keep your clustered index small.
6. Avoid cursors
7. Archive old data
8. Partition your data correctly
9. Remove user-defined inline scalar functions
10. Use APPLY
11. Use computed columns
12. Use the correct transaction isolation level
What is a query plan and what is the value from a performance tuning perspective?
A query plan is the physical break down of the code being passed to the SQL Server optimizer.
The value from a performance tuning perspective is that each component of the query can be understood and the percentage of resource utilization can be determined at a micro level. As query tuning is being conducted, the detailed metrics can be reviewed to compare the individual coding techniques to determine the best alternative.
 
