Database Indexing 101 Part 1

Indexing is a means of increasing the performance of your database queries. It’s amazing how many people are unaware of the basic principals of good indexing.

What an index does is provide a shortcut for the database, when it’s trying to find records matching some search criteria. A simple search requirement is to return values from records, where one or more fields match a particular value.

SELECT bar FROM foo WHERE item = ‘TEST’

If an index exists for the table foo with item as the indexed field, then the database optimizer will recognise this when the SQL statement is parsed, and use this index when executing the query. This results in an overall performance gain. How significant this performance gain is depends on how many records are present in the table, what the cardinality of the field is and how much variation is present in the field’s values.

If a field has a unique value per record, then the cardinality is high for that field. If the field has only one or two possible values, then the cardinality is low. Most types of index perform better with a high cardinality.

So what happens when the index is not so good? Let’s look at a slightly different query and see how the same index behaves.

SELECT bar FROM foo WHERE item = ‘TEST’ AND value = ‘SNAFU’

In this case only one of the fields used in the WHERE clause is present in the index. That means that further examination of all the records returned from the index will need to be done, to isolate those records that meet the second criteria, which isn’t supported by the index. This is referred to as an index range scan. This situation is not as bad as not having any index but not as good as having a completely appropriate index. With a little work we can implement a simple optimization of the index that will yield a solution which is suitable for both queries.

If the index is recreated indexing both the item field and value field (in that order), then the index will be applicable for both queries. This is a property of the structure of table indexes. Interestingly, if the table is indexed by the same fields in the opposite order, it’s only useful for the second query. Even then it may not be as efficient.
A second simple optimisation is to add the bar field to the index. While the bar field is not used in the search criteria, if all of the fields required to be returned exist within the field structure, no secondary search for the row data is required in order to meet the requirements of the query.

This fact gives us a little insight into how an index works. An index is essentially a sorted list of pointers to groups of rows that match the different values within the fields that are indexed. When each group is made up of only a single row we say this is a unique index. Unique indexes are often the most efficient form of index.

Simple Index

Critical to the performance of an index is the sorting of the index. Just as with the telephone book it’s quicker to look up records when the field you are searching by is sorted. Of course, if what you are searching for is not indexed, then the index is useless for that query.

This is a very simplistic overview of how an index behaves. As you can see, for some operations a second seek for the individual rows that match the search criteria is required. Each of these seek operations against unsorted data is relatively costly.
In later posts I hope to go into greater detail of how a typical database index behaves.

  1. #1 by Mallesham Karnati on April 24, 2007 - 4:00 pm

    Very good and succinct introduction to Indexing. Looking forward to reading of the next part.

  2. #2 by Brendan Lally on April 27, 2007 - 9:32 pm

    Newbies to SQL/database’s often think that adding an index is the magic answer to speeding things up.

    Often it (appears to) work.
    There is no guarantee that it will continue 2 do so.

    U really need to analyse the query (SQL) to see that it is going 2b consistent and get ‘coverage’ and hence the index will kick-in.
    Several of the Sql dialects allow u 2 fudge it and ‘make’ the index “stick” in the query. This is (on a long-term basis) a bad idea and is similiar to the “just stick an index on and u’re problem is solved”

    Any index also comes with downsides that u need 2b aware of:
    – aging
    – rebuild
    – statistics
    – de-performance on insert/updates/deletes

    Like a lot of things in life – its not as it first seems.

    Lal

  3. #3 by anthony geoghegan on April 30, 2007 - 9:19 am

    I couldn’t agree more Brendan. The scope of this article is to introduce some basic understanding in the hope that it will yield some better implementations. I had intended to address the more advanced areas you have mentioned in further articles. We at DSI always advocate a best practice approach where budget and time constraints allow.

    I’m familiar with the just “stick an index on it” brigade as well and I’m trying to show that a small amount of understanding can go a long way to improving that.

    And I happen to believe that adding an index, the right index, can be the magic answer to speeding things up 🙂

    Ant.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: