Indexing is a means of increasing the performance of your database queries. It’s amazing how many people are unaware of the basic principals of good indexing.
What an index does is provide a shortcut for the database, when it’s trying to find records matching some search criteria. A simple search requirement is to return values from records, where one or more fields match a particular value.
SELECT bar FROM foo WHERE item = ‘TEST’
If an index exists for the table foo with item as the indexed field, then the database optimizer will recognise this when the SQL statement is parsed, and use this index when executing the query. This results in an overall performance gain. How significant this performance gain is depends on how many records are present in the table, what the cardinality of the field is and how much variation is present in the field’s values.
If a field has a unique value per record, then the cardinality is high for that field. If the field has only one or two possible values, then the cardinality is low. Most types of index perform better with a high cardinality.
So what happens when the index is not so good? Let’s look at a slightly different query and see how the same index behaves.
SELECT bar FROM foo WHERE item = ‘TEST’ AND value = ‘SNAFU’
In this case only one of the fields used in the WHERE clause is present in the index. That means that further examination of all the records returned from the index will need to be done, to isolate those records that meet the second criteria, which isn’t supported by the index. This is referred to as an index range scan. This situation is not as bad as not having any index but not as good as having a completely appropriate index. With a little work we can implement a simple optimization of the index that will yield a solution which is suitable for both queries.
If the index is recreated indexing both the item field and value field (in that order), then the index will be applicable for both queries. This is a property of the structure of table indexes. Interestingly, if the table is indexed by the same fields in the opposite order, it’s only useful for the second query. Even then it may not be as efficient.
A second simple optimisation is to add the bar field to the index. While the bar field is not used in the search criteria, if all of the fields required to be returned exist within the field structure, no secondary search for the row data is required in order to meet the requirements of the query.
This fact gives us a little insight into how an index works. An index is essentially a sorted list of pointers to groups of rows that match the different values within the fields that are indexed. When each group is made up of only a single row we say this is a unique index. Unique indexes are often the most efficient form of index.
Critical to the performance of an index is the sorting of the index. Just as with the telephone book it’s quicker to look up records when the field you are searching by is sorted. Of course, if what you are searching for is not indexed, then the index is useless for that query.
This is a very simplistic overview of how an index behaves. As you can see, for some operations a second seek for the individual rows that match the search criteria is required. Each of these seek operations against unsorted data is relatively costly.
In later posts I hope to go into greater detail of how a typical database index behaves.