Indexing Nulls

Most database indexes types do not index NULL values. What this means is that anywhere a query’s WHERE clause contains “IS NULL” a full table scan will ensue (or at best an index range scan).

This stems from the fact that a NULL assignment in database terms actually means that no value has been assigned.

As I’ve mentioned before cardinality plays a large part in the efficiency of an index. I also think database implementers have considered indexing NULL values will lead to poorly performing indexes, because of the effect of on the field’s cardinality.
I have outlined below some possible solutions to performance issues associated with searching for NULL values.

My first suggestion is to consider querying on a different field altogether. Recently, I had to modify a query searching for null values in a foreign key field, i.e. no foreign key reference. When I analysed the table’s usage, I discovered that a mutually exclusive relationship existed with another foreign key. In other words, I could check for null values in field A by making sure there were values assigned to field B. Thus, for a query, I replace “WHERE FKeyA IS NULL” with “WHERE FKeyB IS NOT NULL”, which is easily indexed.
A sample table showing mutually exclusive field data

Solution number two is not to use a NULL value at all but to assign some standard code value instead. Consider it similar to putting “N\A” instead of NULL in a field. Crucially, this will affect the cardinality of any indexes indexing this field and hence affect their performance. If your table has fields to index where the NULL value is common, this approach will be a poor performer.
Using a standard value for null.

Technique three is an Oracle specific approach. Oracle supports an unusual index type called a Bitmapped index. This index is specifically designed to perform well for fields with low cardinality. Thus, for Bitmapped indexes, indexing NULL values makes a lot of sense. Bitmapped indexes are the only index type I know that actively indexes NULL values.

8 Responses to “Indexing Nulls”


  1. 1 Peter Geoghegan

    Anthony,

    It’s quite timely for me that you should blog about this, because the latest release of PostgreSQL, 8.3.0, offers support for indexing IS NULL expressions, as well as ORDERing starting or ending with NULL. I think that it was an excellent choice for my project - I’m very glad that I went for it rather than MySQL, which lacks even a Turing-complete procedural languages for stored procs. Had I chosen MySQL, this deficiency would have necessitated data processing within my application, which would have hindered de-coupling my database from my front-end, which is aesthetically ugly, more difficult to maintain and a possible performance bottleneck.

    Getting back to indexing columns with NULLs, I was cognisant of the issues described here early on, so I took your second proposed solution, before that milestone Postgres release made doing so unnecessary (I know its version number does not suggest that it is a milestone, but it is).

    Anyway, nice article. The way you exploited the two columns having the property of being mutually exclusively NULL was clever and novel,

    Regards,
    Peter Geoghegan

  2. 2 anthony geoghegan

    Thanks Peter. Glad to see you’re making progress with your POS application.
    Since ISNULL is deterministic and non agregating its possible to use a computed column index (functional index in Oracle) on it. I hadn’t thought of that ;-). Off hand, I’d say the same cardinality problems could happen if you had it resolving NULLs to a constant value though.

  3. 3 Alan Roche

    Interesting article. The Bitmap index (oracle 10g ) is indeed an option worth considering for low cardinality tables. A word of caution, for online (oltp) applications bitmap indexes can cause major locking problems as oracle needs to lock index blocks for inserts and updates. They are therefore more suitable for OLAP purposes rather than OLTP applications which typically involve frequent updates and inserts.

  4. 4 Matt Turner

    There is another cheat for this (in Oracle at least)..

    Assume we do
    select * from t_person where first_name is null

    and assume further that there are a couple of people with a null first_name.

    If we only index first_name, then this will cause a full scan as you described.

    However there is nothing to stop us creating an index with a composite key (where the second column is not nullable)

    e.g. create index ix_person_fname_id on (first_name, id)

    Then we can happily query
    select * from t_person where first_name is null

    thereby avoiding the full scan.

  5. 5 anthony geoghegan

    That’s an interesting hack Matt. Not one I’ve considered before. But in the interests of sanity I should point out creating composite indexes, where one is not otherwise necessary, is just bad practise. The storage concerns for this method could be very significant depending on the cardinality of the “id” field.

  6. 6 Matt Turner

    To play devil’s advocate - you could consider the code level solution to be a hack one level up as it relies upon a relationship between the two columns which is probably not enforced, nor necessarily guaranteed to remain true in future.. it also has the potential to confuse a developer one day (if the required logic dictates that name should not be null, then it would be a surprise to see something different being queried). It depends on the context, and on your perspective I suppose.

    Any not-null field would do in the above example, but if you don’t have a suitable one (and ID is likely to be a bad choice as you say) then this might be preferable…
    create index ix_person_fname on t_person (first_name, ‘ ‘);

  7. 7 anthony geoghegan

    It is indeed a hack. A very situational one at that. Unfortunately, no standard mechanism for enforcing “either/or” references exists in any common RDBMS, to my knowledge anyway. Any enforcement would be application specific unfortunately.
    It does have the benefit of having no overhead and being very efficient in index usage however.
    When I tested your index creation scriptlet in T-SQL it wouldn’t accept the literal for an indexable column (as I feel is the correct scenario). I don’t have access to an Oracle instance handy at the moment, but I believe that it might have some issues with it as well.

  8. 8 Kevin Downey

    I am quite informed now by the remark you made about is null performing a complete table scan… It makes a lot of sense once you point it out.

    Cool Bananas,

    Kevin

Leave a Reply