Nullable columns and joins

    Question

  • Not an urgent question, just a curiosity.

    My co-worker told me that any column used in a join should not be nullable, since nullable columns slow down join performance.

    So he wanted to take a column, make it non-nullable, and then replace all the null values with "junk" (i.e., ' ') so that the queries would be optimized.

    I know that having potential null values can change how you have to write the join, but is there any actual hit to performance? Was he right? I couldn't find any articles on this.
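    To make the "changes how you have to write the join" part concrete, here is a minimal sketch (using SQLite rather than SQL Server, with made-up table and column names): a row whose join key is NULL never matches anything in an inner join, because NULL = NULL evaluates to unknown, not true, so you have to switch to an outer join if you want to keep such rows.

```python
import sqlite3

# Hypothetical tables for illustration: orders has a nullable join key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);  -- nullable key
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, NULL);
    INSERT INTO customers VALUES (10, 'Alice');
""")
# Inner join: the NULL-keyed order silently drops out of the result.
inner = conn.execute(
    "SELECT o.id FROM orders o JOIN customers c ON o.customer_id = c.id"
).fetchall()
# Outer join: the NULL-keyed order is kept, with no customer match.
outer = conn.execute(
    "SELECT o.id FROM orders o LEFT JOIN customers c ON o.customer_id = c.id"
).fetchall()
print(inner)  # [(1,)]
```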

    thanks,

    --kevin

    Friday, March 04, 2011 1:42 PM

Answers

  • Kevin, I have not tried this with NULLs, but I can tell you one thing. The query goes for one of the following:

    Nested loop join (1 row from table A is compared to all the rows in table B). Data is not sorted.

    Merge join (1 row is compared to another row, and once the condition is met, no more comparison). Join columns are sorted.

    Hash join, which is based on a hashing algorithm (sorting is done after hashing, and generally used when one table is very small compared to the other).

    If you have a large table and the columns are not sorted, it will do a nested loop join (or maybe a hash join; not sure, as I have not tested it yet), which will be very costly compared to a merge join.

    From that angle your performance will be hit. Another fact is that a NULL is not zero, but the optimizer easily differentiates between a null and a not-null value. However, the optimizer may have issues when there are many null values to compare, and hence performance might slow down (conceptually; I have not practically tested it yet). A blank character will have some value (not sure what).

    So if the columns in your joins are indexed, you should not have problems even if you have nulls, because hardly anyone writes queries that look for nulls (though there are always exceptions).
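    The hash join case above can be sketched in a few lines of Python. This is a toy model, not SQL Server's actual implementation: build a hash table on the smaller input, then probe it with the larger one. Rows with NULL (None) keys can be discarded outright on both sides, since NULL never satisfies an equality predicate, which is one reason NULLs in a join column need not be expensive.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Toy hash join over lists of dicts; None plays the role of NULL."""
    table = {}
    for row in build_rows:                  # build phase (smaller table)
        k = row[build_key]
        if k is None:                       # NULL key can never match: skip
            continue
        table.setdefault(k, []).append(row)
    joined = []
    for row in probe_rows:                  # probe phase (larger table)
        k = row[probe_key]
        if k is None:                       # same on the probe side
            continue
        for match in table.get(k, []):
            joined.append((match, row))
    return joined

# Hypothetical sample data: one order has a NULL customer key.
customers = [{"id": 10, "name": "Alice"}]
orders = [{"id": 1, "cust": 10}, {"id": 2, "cust": None}]
pairs = hash_join(customers, orders, "id", "cust")
print(len(pairs))  # 1: the NULL-keyed order is dropped before probing
```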


    Abhay Chaudhary OCP 9i, MCTS/MCITP (SQL Server 2005, 2008, 2005 BI) ms-abhay.blogspot.com/
    • Marked as answer by kjhngisd Friday, March 04, 2011 4:51 PM
    Friday, March 04, 2011 4:00 PM
  • Use the nulls.

    This sounds like one of those folk tales. There might have been a case in some antique system where it was true, maybe in dBase IV on a 20 MHz 386 processor. On modern systems, forgetaboutit. It might take a tiny bit more CPU, or maybe a little less; you don't add extra clauses regarding nulls in the JOIN, though perhaps in the WHERE.

    The thing is, using a non-null value means your data takes more space, and that will slow things down - if you have to bring even one more page from disk to RAM, you've lost whatever time you might have saved with a simpler clause.
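    There is a correctness cost to the junk-value plan as well, not just a space cost. A minimal sketch (SQLite, made-up table names): once NULLs are replaced by a sentinel like ' ', those sentinels equal each other, so rows that would never have matched under real NULLs suddenly join.

```python
import sqlite3

# Two hypothetical tables whose NULL join keys were replaced with ' '.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k TEXT);
    CREATE TABLE b (k TEXT);
    INSERT INTO a VALUES ('x'), (' '), (' ');  -- two NULLs replaced by ' '
    INSERT INTO b VALUES ('x'), (' '), (' ');
""")
# With real NULLs the blank rows would drop out of the join; with the
# junk sentinel every ' ' matches every other ' '.
n = conn.execute("SELECT count(*) FROM a JOIN b ON a.k = b.k").fetchone()[0]
print(n)  # 5: one real 'x' match plus 2 * 2 junk-on-junk matches
```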

    There are some people who just don't like the concept of null values.

    Josh

     

    • Marked as answer by kjhngisd Friday, March 04, 2011 4:52 PM
    Friday, March 04, 2011 4:10 PM

All replies

  • That's kinda what I thought.

    I've noticed that my co-workers seem to have a lot of misconceptions about performance -- like they'll struggle over chopping one byte from a varchar definition (which probably used to matter in old databases), but then mis-index the table, or use a single-threaded loader to load it, rather than a bulk load (because they "don't trust" the bulk load).

    I just thought they may know something about nulls that I didn't.

    thanks.

    --kevin

     

    Friday, March 04, 2011 4:51 PM