Performance on nullable geography column with spatial index

  • Question

  • I asked this question on StackOverflow earlier today, but without any result: http://stackoverflow.com/questions/4954875/sql-server-2008-performance-on-nullable-geography-column-with-spatial-index


    I'm seeing some strange performance issues on SQL Server 2008 with a nullable geography column with a spatial index. Each null value is stored as a root node within the spatial index.

    E.g. a table with 5 000 000 addresses, where 4 000 000 have a coordinate stored.
    Every time I query the index it has to scan through 1 000 001 level 0 nodes (1 root node covering all the valid coordinates, plus 1 000 000 nulls).

    I cannot find this mentioned in the documentation, and I cannot see why SQL allows this column to be nullable if the indexing is unable to handle it.

    For now I have bypassed this by storing only the existing coordinates in a separate table, but I would like to know what is the best practice here?
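    The separate-table workaround mentioned above could be sketched roughly as follows (the table and column names Addresses, AddressCoordinates, AddressId and Coordinate are illustrative assumptions, not the original schema):

    ```sql
    -- Hypothetical sketch: only rows that actually have a coordinate go into
    -- the indexed table, so the spatial index never contains NULLs.
    CREATE TABLE AddressCoordinates
    (
        AddressId  int NOT NULL PRIMARY KEY
            REFERENCES Addresses (AddressId),
        Coordinate geography NOT NULL
    );

    CREATE SPATIAL INDEX SIdx_AddressCoordinates_Coordinate
        ON AddressCoordinates (Coordinate);

    -- Spatial queries hit the side table and join back for the full row:
    -- SELECT a.*
    -- FROM AddressCoordinates c
    -- JOIN Addresses a ON a.AddressId = c.AddressId
    -- WHERE c.Coordinate.STIntersects(@g) = 1;
    ```

    The cost of this design is an extra join and the need to keep the two tables in sync (e.g. via a trigger or in the application layer).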

    Thursday, February 10, 2011 3:54 PM

Answers

  • Ok, it seems that I can recreate that behaviour here too. And it doesn't appear to help whether you set values to null or POINT EMPTY - both get added as level 0 cells.

    What about if you explicitly exclude the nulls as an additional condition of your query, as follows?

    SELECT *
    FROM Table
    WHERE geog IS NOT NULL
    AND geog.STIntersects(@g) = 1;
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    • Marked as answer by rundkaas Thursday, February 10, 2011 7:20 PM
    Thursday, February 10, 2011 5:11 PM
    Answerer

All replies

  • I'm sure that this has been discussed a while ago, but I can't now find the link.

    But, according to http://technet.microsoft.com/en-us/library/cc627425.aspx (emphasis added):

    "NULL and empty instances are counted at level 0 but will not impact performance . Level 0 will have as many cells as NULL and empty instances at the base table. For geography indexes, level 0 will have as many cells as NULL and empty instances +1 cell. "

    Can you describe in any more detail the "strange" performance issues that are occurring?


    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Thursday, February 10, 2011 4:19 PM
    Answerer
  • Well, according to that documentation there should not be any performance impact, but that is not what happened in my test cases.

    Both the execution time and the execution plan were affected.

    Execution time was drastically longer, and the execution plan showed that SQL Server had to look up each null node for every search.
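    One way to quantify this kind of primary-filter inefficiency is the diagnostic procedure sp_help_spatial_geography_index that ships with SQL Server 2008; it reports cell counts per level and filter efficiency for a sample query window. A sketch (the table name, index name and query polygon are assumptions):

    ```sql
    -- Sample query window; ring orientation follows the geography
    -- left-hand rule so the small area is the interior.
    DECLARE @g geography = geography::STGeomFromText(
        'POLYGON((10 59, 11 59, 11 60, 10 60, 10 59))', 4326);

    -- Reports, among other things, how many level 0 cells the index holds
    -- and how many candidate rows survive the primary filter.
    EXEC sp_help_spatial_geography_index
        @tabname       = 'Addresses',
        @indexname     = 'SIdx_Addresses_Coordinate',
        @verboseoutput = 1,
        @query_sample  = @g;
    ```

    With a million NULLs in the index, the level 0 cell count in this output should make the problem visible directly, without having to read the execution plan.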

     

    Btw, great book about spatial in SQL Server!

    Thursday, February 10, 2011 4:41 PM
  • Ok, it seems that I can recreate that behaviour here too. And it doesn't appear to help whether you set values to null or POINT EMPTY - both get added as level 0 cells.

    What about if you explicitly exclude the nulls as an additional condition of your query, as follows?

    SELECT *
    FROM Table
    WHERE geog IS NOT NULL
    AND geog.STIntersects(@g) = 1;
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    • Marked as answer by rundkaas Thursday, February 10, 2011 7:20 PM
    Thursday, February 10, 2011 5:11 PM
    Answerer
  • Just found this post by Bob Beauchemin which describes exactly this issue (and also explains why my suggestion above might not work):

    http://www.sqlskills.com/BLOGS/BOBB/post/Be-careful-with-EMPTYNULL-values-and-spatial-indexes.aspx

    Looks like the technet documentation is wrong - nulls certainly do affect performance, since they get added as level 0 cells and therefore must all be tested at the secondary filter stage. If you've got lots of nulls, that's going to slow down your performance a lot.

    His suggestion is to substitute a dummy value in place of the nulls (he suggests the north pole, at (0 90)). Obviously, if you're using the geometry datatype you can use something else, like (999999 9999999) - anything that is outside the range of your data, to make it clear that this is definitely not a "real" data item, but that is still included in the level 1 range of the bounding box of your index.

    This method is a bit hacky, but it does mean that these nodes should get excluded at the primary filter stage rather than at the secondary stage, and your queries should be accordingly faster. The alternative is to change your table structure, as you have done, to ensure that all the populated values are in a separate table from the nulls...
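    The dummy-value substitution could look something like this (a sketch; the Addresses table and Coordinate column are assumed names, and the north-pole value follows Bob Beauchemin's suggestion for the geography type):

    ```sql
    -- Replace NULLs with a point far outside the real data's extent.
    -- geography::Point takes (Lat, Long, SRID), so the north pole is (90, 0).
    UPDATE Addresses
    SET Coordinate = geography::Point(90, 0, 4326)
    WHERE Coordinate IS NULL;
    ```

    Queries near the dummy location would still need to filter it out explicitly, and anything reading the column has to know that the north pole means "no coordinate", which is part of what makes this hacky.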


    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Thursday, February 10, 2011 5:22 PM
    Answerer
  • Thanks, that post confirms everything I have experienced. I must say it's disappointing that it behaves like that.
    I would say that this is a design or implementation flaw, but the documentation is in fact 100% correct.

    The documentation you posted was about geometry:


    "NULL and empty instances are counted at level 0 but will not impact performance. Level 0 will have as many cells as NULL and empty instances at the base table."

    However, for geography, no promise is given about performance:

    "For geography indexes, level 0 will have as many cells as NULL and empty instances + 1 cell, because the query sample is counted as 1."

    Anyway, case closed, thanks for the help. :-)

    Thursday, February 10, 2011 7:19 PM
  • I was unable to exclude null values in my query, and I'm not sure it would work anyway.

    It does appear that any NULL values poison the index with level 0 nodes, and no matter what I do I can't avoid scanning through every one of them.

    Friday, February 11, 2011 7:52 AM