locked
Spatial Index Questions RRS feed

  • Question

  • All, I have two questions.

    Q1: I have a table, say PopTable, with huge amount of data. Basically cut a whole country in to small grids (with a LatLng) and disaggregated population data into each little grid (or point in fact). This way, I can get the population of a given polygon. I created the Spatial index with the default settings. It works okay but I like to know what is the best setting (for fast query result). The geography column is just LatLngs.

     

    Q2: I have another table, MarketBoundaries.  The geography column contains polygons of the Boundaries. Some are small (those in central business area), but some are very big. The polygon also contains lots of points (Due to the source data we got). I want to find the market by a point (LatLng). What is the best index setting on the polygon (for fast query).  Any limitation? I saw some posting suggesting multiple index on the polygon column.  I am also think to store the four bounds and create my own non-spatial index.

     

    Thanks.

     

    John

     

    Monday, February 7, 2011 4:26 PM

Answers

  • Q John Chen.

    It looks like you're expecting a magic answer - "Set the index to H,L,M,M, with 27 CELLS_PER_OBJECT and all your queries will give the fastest result".

    Unfortunately, it just doesn't work like that.

    Even if it did, you haven't given anything like enough information for anybody else to help you. Let's start with the basics -

    • One of your table has a "huge" amount of data - is this measured in gigabytes, terabytes, petabytes?
    • And the other table contains some "small" polygons (triangles?) and some polygons with "lots of points" (1,000? 100,000? 100,000,000?)
    • Your first table has a geography column of "just LatLngs".... I assume you mean Point instances?
    • What query time are you getting at the moment? What time do you hope to achieve?

    Even after you've answered those questions, you've still only considered half the information required to optimise an index. Let's assume that the query that you want to run looks something like:

    SELECT * FROM TABLE WHERE GeogColumn.STIntersects(@g) = 1

    So far, you've only thought about the data stored in the GeogColumn of the table. But, in order to use a spatial index in this query, you compare the grid cells occupied by each item in the GeogColumn with the grid cells occupied by @g. In other words, to optimise a spatial you need to have consideration for both sides of the query predicate. Just because your table contains small polygons - the "best" index might differ depending on whether you are testing which of those small polygons intersect a point, or which ones intersect a very large, complex polygon - in order for a primary filter to be applied they must both be tesselated to the same index grid.

    So, once you've identified a typical query sample (@g, in the example above) then use the sp_help_spatial_geography_index function to test how effective your index is. You're trying to maximise the primary and internal efficiency. See http://msdn.microsoft.com/en-us/library/cc627411.aspx

    Then, I recommend creating new indexes, modifying the resolution of just one level of the grid at a time to see the effect. In some cases, the default M,M,M,M is the best setting. Moving down to L grids creates "loose-fitting" cells containing too many false positives, while moving up to H means that the CELLS_PER_OBJECT limit is blown and objects aren't fully-tesselated. And increasing the CELLS_PER_OBJECT limit just means your index grows too large and unwieldy.

    Creating multiple/non-spatial indexes is a good idea. Spatial indexes are difficult to get right, and even when working at their best are not as effective as other "traditional" indexes at limiting data. Just because you're working with spatial data doesn't mean you shouldn't also filter your data based on any other attribute columns you may have, and doing simple numeric conditions on bounding box parameters may well be effective (e.g. WHERE BoundingBoxX1 > 1.26 AND BoundingBoxX2 < 1.28 AND BoundingBoxY1 > 52 AND BoundingBoxY2 < 53)

     


    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    • Marked as answer by WeiLin Qiao Monday, February 14, 2011 11:45 AM
    Monday, February 7, 2011 5:37 PM
    Answerer