locked
Points in polygon spatial index optimization RRS feed

  • Question

  • Hi,

    I'm sorry if this question has already been asked many times but after hours reading posts on the forum I still did not find the answer that would really help me.

    Here is my situation :

    • I have a table with about 8 millions geographic coordinates (points)
    • I have another table which some polygons as geograhics coordinates, let's call them "map zones"
    • Most of the time, one zone is the size of the half of the USA
    • I want to get all the coordinates within a map zone, and I want it to be very fast.

    The problems are :

    • The average query time for one map zone is 23 seconds for 20'000 records returned (which i think is slow because i've heard about under second performance)
    • When I create a spatial index, it is never used
    • If I force the use of the index with a hint, the performance are worst than without using the index :(

     Here is my query :

    SELECT
        m.COORDINATES
    FROM
       MAP_ZONE_COORDINATES c, 
       VSL_MVMT m 	
    WHERE
       c.MAP_ZONE_ID = 5
       AND m.COORDINATES.STIntersects(c.COORDINATES) = 1
    

    I have tried many different index configuration (low, high, cells per objects etc..) but still no good results.

    Thanks for you help !

    Thursday, December 29, 2011 1:24 PM

Answers

  • Hi there,
    • Are you using geography or geometry? If geometry, have you set the size of the bounding box correctly?
    • Which column/table have you tried creating the spatial index on - MAP_ZONE_COORDINATES.COORDINATES or VSL_MVMT.COORDINATES?  (note you can only make use of a single spatial index per query)
    • Have you got any NULL values in the COORDINATES column?
    • Try using m.COORDINATES.Filter(c.COORDINATES) = 1  instead of   m.COORDINATES.STIntersects(c.COORDINATES) = 1
      - any better?
    • Can you dump the output of sp_help_spatial_geometry_index here with @query_sample set to a typical map zone polygon from your table
    • I get sub-second performance on a spatial query against a table of 40 million items, but that's in queries when the geographic extent of the query window is very small relative to the overall extent of the data (in my case, I've got data covering the whole of the UK, but a typical query window might only be, say 400 metres square, and return less than 1,000 rows). Clearly, as the size of the window increases, the query becomes much less selective and you'd expect it to take longer...

     

     


    twitter: @alastaira blog: http://alastaira.wordpress.com/
    • Marked as answer by Kryptus Monday, January 16, 2012 10:29 AM
    Thursday, December 29, 2011 8:01 PM
    Answerer
  • About how to figure out why your spatial index is not being used, have you followed the guidelines in:
     
    Null values could be an issue. You’ve already seen the link for the problems it can cause http://www.sqlskills.com/BLOGS/BOBB/post/Be-careful-with-EMPTYNULL-values-and-spatial-indexes.aspx, and you can check your query plan to see if you are getting 2 million extra seeks back to the base table when you try the spatial index. Or if you’re getting a base table scan rather than seeks because of it. Don’t know the current status of that issue, the “last ditch” workaround was pretty ugly. Just so you know, you mention NULL in the table, its only NULL values in the spatial column that matter. If you can see that your plan indicates this problem, another “last ditch" workaround might be to separate the table into 2, (1 table with NULL values only, 1 without), index the table with no NULL values, and use it only for spatial queries, use a view of the two tables with UNION ALL for all other queries. You can’t put a spatial index on a view or use the “filtered” spatial index.
     
    That being said, if you really do return 20000-30000 rows from the query and NULLs aren’t what’s causing you grief (because you’ve verified this through the query plan), using the spatial index will cause one base table seek for each candidate row. So response may not be subsecond. You could also check to see if parallelism is being used in the plan.
     
    Hope this helps,
    Cheers,
    Bob
    • Proposed as answer by Peja Tao Monday, January 2, 2012 2:49 AM
    • Marked as answer by Kryptus Monday, January 16, 2012 10:29 AM
    Friday, December 30, 2011 5:30 PM

All replies

  • Hi there,
    • Are you using geography or geometry? If geometry, have you set the size of the bounding box correctly?
    • Which column/table have you tried creating the spatial index on - MAP_ZONE_COORDINATES.COORDINATES or VSL_MVMT.COORDINATES?  (note you can only make use of a single spatial index per query)
    • Have you got any NULL values in the COORDINATES column?
    • Try using m.COORDINATES.Filter(c.COORDINATES) = 1  instead of   m.COORDINATES.STIntersects(c.COORDINATES) = 1
      - any better?
    • Can you dump the output of sp_help_spatial_geometry_index here with @query_sample set to a typical map zone polygon from your table
    • I get sub-second performance on a spatial query against a table of 40 million items, but that's in queries when the geographic extent of the query window is very small relative to the overall extent of the data (in my case, I've got data covering the whole of the UK, but a typical query window might only be, say 400 metres square, and return less than 1,000 rows). Clearly, as the size of the window increases, the query becomes much less selective and you'd expect it to take longer...

     

     


    twitter: @alastaira blog: http://alastaira.wordpress.com/
    • Marked as answer by Kryptus Monday, January 16, 2012 10:29 AM
    Thursday, December 29, 2011 8:01 PM
    Answerer
  • Hi,

    Thank you for answering quickly. Here are the informations you've asked :

    • I am using geography
    • I have tried creating an index on both tables, on the COORDINATES column (both tables have only one
      coordinates column of type geography).
    • I have NULL values in the VSL_MVMT table (about 2 millions out of 8 millions records).
    • I have tried with Filter and STIntersect, the query executes in exactly the same time.
    • Here is the dump of the sp_help_spatial_geography_index

      All levels = medium, Cells per object = 1
      Total_Number_Of_ObjectCells_In_Level0_For_QuerySample 1
      Total_Number_Of_ObjectCells_In_Level0_In_Index 1098549
      Total_Number_Of_ObjectCells_In_Level1_In_Index 41130
      Total_Number_Of_ObjectCells_In_Level2_For_QuerySample 159
      Total_Number_Of_ObjectCells_In_Level2_In_Index 12081
      Total_Number_Of_ObjectCells_In_Level3_For_QuerySample 846
      Total_Number_Of_ObjectCells_In_Level3_In_Index 10344
      Total_Number_Of_ObjectCells_In_Level4_For_QuerySample 7
      Total_Number_Of_ObjectCells_In_Level4_In_Index 3173903
      Total_Number_Of_Interior_ObjectCells_In_Level2_For_QuerySample 104
      Total_Number_Of_Interior_ObjectCells_In_Level3_For_QuerySample 608
      Total_Number_Of_Intersecting_ObjectCells_In_Level1_In_Index 30096
      Total_Number_Of_Intersecting_ObjectCells_In_Level2_For_QuerySample 55
      Total_Number_Of_Intersecting_ObjectCells_In_Level2_In_Index 12081
      Total_Number_Of_Intersecting_ObjectCells_In_Level3_For_QuerySample 238
      Total_Number_Of_Intersecting_ObjectCells_In_Level3_In_Index 10344
      Total_Number_Of_Intersecting_ObjectCells_In_Level4_For_QuerySample 4
      Total_Number_Of_Intersecting_ObjectCells_In_Level4_In_Index 3173903
      Total_Number_Of_Border_ObjectCells_In_Level0_For_QuerySample 1
      Total_Number_Of_Border_ObjectCells_In_Level0_In_Index 1098549
      Total_Number_Of_Border_ObjectCells_In_Level1_In_Index 11034
      Total_Number_Of_Border_ObjectCells_In_Level4_For_QuerySample 3
      Number_Of_Rows_Selected_By_Primary_Filter 1123738
      Number_Of_Rows_Selected_By_Internal_Filter 17144
      Number_Of_Times_Secondary_Filter_Is_Called 1106594
      Number_Of_Rows_Output 20365
      Percentage_Of_Rows_NotSelected_By_Primary_Filter 93.50148217
      Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter 1.525622521
      Internal_Filter_Efficiency 84.18364842
      Primary_Filter_Efficiency 1.81225517

    Could you also help me figure out why the index is not used and why the performances are worst when using the index than without ?

    Thanks a lot !


    • Edited by Kryptus Friday, December 30, 2011 11:22 AM
    Friday, December 30, 2011 11:21 AM
  • About how to figure out why your spatial index is not being used, have you followed the guidelines in:
     
    Null values could be an issue. You’ve already seen the link for the problems it can cause http://www.sqlskills.com/BLOGS/BOBB/post/Be-careful-with-EMPTYNULL-values-and-spatial-indexes.aspx, and you can check your query plan to see if you are getting 2 million extra seeks back to the base table when you try the spatial index. Or if you’re getting a base table scan rather than seeks because of it. Don’t know the current status of that issue, the “last ditch” workaround was pretty ugly. Just so you know, you mention NULL in the table, its only NULL values in the spatial column that matter. If you can see that your plan indicates this problem, another “last ditch" workaround might be to separate the table into 2, (1 table with NULL values only, 1 without), index the table with no NULL values, and use it only for spatial queries, use a view of the two tables with UNION ALL for all other queries. You can’t put a spatial index on a view or use the “filtered” spatial index.
     
    That being said, if you really do return 20000-30000 rows from the query and NULLs aren’t what’s causing you grief (because you’ve verified this through the query plan), using the spatial index will cause one base table seek for each candidate row. So response may not be subsecond. You could also check to see if parallelism is being used in the plan.
     
    Hope this helps,
    Cheers,
    Bob
    • Proposed as answer by Peja Tao Monday, January 2, 2012 2:49 AM
    • Marked as answer by Kryptus Monday, January 16, 2012 10:29 AM
    Friday, December 30, 2011 5:30 PM
  • Hi,

    Alright, after days of testing, I'll first clarify the result of the usefull suggestions above and then go to the conclusions.

    I have put a hint to force the use of the index because I think it is the more elegant solution. The fact is that when I force the use of the index with geography data type, it only gets slower than without the index.

    I have moved all my data from geography to geometry, to see if there was a difference.

    The result is that the geometry data type give me the results in less than 2 sec (average of 1.5) regardless of the amout of data I query (I'm using a date range as a first filter). 1 week or 2 years is giving me same performance even one week is 15'000 rows to process and one year is around 2'000'000.

    I tried to use the STWithin, STContains and STIntersects... no noticable performance difference.

    So my conclusion is that the spatial index is only efficient and usefull on the geometry data type and not on geography.
    Hopefully the geometry is exactly what I need but I am still wondering why it is not working on geography. So if you have any suggestion or if you experience the same behavior, let me know I'll be happy to test and make the geographic index work as efficiently as the geometry one.

    Cheers

    Friday, January 6, 2012 10:13 AM