locked
Index Performance on TOP Clause RRS feed

  • Question

  • I've got the following query on a table with about 2 million rows:

    	SELECT TOP 2500
    		A.ID,
    		A.Geometry
    	FROM
    		TABLE_1 A WITH(INDEX(IX_TABLE_1_1)) INNER JOIN
    		TABLE_2 B ON B.ID = A.ID
    		
    	WHERE
    		B.Column = 'Condition' AND	
    		A.Geometry.STIntersects(@bounds) = 1	
    	
    	ORDER BY
    		A.ID

    All the appropraite indexes are created on both tables (ID in this case as an example), including a spatial index. You'll notice that I'm using the TOP clause, since my clustered primary key (ID) on TABLE_1 was created on the actual STLength() of the geometry, since I always want the biggest elements rertuned DESC, but restricted. This query is very effective on smaller bounding areas where say 100 geometries are found on a small bounds and get displayed on average under 500 ms.. However, If a specify a much larger bounding area, where 50 000 matches are found, only the TOP 2500 must be displayed, the query runs about 8 seconds.

    My dilemma is that I need an optimal solution always returning only the TOP 2500 records within the bounding rectangle. I find the INDEX HINTING super effective on lower levels, but extremly slow on larger areas. If I remove the hinting clause, the bounds on a larger area executes under 1 second, but then the smaller bouning areas take 8 - 9 seconds :-(.

    It's like I'm stuck with a catch 22. Should I change may GRID_RESOLUTION on LEVEL 1 and 2 to LOW? My current setting (for the spatial index) are MEDIUM on all LEVELS and obviously the smallest bounding area based on the contained data.

    I'll post my execution plan (Statistics) if required.

    Any advice would be appreciated.

    Andries  

    Monday, March 22, 2010 9:24 PM

Answers

  • Hi Andries,

    I've had trouble in the past optimizing by using TOP because the TOP clause doesn't take effect until the filter iterator in the query plan (that's fairly deep into the plan). You can try this without the TOP clause, select the results into a temporary table and do a TOP from the temp table, but with that many results in the original, it may not make that much of a difference. You can send me the query plan (or a link to it) as a .sqlplan file, if you want, for confirmation.

    What your grid resolution should be is a different matter, it depends on whether your A.Geometry column contains points or linestrings or polygons, how complex your linestrings or polygons are (if you have these), and how big your query sample (@bounds) is. The spatial index procs can help in this; I see you're trying to use those in another posting so I'll read that posting too. I've written a blog series on how to read them at http://www.sqlskills.com/blogs/bobb spatial category.

    Finally, I'm curious as to how using STLength can work as a unique value, you're betting that no two geometries will even be the same (or you'll get constraint violations). Does this actually work (ie, will you be able to guarentee different lengths)?

    Cheers, Bob

    • Marked as answer by andrieso Friday, March 26, 2010 7:44 AM
    Friday, March 26, 2010 6:03 AM

All replies

  • Hi Andries,

    I've had trouble in the past optimizing by using TOP because the TOP clause doesn't take effect until the filter iterator in the query plan (that's fairly deep into the plan). You can try this without the TOP clause, select the results into a temporary table and do a TOP from the temp table, but with that many results in the original, it may not make that much of a difference. You can send me the query plan (or a link to it) as a .sqlplan file, if you want, for confirmation.

    What your grid resolution should be is a different matter, it depends on whether your A.Geometry column contains points or linestrings or polygons, how complex your linestrings or polygons are (if you have these), and how big your query sample (@bounds) is. The spatial index procs can help in this; I see you're trying to use those in another posting so I'll read that posting too. I've written a blog series on how to read them at http://www.sqlskills.com/blogs/bobb spatial category.

    Finally, I'm curious as to how using STLength can work as a unique value, you're betting that no two geometries will even be the same (or you'll get constraint violations). Does this actually work (ie, will you be able to guarentee different lengths)?

    Cheers, Bob

    • Marked as answer by andrieso Friday, March 26, 2010 7:44 AM
    Friday, March 26, 2010 6:03 AM
  • Hi Bob,

    Thanks for the feedback. Guess I need to experiment with my queries a bit more. Most of my data is linestrings and polygons, so I guess LEVEL_1 and LEVEL_2 on LOW and LEVEL_3 and LEVEL_4 on HIGH should be ok?

    There is no reason for you to introduce me to your blog, since I've been following your blog for quite some time now! :-) I find it very informative and helpful. So, keep them coming..

    Regarding the sequencing, or sort order of my table based on the actual (or should I say approximate) size, the idea is to create a single primary clustered key (which is a RowID in this case) based on a defined sort order:

    UPDATE
          TABLE_1
          
    SET   
          RowID = Y.RowID
          
    FROM
          TABLE_1 X
          
    INNER JOIN 
    (     
    
          SELECT TOP 100 PERCENT
                ID,
                ROW_NUMBER() OVER (ORDER BY Geometry.STLength() DESC) RowID
                
          FROM 
                TABLE_1
                
    ) Y ON Y.ID = X.ID    

    Cheers,

    Andries

    Friday, March 26, 2010 7:43 AM
  • Ah, I see now... (about the primary key based on STLength).

    Do feel free to post me the query plan (or send it directly in email), but if you have 50000 matches, even with any spatial index its going to do the primary filter using the index (those steps on the right-hand side of the query plan), then do 50000 seeks to retrieve the geometries (spatial index doesn't contain the geometry) before even getting to the secondary filter. It can optimize away the STIntersects if one of the cells completely covers the query sample and the geometry. All this before the TOP clause kicks in.

    Is most of the time in the query plan being spent on the Filter iterator (bottom of the plan, to the mid-left) or the Seek Iterator (after the index matching join step, reading right-to-left)? If it's spent in Seek, messing with the index density may not help as much. You're trying to make the Primary Filter Efficiency and Internal Filter Efficiency as big as possible, in the index diagnostic proc output. The Internal filter hits let it skip the STIntersects (Filter step in plan) entirely if you have the correct density.

    Hope this helps,

    Bob

     

    Friday, March 26, 2010 3:24 PM