Keeping the polygon point count down

  • Question

  • Hi folks,

     I'm under the (false?) impression that doing spatial queries (eg. STIntersects, etc) against a table of polygons would be more efficient if the number of points (per polygon) is low .. meaning .. less complex polys make it quicker to do stuff.

    Assuming this is correct, i've done the following to find my offending poly's :-

    SELECT Id, FormattedAddress, ShapeFile.STNumPoints(), ShapeFile  
    FROM tblUS_Neighbourhoods  
    ORDER BY ShapeFile.STNumPoints() DESC 

    Works great. Out of my 12K odd rows, the top 9000 contain more than 30 points per poly. The worst offender (result #1) has 15218 points to define a fairly simple shape.

    As such, i'm wondering if I should use the Reduce() method to cut back the point count? Having a play with random rows, i can get some pretty close result shapes after i Reduce ... but i'm not sure what the Tolerance argument represents? I mean, for that first row with 15K points, i put in 10, or 50 or 100 and i get some massively reduced point counts while the shape is relatively similar to the original (of course there will be some 'compression') but what do these numbers mean?

    Also, is there a sorta nice points-per-poly number relative to the row count? So, if i have 12K odd shapes, should i try and keep them under the 50-points-per-poly mark to get some nice speed .. assuming i'm onto something with this speed vs shape file complexity stuff.

    cheers :)

    (PS. i can post images if people are curious. Nope, they don't compare to / beat the random poly art posted in the other thread :P )

    -Pure Krome-
    Friday, February 6, 2009 6:00 AM

All replies

  • You're correct that performing operations on shapes with fewer points is generally faster than the same operation performed on a more complex shape. Deciding the appropriate resolution with which to define your shapes is therefore a trade-off between accuracy and performance.

    There's little point calling Reduce() at run-time to calculate a simpler shape and then calling STIntersects() on that simpler shape, since the Reduce() method itself will require processing time.
    However, you could use Reduce() to pre-calculate simpler versions of all your shapes and store them in a separate column. When you only want approximate calculations, you could use this simpler column, and when you want exact results you could use the original column. (Of course, if you're calling operations that use a spatial index (STIntersects(), STDistance <= x, etc.), then the primary filter should use the index first anyway - it's only the secondary filter that will use the complex original shape.)

    The only general rule I can think of is to make sure that your shapes have sufficient detail for the application in question, but don't store unnecessary points (i.e. you don't need millimetre-accurate shapes if your application is only to allow customers to work out where their closest pizza delivery firm is....) - deciding what the correct level of detail is, is up to you.
    Friday, February 6, 2009 9:16 AM
  • Hi Pure,

    You might speed things up a bit by reducing them, but it's not clear how much given the index.  I'd encourage you to try it out and let us know what difference you see.

    I can offer a little information on the tolerance question.  Reduce uses a pretty simple iterative algorithm called Douglas-Peucker.  If we consider a linestring, DP first creates a running result that is a two-point linestring with the same start and end points as the input.  It then finds the point in the input that lies furthest from the result.  If that distance is greater than the tolerance, the point is added to the result and the procedure iterates again.  If not, then the algorithm stops and returns the result it has.

    The upshot is that no point in the original shape will lie more than the tolerance from the reduced result; if this weren't true, then the algorithm would have continued.
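
    To make the tolerance behaviour concrete, here's a minimal sketch of Douglas-Peucker in Python, using the common recursive formulation on plain (x, y) tuples. This is an illustration of the general algorithm, not SQL Server's actual Reduce() implementation:

    ```python
    import math

    def perpendicular_distance(p, a, b):
        """Distance from point p to the line through a and b."""
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        length = math.hypot(dx, dy)
        if length == 0:
            return math.hypot(px - ax, py - ay)
        # |cross product| / base length = height of the triangle a-b-p
        return abs(dx * (ay - py) - dy * (ax - px)) / length

    def reduce_line(points, tolerance):
        """Douglas-Peucker: keep the endpoints, and recursively keep any
        point lying further than `tolerance` from the simplified segment."""
        if len(points) < 3:
            return list(points)
        a, b = points[0], points[-1]
        # Find the interior point furthest from the segment a-b
        idx, dmax = 0, 0.0
        for i in range(1, len(points) - 1):
            d = perpendicular_distance(points[i], a, b)
            if d > dmax:
                idx, dmax = i, d
        if dmax <= tolerance:
            # All interior points are within tolerance: drop them
            return [a, b]
        # Otherwise split at the furthest point and recurse on both halves
        left = reduce_line(points[:idx + 1], tolerance)
        right = reduce_line(points[idx:], tolerance)
        return left[:-1] + right  # avoid duplicating the split point
    ```

    The key takeaway for the original question is that the tolerance is measured in the same units as the coordinates themselves - so a tolerance of 10 means no original point ends up more than 10 coordinate units away from the simplified shape.
    
    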


    Isaac Kunen, Microsoft SQL Server
    Friday, February 6, 2009 11:58 PM