Speed up "point intesects polygon" queries RRS feed

  • General discussion

  • Hi all,

    A few years ago I wrote a CLR function that sped up "Point.STIntersects(Polygon)" type queries in SQL Server (and Oracle/Postgres when using the same algorithm) by 10-100x. It auto-tessellates the polygon into shapes that can be processed much more quickly by the DB engine (fewer false positives, maximises the work done by the first-pass filter). It works on SQL Server 2008 R2 through 2016.

    I only recently got around to packaging it up and writing some doco, so I thought I'd share it as an open beta test if anyone is interested. I've never seen it perform worse than the standard SQL Server spatial queries, nor have I gotten results inconsistent with them.

    In terms of performance improvements (this uses GeoNames for the points and US Census data for the state polygons; the queries are just SELECT COUNT(*) FROM Points INNER JOIN Polygons ON Point.STIntersects(Polygon) = 1 WHERE <use case>):
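
    For example, the Alaska test case looks something like this (a sketch only - the Name column and exact schema are my shorthand here, not the literal benchmark script):

        -- Count the points that fall inside Alaska.
        -- 'Name' is an assumed column on the Polygons table.
        SELECT COUNT(*)
        FROM Points
        INNER JOIN Polygons
            ON Points.Point.STIntersects(Polygons.Polygon) = 1
        WHERE Polygons.Name = 'Alaska';
        -- The C% case filters with WHERE Polygons.Name LIKE 'C%' instead.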

    Query Times

    Test Case                        SQL Server (2012)        Postgres                 Oracle
                                     Original    Optimised    Original    Optimised    Original    Optimised
    Alaska (34k points returned)     40 sec      1.2 sec      2.8 sec     0.4 sec      10.8 sec    2.2 sec
    Colorado (49k points returned)   21 sec      1.8 sec      2.2 sec     0.8 sec      5 sec       2.1 sec
    C% (190k points returned)        2:02 min    5.8 sec      12 sec      2.9 sec      1:10 min    6.5 sec
    All (2.1m points returned)       1:14:58 hr  48 sec       5:30 min    34 sec       50:26 min   1:20 min

    Test Case                        SQL Server (2016)
                                     Original     Optimised    Original     Optimised
                                     (1st run)    (1st run)    (2nd+ run)   (2nd+ run)
    Alaska (34k points returned)     21 sec       3.2 sec      13.9 sec     1 sec
    Colorado (49k points returned)   13.6 sec     9 sec        9.3 sec      0.8 sec
    C% (190k points returned)        45.3 sec     7.6 sec      38.3 sec     2.9 sec
    All (2.1m points returned)       8:27 min     27.1 sec     8:24 min     16 sec

    It's available to download here - http://simplifyspatial.hejsolutions.com/

    I'd be interested in seeing if there are any use cases where it's worse or wrong. I don't do much spatial work anymore, so I don't get the opportunity to test it out.

    Thanks


    Jakub @ Adelaide, Australia


    Friday, September 8, 2017 2:19 AM

All replies

  • Jakub:

    I visited your site and your approach is interesting. If I understand it correctly, you are basically instantiating database objects to mimic feature tessellation, in much the same way as the database itself can when the spatial index is configured optimally.

    When I began managing spatial data using the SQL Server GEOMETRY data type, I spent a lot of time testing spatial index creation. Ours is a GIS shop and we manage hundreds of spatial data layers in SQL Server, so getting the right mix of configurations for our data was an important objective.

    What we developed is a kind of "rule of thumb" for data layers, based on geometry level (point, line or polygon - we strictly limit each table to one kind of geometry) and number of features (rows).

    Here is a summary of what we came up with (all indices include a BOUNDING_BOX definition); a rough T-SQL sketch of the three rules follows the list. For the GEOGRAPHY data type this might not be as useful, but I am sure that index configuration is bound to be helpful there too.

    (1) For all point data sets (0 dimensions, just location): Grids = (HIGH, HIGH, HIGH, HIGH)

    (2) For all line data sets and polygon data sets with fewer than 50,000 rows: Grids = (LOW, MEDIUM, HIGH, HIGH) and CELLS_PER_OBJECT = 8192

    (3) For polygon data sets with 50K or more rows: USING GEOMETRY_AUTO_GRID and CELLS_PER_OBJECT = 20
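
    As a T-SQL illustration only (the table names, columns and bounding box below are placeholders, not our actual layers), the three rules translate to something like:

        -- (1) Point data sets
        CREATE SPATIAL INDEX SIdx_CityPoints ON dbo.CityPoints (Shape)
        USING GEOMETRY_GRID
        WITH (
            BOUNDING_BOX = (0, 0, 500000, 500000),  -- placeholder; use the layer's real extent
            GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = HIGH, LEVEL_4 = HIGH)
        );

        -- (2) Line data sets, and polygon data sets with fewer than 50,000 rows
        CREATE SPATIAL INDEX SIdx_Roads ON dbo.Roads (Shape)
        USING GEOMETRY_GRID
        WITH (
            BOUNDING_BOX = (0, 0, 500000, 500000),
            GRIDS = (LEVEL_1 = LOW, LEVEL_2 = MEDIUM, LEVEL_3 = HIGH, LEVEL_4 = HIGH),
            CELLS_PER_OBJECT = 8192
        );

        -- (3) Polygon data sets with 50K or more rows
        CREATE SPATIAL INDEX SIdx_Parcels ON dbo.Parcels (Shape)
        USING GEOMETRY_AUTO_GRID
        WITH (
            BOUNDING_BOX = (0, 0, 500000, 500000),
            CELLS_PER_OBJECT = 20
        );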

    I am sure that this is not optimal in every case, but it serves our needs pretty well. This reply is not a counter to your approach - it is a very interesting and, I think, probably very effective way of dealing with data sets, especially at very small scales (large areas, like the continental USA).

    GEOGRAPHY data requires a different approach than GEOMETRY in many cases. We use GEOMETRY because we are limited to a smallish area (a city's limits, plus a buffer), so we can leverage the advantages available in GEOMETRY spatial indices.

    Thanks for your post. I know it was almost a year ago, but I stumbled on it looking for something else and it caught my eye.

    Good luck!

    Wednesday, May 2, 2018 5:11 PM
  • Glad someone is getting some use out of it, even if it's just the theory.

    And to answer your question - kind of. I'm doing a custom tessellation of each polygon (the number of objects it becomes is relative to how complex the input geometry is), and each tessellated object is passed to the SQL engine as a polygon.

    The benefit is two-fold (a rough sketch of the resulting query shape follows the list):

    1. I reduce the number of "whitespace" false positives that get through the first-pass (.Filter) stage and that the SQL engine then has to eliminate in the exact second-pass filter of .STIntersects
    2. The simpler sub-polygons result in a query plan that is optimised for parallelism
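
    Conceptually, the query pattern ends up something like this (a sketch only - PolygonParts and PartGeom are hypothetical names I'm using to illustrate the shape of the work; the assembly may pass the sub-polygons around differently):

        -- PolygonParts: one row per tessellated sub-polygon of the source polygon.
        -- EXISTS avoids double-counting a point that touches the shared edge
        -- of two adjacent sub-polygons.
        SELECT COUNT(*)
        FROM Points pt
        WHERE EXISTS (
            SELECT 1
            FROM PolygonParts pp
            WHERE pt.Point.STIntersects(pp.PartGeom) = 1
        );
        -- Each PartGeom is simple, so the index's first-pass filter lets far
        -- fewer false positives through to the exact STIntersects test.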

    Have you tried my assembly? I'd be interested in how it fares against a well-thought-out indexing strategy.

    Jakub @ Adelaide, Australia

    Thursday, May 3, 2018 9:03 AM