locked
Trouble trying to do a simple STIntersects() on some waterways in some zones RRS feed

  • Question

  • Hi folks,

    I've been given a shapefile which contains all the waterways (eg. rivers, creeks, streams, rivers, etc) for some USA States. I've also been given a second file which contains custom boundaries (think custom neighbourhoods or areas) which represent zones the company uses to delegate people/resources to those waterways.

    I've imported them into the system (using my custom import script mentioned at the botton of this post ). Prefect! Two tables with two GEOGRAPHY fields.

    Now, I'm trying to determine which waterway is a part of which custom boundary. Some waterways are a part of multiple boundaries because they are really really long. The output is a string which describes the relationship.

    I'm not sure how to do this in SQL with the STIntersects() method. I can do it with one waterway against all the boundaries, but not multiple waterways against multiple boundaries (i get so confused with my 'set' thinking. I hate cursors and i keep thinking like a cursor .. but I don't create cursor code cause it's evil, especially when you can do it with 'set' code).

    Anyways, here's some sample schema / table data...

    Waterways
    Id INT PK IDENTITY
    Name NVARCHAR(100) NOT NULL
    WaterWaySegment GEOGRAPHY

    Data:
    1. Long River 1, 0xAD.... (Segment 1)
    2. Long River 1, 0xAD.... (Segment 2)
    ..
    104. Long River 1, 0xAD.... (Segment 104)
    105. Woodchip Creek, 0xAD ... (Segment 1)

    WaterZones
    Id INT PK
    Name NVARCHAR(100) NOT NULL
    Zone GEOGRAPHY

    Data:
    1. NW Zone 1
    2. NW Zone 2
    3. NW Zone 3 - Contractors.

    ...

    Results.

    WaterWaysResults
    Id INT PK
    NVarchar(200)
    Segment GEOG

    Data:

    1. Long River 1 (NW Zone 1), 0xAD... (union of parts that intersect that zone)
    2. Long River 1 (NW Zone 2), 0xAD... (union of parts that intersect that zone)
    ..
    61. Wood Chip Creek, (NW ZOne 1), 0xAD... (union of parts that intersect that zone)


    phew.. ! :) 

    Any ideas?


    -Pure Krome-
    • Edited by Pure Krome Monday, July 20, 2009 11:17 PM
    Monday, July 20, 2009 1:57 AM

Answers

  • If you want to group by WaterZone and obtain an Aggregate Union Value for all WaterSegments, you can use
    GeographyUnionAggregate from SQL Spatial Tools. Otherwise you're using a cursor. Use GeographyUnionAggregate with a spatial column like you'd use an "ordinary" aggregate on a non-spatial column (i.e., sum(col1) or avg(col2) or GeographyUnionAggregate(spatialCol)).
     
    For the spatial index, I haven't been following this entire thread, but if you're using STIntersection, that can't use a spatial index. If you're using STIntersects, it will. If you've rewritten the query so you can use a spatial index, I'd start with one on the table with the most rows. Another tack to take is a spatial index on the table with the most complex geographies. You can always check your work with the spatial index analysis procs, or look at the query plans.
    Cheers,
    Bob Beauchemin
    SQLskills
    Wednesday, July 22, 2009 2:03 AM
  • You'd need to look at the query plan for the two operations and see what it's doing. But, that said, the GeometryUnionAggregate is a non-trivial operation and often used by pre-processing the data rather than expecting real-time few second response. That's (preprocessing) what I've used it for, but I have gotten query times in the tens of minutes.
     
    But have a look at the query plan and see if the version with aggregate is using the spatial index and where the time's being spent.
     
    Your alternative is to use a cursor for unioning. And it may not be any faster.
     
    Cheers,
    Bob
    "Pure Krome" wrote in message news:fdc012ca-75b6-491 f-9500-bcf95417fee9...
    Hi Bob,

    I've finally had a chance to get back to this problem. I tried the following ....

    SELECT
      ww.Name AS WaterWay,
      wz.Name AS WaterZone
    FROM
      Waterways ww JOIN WaterZones wz ON 
    ww.WaterWaySegment.STIntersects(ww.Zone) = 1
    GROUP BY ww.Name, wz.Name

    and this gives me some results pretty quickly .. < 1 sec. It also returned XXX number of rows only. Kewl :)

    Now, notice i have NOT added the dbo.GeographyUnionAggregate(..) method, just yet.

    When i add that method, it takes around 1 hour,  1 minute and 31 seconds to run. :(
    SELECT
      ww.Name AS WaterWay,
      wz.Name AS WaterZone,
      dbo.GeographyUnionAggregate(ww.WaterWaySegment, 10) AS SegmentByZone
    FROM
      Waterways ww JOIN WaterZones wz ON 
    ww.WaterWaySegment.STIntersects(ww.Zone) = 1
    GROUP BY ww.Name, wz.Name

    I'm not sure where I should start to debug why this is taking so long. I'm getting a nagging feeling that this could be related to a missing spatial index. But if that was the case, then my first query above would ALSO take the same amount of time (i would have guessed). I would have thought the GeographyUnionAggregate(..) method doesn't effect the retrieval process (ie. what rows to get), but just what is outputted. The data has been retrieved, it just mashes the segments together.

    I also tried to subquery it, to make sure that it's only trying to union the result segments and not try and do some massive query. Time is the same :(

    -Pure Krome-
    Wednesday, July 29, 2009 4:42 AM

All replies

  • SELECT DISTINCT
      ww.Name AS WaterWay,
      wz.Name AS WaterZone,
      ww.WaterWaySegment.STIntersection(ww.Zone) AS Segment
    FROM
      Waterways ww JOIN WaterZones wz ON ww.WaterWaySegment.STIntersects(ww.Zone) = 1;

    I haven't tried the above, by the way, but I think it should give you what you want...
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Monday, July 20, 2009 9:34 AM
    Answerer
  • ww.WaterWaySegment.STIntersection(ww.Zone) AS Segment


    Can you please explain what this is for? I've never seen STIntersection(..).

    (i'm going to have to check your book, when I'm back at work in 8 hours)....

    -Pure Krome-
    Monday, July 20, 2009 2:24 PM
  • - STIntersects() is a test to determine if two instances intersect, i.e. it returns 1 if they intersect or 0 if they don't.
    - STIntersection() actually gives you the geometry created from the intersection of two geometries.

    (Also, looking back at the code I posted earlier, you probably don't want the DISTINCT, but let me know how you get on anyway...)
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Monday, July 20, 2009 2:49 PM
    Answerer
  • Ah gotcha.

    /me tries a few things.

    Kewl. Nearly there. It's returning the correct zone for each water segment .. but I was trying to union all the segments by the same zone. So for my original example (and the one I was just playing with, in my SQL), the first waterway was divided up into 104 unique segments (ie. rows). I then recieved 104 results, with the correct zip code each segment exists within. Instead, I was hoping to group this by WaterZone. But to do that, I need to group by WaterZone and then WaterWay .. but I still fail .. because I have no idea how to group by Segment :)  (which is not right.. I need to UNION the segments, not group by .. as we are dealing with spatial data in this case)...

    Now, I'm thinking i could do this with at least one more query .. but I'm confident this doesn't need to be the case ... I'm sure this can easily be done with a single query.

    Lastly, because we're dealing with TWO spatial fields, i'm not sure where the Spatial Index should go on? The WaterWays? The WaterZones? My gut feeling is the WaterWays because this the main geog we start with, for each row we're trying to interrogate.
    -Pure Krome-
    Monday, July 20, 2009 11:22 PM
  • maybe I need to try this with multiple statements instead .. :( ??
    -Pure Krome-
    Tuesday, July 21, 2009 11:36 PM
  • If you want to group by WaterZone and obtain an Aggregate Union Value for all WaterSegments, you can use
    GeographyUnionAggregate from SQL Spatial Tools. Otherwise you're using a cursor. Use GeographyUnionAggregate with a spatial column like you'd use an "ordinary" aggregate on a non-spatial column (i.e., sum(col1) or avg(col2) or GeographyUnionAggregate(spatialCol)).
     
    For the spatial index, I haven't been following this entire thread, but if you're using STIntersection, that can't use a spatial index. If you're using STIntersects, it will. If you've rewritten the query so you can use a spatial index, I'd start with one on the table with the most rows. Another tack to take is a spatial index on the table with the most complex geographies. You can always check your work with the spatial index analysis procs, or look at the query plans.
    Cheers,
    Bob Beauchemin
    SQLskills
    Wednesday, July 22, 2009 2:03 AM
  • Thanks Bob for the post. I'm sure i've got the previous release of the Sql Spatial tools (for the GeographyUnionAggregate method).. I had my own code to make a geog valid, which I believe was a big kewl thing in the latest release....

    Cheers :) If i get any more probs with this, i'll post em here....

    I'm also really really scared about checking out my
    spatial index analysis procs ... i'm not sure why. Intimidation comes to mind :(

    -Pure Krome-
    Wednesday, July 22, 2009 4:17 AM
  • Hi Bob,

    I've finally had a chance to get back to this problem. I tried the following ....

    SELECT
      ww.Name AS WaterWay,
      wz.Name AS WaterZone
    FROM
      Waterways ww JOIN WaterZones wz ON ww.WaterWaySegment.STIntersects(ww.Zone) = 1
    GROUP BY ww.Name, wz.Name

    and this gives me some results pretty quickly .. < 1 sec. It also returned XXX number of rows only. Kewl :)

    Now, notice i have NOT added the dbo.GeographyUnionAggregate(..) method, just yet.

    When i add that method, it takes around 1 hour,  1 minute and 31 seconds to run. :(
    SELECT
      ww.Name AS WaterWay,
      wz.Name AS WaterZone,
      dbo.GeographyUnionAggregate(ww.WaterWaySegment, 10) AS SegmentByZone
    FROM
      Waterways ww JOIN WaterZones wz ON ww.WaterWaySegment.STIntersects(ww.Zone) = 1
    GROUP BY ww.Name, wz.Name

    I'm not sure where I should start to debug why this is taking so long. I'm getting a nagging feeling that this could be related to a missing spatial index. But if that was the case, then my first query above would ALSO take the same amount of time (i would have guessed). I would have thought the GeographyUnionAggregate(..) method doesn't effect the retrieval process (ie. what rows to get), but just what is outputted. The data has been retrieved, it just mashes the segments together.

    I also tried to subquery it, to make sure that it's only trying to union the result segments and not try and do some massive query. Time is the same :(

    -Pure Krome-
    Wednesday, July 29, 2009 4:06 AM
  • You'd need to look at the query plan for the two operations and see what it's doing. But, that said, the GeometryUnionAggregate is a non-trivial operation and often used by pre-processing the data rather than expecting real-time few second response. That's (preprocessing) what I've used it for, but I have gotten query times in the tens of minutes.
     
    But have a look at the query plan and see if the version with aggregate is using the spatial index and where the time's being spent.
     
    Your alternative is to use a cursor for unioning. And it may not be any faster.
     
    Cheers,
    Bob
    "Pure Krome" wrote in message news:fdc012ca-75b6-491 f-9500-bcf95417fee9...
    Hi Bob,

    I've finally had a chance to get back to this problem. I tried the following ....

    SELECT
      ww.Name AS WaterWay,
      wz.Name AS WaterZone
    FROM
      Waterways ww JOIN WaterZones wz ON 
    ww.WaterWaySegment.STIntersects(ww.Zone) = 1
    GROUP BY ww.Name, wz.Name

    and this gives me some results pretty quickly .. < 1 sec. It also returned XXX number of rows only. Kewl :)

    Now, notice i have NOT added the dbo.GeographyUnionAggregate(..) method, just yet.

    When i add that method, it takes around 1 hour,  1 minute and 31 seconds to run. :(
    SELECT
      ww.Name AS WaterWay,
      wz.Name AS WaterZone,
      dbo.GeographyUnionAggregate(ww.WaterWaySegment, 10) AS SegmentByZone
    FROM
      Waterways ww JOIN WaterZones wz ON 
    ww.WaterWaySegment.STIntersects(ww.Zone) = 1
    GROUP BY ww.Name, wz.Name

    I'm not sure where I should start to debug why this is taking so long. I'm getting a nagging feeling that this could be related to a missing spatial index. But if that was the case, then my first query above would ALSO take the same amount of time (i would have guessed). I would have thought the GeographyUnionAggregate(..) method doesn't effect the retrieval process (ie. what rows to get), but just what is outputted. The data has been retrieved, it just mashes the segments together.

    I also tried to subquery it, to make sure that it's only trying to union the result segments and not try and do some massive query. Time is the same :(

    -Pure Krome-
    Wednesday, July 29, 2009 4:42 AM
  • ...the GeometryUnionAggregate is a non-trivial operation and often used by pre-processing the data rather than expecting real-time few second response.
    Yeah .. exactly! That's why I'm really confused. It shouldn't be hitting the database tables to doing any SELECTS or WHERES, etc.

    I even tried to extract all the segment data into a variable table (which is nearly instant) and THEN do the GeometryUnionAggregate against that result set. .. and guess what?! INSTANT.

    like .. WTF?

    What's frustrating is that, to get the query plan for the original query, i need to wait 1 hour :( ... bleh.

    -Pure Krome-
    Thursday, July 30, 2009 12:39 AM
  • For the hour-long one, you can look at the estimated plan. It's "Display Estimated Plan" off the SSMS Query menu. Doesn't run the query at all.
     
    I'm having trouble figuring out what you're doing in the last sentence. The plan for the long-running query vs the two queries that are instant would likely show where the problem is. And, if worse comes to worse and you can accomplish what you want in two queries (is that what you mean by using a table variable), I'd encapsulate it in a stored proc. But it really does sound like a bad plan. Send me the queries and plans if you want me to take a look.
     
    Cheers,
    Bob
    "Pure Krome" wrote in message news:d80b1950-afca-4a5 0-ae91-9ba7b5413a65...
    ...the GeometryUnionAggregate is a non-trivial operation and often used by pre-processing the data rather than expecting real-time few second response.
    Yeah .. exactly! That's why I'm really confused. It shouldn't be hitting the database tables to doing any SELECTS or WHERES, etc.

    I even tried to extract all the segment data into a variable table (which is nearly instant) and THEN do the GeometryUnionAggregate against that result set. .. and guess what?! INSTANT.

    like .. WTF?

    What's frustrating is that, to get the query plan for the original query, i need to wait 1 hour :( ... bleh.

    -Pure Krome-
    Thursday, July 30, 2009 12:57 AM
  • Hi Bob, I sent you an email to your sqlskills addy. If that's the wrong destination, could u hint at another email to bounce the details to?
    -Pure Krome-
    Friday, July 31, 2009 12:40 AM
  • I agree, there really needs to be some way of sending private messages here....

    Krome - are the datasets you're using publicly available, such as those from the U.S. Census? If so, you could simply tell us the names of the datasets and where you got them from and we can try to recreate your situation ourselves.

    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Friday, July 31, 2009 7:02 AM
    Answerer
  • Hi guys. yes they are but i can't display my real stuff publically 'cause of NDA's. I made a camtasia video and zipped it up and sent a nice email to Bob over a day ago. But Bob and you (Tan) are respected (read: trusted) folks so that's kewl by our NDA stuff.

    I can send u the quick few mins vid also Tanoshimi (you're email is in my e-book of your book, i think.. ??? nope ... well at least not on page xxi :) but it's on your website... )

    @Bob. your first part of the email is :bobb@....... right? (for your sql skills addy).


    EDIT: Oh, I can also dump the two tables to a file(s), zip em and throw them up on a website for download, if the sql query to dump to a file is posted .. for you two guys, if your interested :)

    -Pure Krome-
    Friday, July 31, 2009 8:11 AM
  • If you go to http://www.beginningspatial.com/contact you can fill in the contact form with your contact details and/or a link to any relevant files and it will get to me.
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Friday, July 31, 2009 8:47 AM
    Answerer
  • My email is also at the bottom of my latest MSDN article, here, http://msdn.micr osoft.com/en-us/magazine/ee236412.aspx.
     
    But I haven't received any email from you on this subject yet. Maybe try sending email without the attachment first, and I'll response. The files on website is another idea.
     
    The ability not to include attachments is having to copy tons of code into messages (if its even possible) is the frustrating part of forums (as opposed to newsgroups) that I was referring to.
     
    Cheers,
    Bob
    "Pure Krome" wrote in message news:7013b104-1587-419 6-b350-559e217ec7c7...
    Hi guys. yes they are but i can't display my real stuff publically 'cause of NDA's. I made a camtasia video and zipped it up and sent a nice email to Bob over a day ago. But Bob and you (Tan) are respected (read: trusted) folks so that's kewl by our NDA stuff.

    I can send u the quick few mins vid also Tanoshimi (you're email is in my e-book of your book, i think.. ??? nope ... well at least not on page xxi :) but it's on your website... )

    @Bob. your first part of the email is :bobb@....... right? (for your sql skills addy).


    EDIT: Oh, I can also dump the two tables to a file(s), zip em and throw them up on a website for download, if the sql query to dump to a file is posted .. for you two guys, if your interested :)

    -Pure Krome-
    Friday, July 31, 2009 2:44 PM
  • resent :) and yeah .. that was the original place I found your email addy. I read that article a few days ago and noted your email addy, then :)
    -Pure Krome-
    Friday, July 31, 2009 3:23 PM

  • Got it. I was able to reproduce the problem with some different tables. This looks like a problem with the query optimizer overestimating the cost of using the spatial index compared to not using it, partially because the number of rows returned are also overestimated. You need a spatial index hint on your query that contains the aggregate and the group by, to ensure it's used.
     
    Cheers,
    Bob Beauchemin
    SQLskills
    Friday, July 31, 2009 10:49 PM