locked
Aggregating Geography RRS feed

  • Question

  • Hello!

    Im trying to aggregate some geography objects (polygons).
    Here`s two ways to do it:


    1) T-SQL :


    declare @g geography
    -- just first row from the table for example
    select @g = geoinfo from border where VID = 3688
    
    select @g = @g.STUnion(b.geoinfo) from border b where b.VID < 3900
    
    select @g
    


    2) Creating .NET assembly and aggregate:



    namespace SqlGeoTools
    {
      [Serializable, StructLayout(LayoutKind.Sequential),
        SqlUserDefinedAggregate(Format.UserDefined,
        IsInvariantToDuplicates = true,
        IsInvariantToNulls = true,
        IsInvariantToOrder = true,
        IsNullIfEmpty = false,
        MaxByteSize = -1)]
      public class GeoUnion : IBinarySerialize
      {
        private SqlGeography val;
        public void Init()
        {
          val = new SqlGeography();
        }
        public void Accumulate(SqlGeography Value)
        {
          if (val == null)
            val = new SqlGeography();
          if (!Value.IsNull)
            val = val.STUnion(Value);
        }
        public void Merge(GeoUnion Group)
        {
          val = val.STUnion(Group.val);
        }
        public SqlGeography Terminate()
        {
          return val;
        }
        public void Read(System.IO.BinaryReader r)
        {
          if (val == null)
            val = new SqlGeography();
          this.val.Read(r);
        }
        public void Write(System.IO.BinaryWriter w)
        {
          if (val == null)
            val = new SqlGeography();
          val.Write(w);
        }
      }
    }
    
    with
    sp_configure 'clr enabled', 1
    go
    reconfigure
    go
    
    create assembly SQLGeoTools from 'D:\pr\sqlGeography\SqlGeoFunctions\bin\Debug\SqlGeoFunctions.dll' go
    
    create aggregate GeographyUnion(@geog geography) returns geography
    external name SqlGeoTools.[SqlGeoTools.GeoUnion]
    
    
    Anything is fine until I want to aggregate too many objects.
    The script takes about two minutes to complete for 1000 geo objects.
    2k objects - cant wait so long. :(

    How can I improve my code / server settings / smth else ? 



    Tuesday, September 22, 2009 5:37 AM

Answers

  • I solved my problem with SqlGeographyBuilder.

        public static SqlGeography Unite(this IEnumerable<SqlGeography> collection, int srid)
        {
          SqlGeographyBuilder builder = new SqlGeographyBuilder();
          builder.SetSrid(srid);
          builder.BeginGeography(OpenGisGeographyType.GeometryCollection);
          foreach (var current in collection)
          {
            switch (current.STGeometryType().Value)
            {
              case "Point":
                UniteWithPoint(ref builder, current);
                break;
              case "MultiPoint":
                UniteWithMultiPoint(ref builder, current);
                break;
              case "Polygon":
                UniteWithPolygon(ref builder, current);
                break;
              case "MultiPolygon":
                UniteWithMultiPolygon(ref builder, current);
                break;
              case "LineString":
                UniteWithLine(ref builder, current);
                break;
              case "MultiLineString":
                UniteWithMultiLineString(ref builder, current);
                break;
              case "GeometryCollection":
                // do smth smart
                throw new NotImplementedException();
                break;
              default:
                throw new InvalidOperationException("Unknown geography type: " + current.STGeometryType().Value);
                break;
            }
    
          }
          builder.EndGeography();
          return builder.ConstructedGeography;
        }


    and realization for concrete geometries like this:



        private static void UniteWithPolygon(ref SqlGeographyBuilder b, SqlGeography current)
        {
          b.BeginGeography(OpenGisGeographyType.Polygon);
          for (int r = 1; r < current.NumRings().Value +1; r++)
          {
            SqlGeography ring = current.RingN(r);
            b.BeginFigure(ring.STStartPoint().Lat.Value, ring.STStartPoint().Long.Value);
            for (int i = 2; i < ring.STNumPoints() + 1; i++)
              b.AddLine(ring.STPointN(i).Lat.Value, ring.STPointN(i).Long.Value);
            if (ring.STStartPoint().Lat.Value != ring.STEndPoint().Lat.Value ||
              ring.STStartPoint().Long.Value != ring.STEndPoint().Long.Value)
              b.AddLine(ring.STEndPoint().Lat.Value, ring.STEndPoint().Long.Value);
            b.EndFigure();
          }
          b.EndGeography();
        }  

    This method takes about 2 secs to complete for 40k objects (polygons in my case). This is quite well.

    Except one thing:

    declare @g1 geography = geography::STGeomFromText('POLYGON((59 81, 60 81, 60 82, 59 82, 59 81))', 4326)
    declare @g2 geography = geography::STGeomFromText('POLYGON((60 81, 61 81, 61 82, 60 82, 60 81))', 4326)
    
    select @g1.STUnion(@g2).STAsText()
    STUnion gives us 1 polygon from this code.
    My method returns GeoCollection. :(

    Tuesday, September 29, 2009 6:06 AM
  • This method takes about 2 secs to complete for 40k objects (polygons in my case). This is quite well.

    Except one thing:

    declare @g1 geography = geography::STGeomFromText('POLYGON((59 81, 60 81, 60 82, 59 82, 59 81))', 4326)
    declare @g2 geography = geography::STGeomFromText('POLYGON((60 81, 61 81, 61 82, 60 82, 60 81))', 4326)
    
    select @g1.STUnion(@g2).STAsText()
    
    STUnion gives us 1 polygon from this code.
    My method returns GeoCollection. :(

    Well I solved this too by using STBuffer(x)  method (or BufferWithTolerance).
    'x' depends on needed presence.

    So If you need 1 polygon instead of geometry collection, you have to just use this method.

    BTW it also helps to remove small 'holes' which has appeares after using STUnion() method and looks like thin lines between 2 geography objects.


    Now, It may be marked as answered topic :)

    Monday, October 5, 2009 7:04 AM

All replies

  • There is a GeographyUnionAggregate function available from the SQLSpatialTools project on Codeplex (http://sqlspatialtools.codeplex.com/)
    I'm not sure if it will yield any better performance than your existing approach, but you could give it a try.

    What is the reason for trying to aggregate 2,000 polygons on the fly? Is there some way you could have predefined aggregates of those polygons that could be used? How about if you were to simplify your polygons - that should make them aggregate faster...
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Tuesday, September 22, 2009 6:50 AM
    Answerer
  • There is a GeographyUnionAggregate function available from the SQLSpatialTools project on Codeplex (http://sqlspatialtools.codeplex.com/)
    I'm not sure if it will yield any better performance than your existing approach, but you could give it a try.

    What is the reason for trying to aggregate 2,000 polygons on the fly? Is there some way you could have predefined aggregates of those polygons that could be used? How about if you were to simplify your polygons - that should make them aggregate faster...
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    SQLSpatialTools has same perfomance.

    There`s some tasks which requires to aggregate alot of geo objects. 
    Tuesday, September 22, 2009 7:33 AM
  • Well, in that case, I think you're kind of stuck with regards to refining your code... what kind of server are you using? can you throw some more hardware at it?
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Tuesday, September 22, 2009 8:02 AM
    Answerer
  • Well, in that case, I think you're kind of stuck with regards to refining your code... what kind of server are you using? can you throw some more hardware at it?
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Actually doesnt matter and I wonder about it.
    Local DB (P4 Dual 2GB RAM) or  Server' (2xXeon with 6GB RAM).
    Tuesday, September 22, 2009 8:09 AM
  • execution time per objects graph:

    http://img190.imageshack.us/img190/485/grapha.png

    awesome... 
    Tuesday, September 22, 2009 11:46 AM
  • I found that converted to MapXtreme' (SDK from Pitney Bowes) geo types works atleast in 10-20 times faster.
    If anyone interested let me know with email.

    p.s.: 15 secs for 4000 objects (including conversion from sql geography type).

    p.p.s.: If there will be sql+.net solution for this problem let me know here :)

    p.p.p.s: Now the problem is that MapXtreme uses unmanaged code. 
    Wednesday, September 23, 2009 5:59 AM
  • Well best enchance I found yet is paging geo objects like this:

    set nocount on
    declare  @pageNumber int, @pageSize int, @totalRowsCount int, @firstRowNumber int, @firstRowID int
    set @pageNumber = 1
    set @pageSize = 50
    select @totalRowsCount = count(VID) from Border
    select @firstRowNumber = (@pageNumber - 1) * @pageSize + 1
    
    declare @g geography
    
    while (@firstRowNumber <= @totalRowsCount)--(@pageNumber < 20)
          
    begin
      set rowcount @firstRowNumber
      select @firstRowID = VID 
      from Border
      order by 1
    
      set rowcount @pageSize
      select GeoInfo 
      into #tGeo
      from Border
      where VID >= @firstRowID  
      
      if(@g is not null)
        select @g = @g.STUnion(dbo.GeoUnion(GeoInfo)) From #tGeo
      else
        select @g = dbo.GeoUnion(GeoInfo) From #tGeo
      
      set @PageNumber = @PageNumber + 1
      select @firstRowNumber = (@pageNumber - 1) * @pageSize + 1
      
      drop table #tGeo
      
    end
    set nocount off
    
    select @g
    
    set rowcount 0
    
    
    Any ideas to improve it a bit more ? :)

    here's perfomance 'tests':
    http://img97.imageshack.us/img97/9150/graph2.png


    Wednesday, September 23, 2009 12:13 PM
  • I had success doing this using a CLR aggregate function where I used a tiling scheme to make the running time reasonable.  The region encompassing the input polygons is tiled and a separate union formed for polygons in each tile.  Then the union of the unions is formed.  This resulted in a dramatically improved running time for my problem over the CLR aggregate in SqlSpatialTools.  I was able to union tens of thousands of polygons in a reasonable time, while the aggregate class in SqlSpatialTools seemed it would never finish.

    One problem with my code is that the tile size is hardcoded.  I picked a size that I thought would work well for the geography shapes I was unioning (0.1 degree), but for other problems a different tile size will be needed.  YMMV.

    Here's some code to start with.  Create a simple Tile struct like this:

    public struct Tile
    {
        public int lonTile;
        public int latTile;
        public Tile(int lonTile_, int latTile_)
        {
            lonTile = lonTile_;
            latTile = latTile_;
        }
    }
    Set a tile resolution and create a dictionary of tiles:
    private double tileResolution = 0.1;
    private Dictionary<Tile, SqlGeography> unions;
    

    Inside the Accumulate method you'll have some code like this:
    SqlGeography center = value.EnvelopeCenter();
    Tile tile = new Tile(
        (int)Math.Floor((double)center.Long / tileResolution),
        (int)Math.Floor((double)center.Lat / tileResolution));
    SqlGeography accum;
    if (unions.TryGetValue(tile, out accum) == true)
    {
        unions[tile] = accum.STUnion(value);
    }
    else
    {
        unions[tile] = value;
    }
    You'll need to union the unions in the Merge method, and also write some serialization code.
    Monday, September 28, 2009 6:04 PM
  • I solved my problem with SqlGeographyBuilder.

        public static SqlGeography Unite(this IEnumerable<SqlGeography> collection, int srid)
        {
          SqlGeographyBuilder builder = new SqlGeographyBuilder();
          builder.SetSrid(srid);
          builder.BeginGeography(OpenGisGeographyType.GeometryCollection);
          foreach (var current in collection)
          {
            switch (current.STGeometryType().Value)
            {
              case "Point":
                UniteWithPoint(ref builder, current);
                break;
              case "MultiPoint":
                UniteWithMultiPoint(ref builder, current);
                break;
              case "Polygon":
                UniteWithPolygon(ref builder, current);
                break;
              case "MultiPolygon":
                UniteWithMultiPolygon(ref builder, current);
                break;
              case "LineString":
                UniteWithLine(ref builder, current);
                break;
              case "MultiLineString":
                UniteWithMultiLineString(ref builder, current);
                break;
              case "GeometryCollection":
                // do smth smart
                throw new NotImplementedException();
                break;
              default:
                throw new InvalidOperationException("Unknown geography type: " + current.STGeometryType().Value);
                break;
            }
    
          }
          builder.EndGeography();
          return builder.ConstructedGeography;
        }


    and realization for concrete geometries like this:



        private static void UniteWithPolygon(ref SqlGeographyBuilder b, SqlGeography current)
        {
          b.BeginGeography(OpenGisGeographyType.Polygon);
          for (int r = 1; r < current.NumRings().Value +1; r++)
          {
            SqlGeography ring = current.RingN(r);
            b.BeginFigure(ring.STStartPoint().Lat.Value, ring.STStartPoint().Long.Value);
            for (int i = 2; i < ring.STNumPoints() + 1; i++)
              b.AddLine(ring.STPointN(i).Lat.Value, ring.STPointN(i).Long.Value);
            if (ring.STStartPoint().Lat.Value != ring.STEndPoint().Lat.Value ||
              ring.STStartPoint().Long.Value != ring.STEndPoint().Long.Value)
              b.AddLine(ring.STEndPoint().Lat.Value, ring.STEndPoint().Long.Value);
            b.EndFigure();
          }
          b.EndGeography();
        }  

    This method takes about 2 secs to complete for 40k objects (polygons in my case). This is quite well.

    Except one thing:

    declare @g1 geography = geography::STGeomFromText('POLYGON((59 81, 60 81, 60 82, 59 82, 59 81))', 4326)
    declare @g2 geography = geography::STGeomFromText('POLYGON((60 81, 61 81, 61 82, 60 82, 60 81))', 4326)
    
    select @g1.STUnion(@g2).STAsText()
    STUnion gives us 1 polygon from this code.
    My method returns GeoCollection. :(

    Tuesday, September 29, 2009 6:06 AM
  • This method takes about 2 secs to complete for 40k objects (polygons in my case). This is quite well.

    Except one thing:

    declare @g1 geography = geography::STGeomFromText('POLYGON((59 81, 60 81, 60 82, 59 82, 59 81))', 4326)
    declare @g2 geography = geography::STGeomFromText('POLYGON((60 81, 61 81, 61 82, 60 82, 60 81))', 4326)
    
    select @g1.STUnion(@g2).STAsText()
    
    STUnion gives us 1 polygon from this code.
    My method returns GeoCollection. :(

    Well I solved this too by using STBuffer(x)  method (or BufferWithTolerance).
    'x' depends on needed presence.

    So If you need 1 polygon instead of geometry collection, you have to just use this method.

    BTW it also helps to remove small 'holes' which has appeares after using STUnion() method and looks like thin lines between 2 geography objects.


    Now, It may be marked as answered topic :)

    Monday, October 5, 2009 7:04 AM
  • I know this thread has been idle for some time, but I wanted to offer a T-SQL suggestion for spatial aggregating.  Like Eugene, I was having problems with iterative, T-SQL approaches using STUnion() being slow and leaving the final polygon(s) with gaps.  Since geometries can be displayed as WKT, I decided to try addressing spatial aggregates as a string concatenation issue.  An example of the code I use is below (I renamed the column and table names to try to make them somewhat descriptive).

    WITH AggrColumnCTE (GroupByColumn)
    AS
    (
    	SELECT DISTINCT GroupByColumn
    	FROM dbo.table
    )
    , AggrShapeCTE (GroupByColumn, GeometryCollection) 
    AS 
    (
    	SELECT GroupByColumn
    		  ,GeometryCollection = geometry::STGeomCollFromText('GEOMETRYCOLLECTION (' + LEFT(GeomCollWKT,
    													       DATALENGTH(GeomCollWKT)/2 - 2)+ ')',1)
    	FROM AggrColumnCTE ac CROSS APPLY
             (SELECT GeometryColumn.STAsText() + ', '
    			FROM dbo.table AS t
               WHERE t.GroupByColumn = ac.GroupByColumn
    			 FOR XML PATH('')) AS ca(GeomCollWKT)
    )
        
    SELECT GroupByColumn
           ,GeometryCollection
           ,GeometryCollection.STUnion(GeometryCollection.STPointN(1)) as GeometryUnion 
    FROM AggrShapeCTE

    In general, I have had much better performance using the T-SQL approach above than using the aggregates posted at SQL Server Spatial Tools.  As compared to other T-SQL approaches I have tried, the above approach does not generate as many gaps either (I try to avoid buffering to remove gaps).

    Eugene, I would be interested to hear how the approach above compares, in terms of performance and results, to other approaches you have tried.
    Friday, December 4, 2009 12:13 AM
  • Hello.

    Yeah, Its quite well solution. 10K linestrings takes about 5 seconds to 'unite', instead of several minutes with STUnion method.
    But it returns a geography collection object also (look at my post about uniting of 2 polygons and using STBuffer method).

    P.S. : And its slower than SqlGeographyBuilder ;)
    Thursday, December 10, 2009 2:15 PM
  • "But it returns a geography collection object also..."  Did you mean geometry collection object?  If one omits the GeometryCollection column from the last select statement, it will speed up the query by a noticeable amount.  I usually do not return the geometry collection, I included it in the original code for illustrative purposes.

    Thanks for the information regarding performance on your data compared to SqlGeographyBuilder.  The only experience I have with SqlGeographyBuilder or SqlGeometryBuilder is through the SQL Server Spatial Tools on CodePlex.  For the datasets I have worked with, the string concatenation approach has been much faster than using those tools.  I haven't had a chance to try the code you posted using SqlGeographyBuilder.

    Let's hope the performance of spatial aggregates in the next release of SQL Server is better than all of the approaches posted here.
    Thursday, December 10, 2009 9:38 PM