locked
Splitting/Tiling large numbers of Geography objects RRS feed

  • Question

  • We have a mapping application which is required to show shapes covering many areas worldwide - districts, in some countries, boroughs in others, etc.  Entire countries can be covered.

    We can't show all shapes at one time, for obvious performance reasons, so we have implemented what should be a nice tiling system, utilising the benefits of http caching.  Shapes are received in ESRI shapefiles and are imported using a batch script which runs Shape2Sql.exe and some SQL scripts.  We then have a web interface which allows us to view the current set of 'tiles' or 'ShapeGroups'.  It offers the ability to split each of these into smaller tiles.  It's this last piece which we think is causing us problems, as we appear to be missing some shapes.

    Out splitting function is in C#:

    public IEnumerable<Viewport> SplitGrid(int times)
            {
                if (times == 1)
                {
                    var medianX = MinX + ((MaxX - MinX) / 2D);
                    var medianY = MinY + ((MaxY - MinY) / 2D);
                    return new[]
                    {
                        new Viewport(MinX, MinY, medianX, medianY, Resolution), 
                        new Viewport(MinX, medianY, medianX, MaxY, Resolution), 
                        new Viewport(medianX, medianY, MaxX, MaxY, Resolution), 
                        new Viewport(medianX, MinY, MaxX, medianY, Resolution)
                    };
                }
    
                return SplitGrid(1).SelectMany(_ => _.SplitGrid(times - 1));
            }

    So if a tile has a lot of shapes, we'll call SplitGrid passing the Id of that tile and an int saying how many times do you want to recursively loop and keep splitting.  So SplitGrid(<id>, 1) would result in 4 tiles, SplitGrid(<id>, 2) will result in 16, etc.

    What we appear to be seeing is gaps appearing - and we can only surmise that it's because we need to take into account geodesic/cartesian issues?  Can anyone enlighten us and explain, and also if anyone knows the correct way to do this we'd really appreciate a suggested solution.

    Thanks,
    Monty


    • Edited by MontyBurns Tuesday, March 13, 2012 5:34 PM
    Tuesday, March 13, 2012 5:21 PM

Answers

  • It sounds like you're trying to apply two-dimensional planar logic on a three-dimensinonal curved surface, which won't work.

    Consider a zone whose upper boundary is a line drawn between two points on the 49th parallel. Your splitting function is splitting that zone based on the midpoint of a straight line connecting those points, as follows:

    But, since you're working with geodesic coordinates, the true midpoint between those points lies on an elliptic arc between them, like this:

    Over short distances, you won't much difference between the two but, as the distance between your start and end points increases (i.e. your bigger grid cells), the amount of "distortion" between the results you're using and the true boundaries of those areas increases as well. If you want to use any logic that uses grids/straight lines etc. you really need to be working with projected geometry data, not geodesic coordinates.


    twitter: @alastaira blog: http://alastaira.wordpress.com/

    • Marked as answer by MontyBurns Thursday, March 15, 2012 11:27 AM
    Wednesday, March 14, 2012 8:18 AM
    Answerer

All replies

  • Hi,

    I can't quite follow the exact process you describe but, from what I understand, it's most likely (as you say) related to "geodesic/cartesian issues".

    Your source shapefiles are provided in a certain coordinate reference system, as described in the .PRJ file that (should) accompany them. Is that the same system as that which you store them in SQL Server and, ultimately, the same system in which you display them?

    If not, then at what stage are you performing the transformation/reprojection? You need to perform the "splitting" function on data that's projected in the same system as that in which it's being displayed, otherwise you'll risk getting gaps or overlaps.


    twitter: @alastaira blog: http://alastaira.wordpress.com/

    Tuesday, March 13, 2012 6:32 PM
    Answerer
  • The shape file is stored as a geography. Many shapes go into one tile (which is what is modelled in that code). Each tile has boundaries in geodesic coordinates, so we store minx / maxx etc as a float in degrees (unprojected).

    The process then splits each of those tiles in two, and does intersection testing for each of the shapes in a tile to see if they fit. We do this by taking the geodesic degrees, by splitting it into 4 tiles, and then doing hit testing on each shape to see which of the new subtiles it is part of (that part works fine).

    What is happening is that when we split a large tile into a lot of smaller ones, we'd expect the new boundaries to follow the curve of the original tile when projected, but they don't.

    In other words, the question is, given the four angles of a zone given in geodesic, is dividing by two in the middle of those values to generate new geodesic values the correct way. The algorithm seems to be instead generating new zones that are not projecting correctly. Aka given two bounary points for a box (0d,0d) and (2d,2d) is it correct to assume that the simple division of the float value by two will be equivalent to having (0d,0d)-(1d,1d) etc, or are the calculations to be done differently? 


    http://serialseb.blogspot.com

    Wednesday, March 14, 2012 4:07 AM
  • It sounds like you're trying to apply two-dimensional planar logic on a three-dimensinonal curved surface, which won't work.

    Consider a zone whose upper boundary is a line drawn between two points on the 49th parallel. Your splitting function is splitting that zone based on the midpoint of a straight line connecting those points, as follows:

    But, since you're working with geodesic coordinates, the true midpoint between those points lies on an elliptic arc between them, like this:

    Over short distances, you won't much difference between the two but, as the distance between your start and end points increases (i.e. your bigger grid cells), the amount of "distortion" between the results you're using and the true boundaries of those areas increases as well. If you want to use any logic that uses grids/straight lines etc. you really need to be working with projected geometry data, not geodesic coordinates.


    twitter: @alastaira blog: http://alastaira.wordpress.com/

    • Marked as answer by MontyBurns Thursday, March 15, 2012 11:27 AM
    Wednesday, March 14, 2012 8:18 AM
    Answerer
  • many thanks for the excellent explanation, tanoshimi - much appreciated.

    So if we change our function to find midpoints using the example method below, do you see our tiling system working?  Presumably vertical/north-south lines follow similar principles?  I ask as I'm aware lines of longitude and latitude are quite different, with longitude running through the poles and being further apart towards the equator.

    http://www.geomidpoint.com/example.html

    Anything in Microsoft.SqlServer.Types.dll which may help? The Spatial Tools on Codeplex?  As all data is in WGS 84/SRID 4326, I assume we'd be best using a function which takes the oblateness of the earth into account?

    Thanks,
    Monty

    Wednesday, March 14, 2012 10:32 AM