locked
MultiPolygons and indexing RRS feed

  • Question

  • I am a bit confused as to how the "Cells Per Object" setting would work for a spatial index on a column with lots of complex multipolygons. 

    For example, if the cells per object is set to 16 and a multipolygon is found does the indexing process run thru the individual polygon elements sequentially until the cells per object limit is reached and then not do anything with the remaining polygon elements?  Or maybe does it do something with the envelope that would contain all the polygons?  Something else?  Revert to table scan?

    One real world example of this is the CRESTA zone #2 for Denmark, which includes the Faroe Islands as well as an area on the mainland.  There are 36 polygons in this multipolygon.  The Faroe Islands are roughly 650 miles offshore from Denmark.

    Splitting the multipolygons into rows of single polygons might be an option, but brings additional challenges.  Before starting down that road, it would be good to understand how the "plumbing" of the index stuff will react.

    Tuesday, July 28, 2009 1:33 PM

Answers

  • If you set cells per object to 16, it means that each row in the base table should contain a maximum of 16 index rows. Whether  a specific row contains a polygon or multipolygon does not matter, it will just use bigger cells up to the size of a first-level cell.
     
    But, as books online says here: http://msd n.microsoft.com/en-us/library/bb934196.aspx, "At the top level, if an object covers more cells than specified by n, the indexing uses as many cells as necessary to provide a complete top-level tessellation. In such cases, an object might receive more than the specified number of cells. In this case, the maximum number is the number of cells generated by the top-level grid, which depends on the density."
     
    You can check your work with different densities by using the spatial index analysis procedures.
     
    Cheers,
    Bob Beauchemin
    SQLskills
    "wimpy1" wrote in message news:d4138df4-336d-41f e-ab66-39931313d76d...

    I am a bit confused as to how the "Cells Per Object" setting would work for a spatial index on a column with lots of complex multipolygons. 

    For example, if the cells per object is set to 16 and a multipolygon is found does the indexing process run thru the individual polygon elements sequentially until the cells per object limit is reached and then not do anything with the remaining polygon elements?  Or maybe does it do something with the envelope that would contain all the polygons?  Something else?  Revert to table scan?

    One real world example of this is the CRESTA zone #2 for Denmark, which includes the Faroe Islands as well as an area on the mainland.  There are 36 polygons in this multipolygon.  The Faroe Islands are roughly 650 miles offshore from Denmark.

    Splitting the multipolygons into rows of single polygons might be an option, but brings additional challenges.  Before starting down that road, it would be good to understand how the "plumbing" of the index stuff will react.

    Tuesday, July 28, 2009 3:10 PM
  • The indexing process works for multixxx geometries in exactly the same way it does for single-element points, linestrings or polygons.
     - Space is decomposed into four levels of nested grids, according to the parameters defined when the spatial index was created.
     - To begin, the object is tiled by level 1 cells, using those tiles from the index that are fully-covered, partially-covered, or touched by the geometry. For a multi-geometry instance, this means that the cells must tile every individual geometry within the multigeometry instance.
     - Assuming that the CELLS_PER_OBJECT limit has not been reached, each level 1 cell is then subdivided into level 2 cells, according to the deepest-cell rule (i.e. if cell 1.4.2 is intersected by the geometry, then we know that at least some part of 1.4 must also be intersected by the geometry) and the covering-rule (i.e. if all subcells of a given cell are covered by a geometry, then there's no point subdividing the cell).
     - This subdivision of each grid level continues until level 4, or until the CELLS_PER_OBJECT limit is reached.

    The subdivision process is determined by a breadth-first walk through the cells at each level, NOT based on what polygon of the multipolygon instance is contained in the cell.

    Choosing to store each element of a multipolygon as a separate row will obviously affect the way in which they are indexed, but may not necessarily result in better performance. If there are good business reasons for treating the Faeroe Islands separately from mainland Denmark then by all means do so, but I personally would not attempt to change the underlying data structure purely to try to alter the spatial index.
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Tuesday, July 28, 2009 3:19 PM
    Answerer

All replies

  • If you set cells per object to 16, it means that each row in the base table should contain a maximum of 16 index rows. Whether  a specific row contains a polygon or multipolygon does not matter, it will just use bigger cells up to the size of a first-level cell.
     
    But, as books online says here: http://msd n.microsoft.com/en-us/library/bb934196.aspx, "At the top level, if an object covers more cells than specified by n, the indexing uses as many cells as necessary to provide a complete top-level tessellation. In such cases, an object might receive more than the specified number of cells. In this case, the maximum number is the number of cells generated by the top-level grid, which depends on the density."
     
    You can check your work with different densities by using the spatial index analysis procedures.
     
    Cheers,
    Bob Beauchemin
    SQLskills
    "wimpy1" wrote in message news:d4138df4-336d-41f e-ab66-39931313d76d...

    I am a bit confused as to how the "Cells Per Object" setting would work for a spatial index on a column with lots of complex multipolygons. 

    For example, if the cells per object is set to 16 and a multipolygon is found does the indexing process run thru the individual polygon elements sequentially until the cells per object limit is reached and then not do anything with the remaining polygon elements?  Or maybe does it do something with the envelope that would contain all the polygons?  Something else?  Revert to table scan?

    One real world example of this is the CRESTA zone #2 for Denmark, which includes the Faroe Islands as well as an area on the mainland.  There are 36 polygons in this multipolygon.  The Faroe Islands are roughly 650 miles offshore from Denmark.

    Splitting the multipolygons into rows of single polygons might be an option, but brings additional challenges.  Before starting down that road, it would be good to understand how the "plumbing" of the index stuff will react.

    Tuesday, July 28, 2009 3:10 PM
  • The indexing process works for multixxx geometries in exactly the same way it does for single-element points, linestrings or polygons.
     - Space is decomposed into four levels of nested grids, according to the parameters defined when the spatial index was created.
     - To begin, the object is tiled by level 1 cells, using those tiles from the index that are fully-covered, partially-covered, or touched by the geometry. For a multi-geometry instance, this means that the cells must tile every individual geometry within the multigeometry instance.
     - Assuming that the CELLS_PER_OBJECT limit has not been reached, each level 1 cell is then subdivided into level 2 cells, according to the deepest-cell rule (i.e. if cell 1.4.2 is intersected by the geometry, then we know that at least some part of 1.4 must also be intersected by the geometry) and the covering-rule (i.e. if all subcells of a given cell are covered by a geometry, then there's no point subdividing the cell).
     - This subdivision of each grid level continues until level 4, or until the CELLS_PER_OBJECT limit is reached.

    The subdivision process is determined by a breadth-first walk through the cells at each level, NOT based on what polygon of the multipolygon instance is contained in the cell.

    Choosing to store each element of a multipolygon as a separate row will obviously affect the way in which they are indexed, but may not necessarily result in better performance. If there are good business reasons for treating the Faeroe Islands separately from mainland Denmark then by all means do so, but I personally would not attempt to change the underlying data structure purely to try to alter the spatial index.
    Beginning Spatial with SQL Server http://www.apress.com/book/view/1430218290
    Tuesday, July 28, 2009 3:19 PM
    Answerer