Two indexing questions: empty cells and "goodness of fit"

# Two indexing questions: empty cells and "goodness of fit"

• Saturday, April 26, 2008 11:34 AM

The other day I started thinking about what bounding box should I use to spatially index the United States. A simple but maybe not so great answer is xmin=-180, ymax=90, xmax=0, ymin=0. This does not seem like such a good answer because it covers a lot of ocean space (the Arctic and part of the eastern Atlantic) that probably will not contain features that we are interested in querying. It will also contain part of Africa and Europe that are out of scope. It also doesn't cover all of the "United States" since there are populated places in American Samoa well south of the Equator and places in the Marianas and Guam are well "west" of the anti-prime meridian. We can solve the first part of that problem by having "tighter fitting" bounding box but there's little to be done about the "outlying" areas.

Or will it?

The documentation implies that if a cell doesn't contain any features it would not appear in the index. The reason for this being that tessellation is done per object, not per cell. So if no object touches a given cell, that cell doesn't occur in the index.

If that's correct then only real "evil" of using a loose-fitting bounding box is that a given cell's area may will be greater and thus may be more likely to touch an object and therefore may increase the size of the size of the index more than need be.

So the questions are:

a.       Is there such a thing an empty cell in the index?

b.      Is the "tight-fitting" versus "loose-fitting" area problem really worth fretting about?

### All Replies

• Monday, April 28, 2008 6:02 PM
Moderator

A. Nope, it's a virtual index so no cell entries are created for empty grid cells.

B.  Potentially yes.  This can definitely have an impact if you have a large index area, which means that each cell will be less granular with a non-uniform distribution of objects.  When querying in dense places (Manhattan) vs. non-dense places (Alaska) you may not want to use the same grid size and density.  One way to solve this is to create additional selective indexes for especially important, dense locations.  However, the downside to this is that QO will not pick the optimal index for you, so you will need to hint the queries that you want to use the specialized selective indexes.  Whether this extra effort is worth it really depends on your application.

• Monday, April 28, 2008 6:31 PM

A. Okay.

B. So this begs the questions:

B1. Since we can't specify a bounding box for Geography instances, its it reasonable to assume to "visualization" purposes that its -180.-90,180,90? I use "visualization" in this sense to mean an overlay of a grid so you can see what objects which touch a given cell.

Its pretty easy, for example to mentally map level 1 of a 4x4 grid maps over a globe. Level 2 might be too, but beyond that... So far, the best the best I've been able to generate is level 2 of an 8x8 grid.

This kind of activity is important so you get a sense of density of features.

B2. If not, is the geography grid implied using a MBR around all the indexed instances? Or is something else going on?

More later, I suspect.

• Monday, April 28, 2008 8:35 PM
Moderator

B1. Mapping the geography index to a simple grid is not quite that straightforward.  In BOL, there is a great picture of how the two hemipsheres of the earth get mapped to the actual grid used by the index under Projection of the Geodetic Space onto a Plane:

http://technet.microsoft.com/en-us/library/bb964712(SQL.100).aspx

• Monday, April 28, 2008 9:34 PM

Hi Steven,

Yes, I'm getting very familar with the referenced BOL, but it leaves a lot to be desired in terms of specifics. For example, step three of the illustration actually shows 32 sections, 16 north, 16 south. I don't believe that's accurate for a level-1 low density index. The flattened hemisphere graphics do appear correct by showing 16 tessellation sections, but I wanted to nail this down.(it helps to see this when you have a beginner's mind)/

So it appears that the answer for B1 is that any level of any density grid covers the whole surface, so B2 doesn't apply.

kt

• Wednesday, April 30, 2008 12:24 AM
Moderator

Hi Kent,

I'm a little confused about what you're asking, so let me know if I hit it or not.

If you're asking what a tessellation looks like when it maps back to the sphere, it's a little complicated.  I need to work up a better visualization of this.  One way to think of it, although not particularly accurate, is to think of taking a square sheet of latex, drawing a square grid on it, and then stretching that over the globe so it exactly covers a hemisphere.  Now do it again for the other hemisphere.

The illustration is a little confusing, altbeit accurate for what it actually shows.  It's not showing 16 pie-shaped grid cells, rather a set of longitude lines.  These are not the grid cells.

Cheers.
-Isaac
• Wednesday, April 30, 2008 3:15 AM

Hi Isaac,

Yes, I have a pretty good idea what the index looks like -- I have used SQL Server to generate some KML files for it although they are the reverse of your model -- from "wrapped" to "flat" grid. What I was fundamentally getting at was actually if it covered both hemispheres.

Happy to share what I have if you'd like them.

kt

• Wednesday, April 30, 2008 3:38 PM
Moderator

Hi Kent,

Gotcha---it does cover both hemispheres, but in two pieces.  I'd love to see what you have---do you have my email?

-Isaac