Penanya
Find Closest point to a point

Hi Everyone
I am super new to the spatial side of things with sql. I am using SQL Server 2008 Express. I have a table with town points with coordinates and a geog column. Then i have another table with 400 000 odd points. I have to find and populate say Closest_Town column with the closest town point for each of these points. At the moment i am using a cursor for this (yes i know this is prob completely wrong, but thats why i need help!):
DECLARE @tmpId int
DECLARE point CURSOR FORSELECT ID
FROM tblPoints
OPEN pointFETCH point INTO @tmpId
WHILE @@Fetch_Status = 0
BEGIN
declare @iTown varchar(255)
select top 1 @iTown = b.TWN_NAME
FROM tblPoints A,
SP_TOWN_POINTS B
where a.ID = @tmpId
order by b.geog.STDistance(A.GEOG)update tblPoints
set town = @iTown
where ID = @tmpIdFETCH point INTO @tmpId
END
CLOSE point
DEALLOCATE point
RETURNAt the moment this is taking about 16 hours for 40 000 records which is just crazy. Can anyone help me to write an easy quicker query?
 Dipindahkan oleh Naomi N 06 Februari 2012 16:13 May be better answer here (From:TransactSQL)
Pertanyaan
Semua Balasan


You can simply run a query with CROSS APPLY, e.g.
select T.*, Closest.* from tblPoints T CROSS APPLY (select top (1) * from SP_Town_Points b order by b.geog.STDDistance(T.Geog)) Closest
For every expert, there is an equal and opposite expert.  Becker's Law
My blog 
Hi Everyone
I am super new to the spatial side of things with sql. I am using SQL Server 2008 Express. I have a table with town points with coordinates and a geog column. Then i have another table with 400 000 odd points. I have to find and populate say Closest_Town column with the closest town point for each of these points. At the moment i am using a cursor for this (yes i know this is prob completely wrong, but thats why i need help!):
DECLARE @tmpId int
DECLARE point CURSOR FORSELECT ID
FROM tblPoints
OPEN pointFETCH point INTO @tmpId
WHILE @@Fetch_Status = 0
BEGIN
declare @iTown varchar(255)
select top 1 @iTown = b.TWN_NAME
FROM tblPoints A,
SP_TOWN_POINTS B
where a.ID = @tmpId
order by b.geog.STDistance(A.GEOG)update tblPoints
set town = @iTown
where ID = @tmpIdFETCH point INTO @tmpId
END
CLOSE point
DEALLOCATE point
RETURNAt the moment this is taking about 16 hours for 40 000 records which is just crazy. Can anyone help me to write an easy quicker query?
 Digabungkan oleh Stephanie Lv 09 Februari 2012 11:16




Hi Naomi
I've tried ur suggestion but it takes about 30min for 900 records, which also seems a bit way to slow? as I've got around 300 000 records that i need to populate...
Should i maybe create my spatial index differently?
this is the code i use for it:
CREATE SPATIAL INDEX SP_INDEX_POINT_GEOG
ON tblPoints(GEOG);


I am not sure if an efficient way exists for your problem. We need to find the closest town for 300000 towns.
I assume it means we need to make almost 300K power 2 calculations. We can cut the number in 2 (since A and B will be the same as B and A). Still it's a lot.
I think we first need to decide if a better algorithm exists here and then follow this idea.
For every expert, there is an equal and opposite expert.  Becker's Law
My blog 
Are your town points and your another 400 000 odd points realy geo points, kind of
geography::STGeomFromText('POINT(122.34900 47.65100)', 4326) ?In that case you may first try greatly narrow number of towns to search using point.lat, point.lon and some unpresize but quick distance calculation. See http://en.wikipedia.org/wiki/Geographical_distance for example.
Serg 
Yes, that's what I was thinking also  we need to build a bounding box first and filter lots of towns.
See the ideas in this blog post
SQL Server 2008 Proximity Search With The Geography Data Type
and the blog referenced at the top of this one.
For every expert, there is an equal and opposite expert.  Becker's Law
My blog 
As Naomi has already commented, whatever way you structure your query (using a CROSS APPLY, a cursor, or a subSELECT, for example), you can't avoid the fact that you're going to end up performing the nearestneighbor part of your query 400,000 times  once for each row of your base table.
So, forget about the exact syntax of UPDATE query for now, and instead concentrate on just getting that part to execute as efficiently as possible. At the moment, you're doing this:
declare @iTown varchar(255) select top 1 @iTown = b.TWN_NAME FROM tblPoints A, SP_TOWN_POINTS B where a.ID = @tmpId order by b.geog.STDistance(A.GEOG)
This is not efficient, because what you're doing is calculating the distance between the current point and every town in the SP_TOWN_POINTS table, then sorting the whole of that list, and finally discarding all but the TOP 1 result. Since you're not applying any selectivity in this query, it can't make use of a spatial (or any other kind of) index  it'll perform the relatively expensive STDistance() method for every row in the Town table. And then, this query gets repeated over and over for every row in the tblPoints table...
This type of query is generally referred to as a "nearestneighbour" query (or, if you're in the US, a "nearest neighbor" query :), and it's been discussed several times on this forum already. Here's some starting points that might help you:
 Try limiting the search for nearest neighbors within a bounded search area, or an expanding area as described by Isaac Kunen, here: http://blogs.msdn.com/b/isaac/archive/2008/10/23/nearestneighbors.aspx
 Try out the RC0 of SQL Server 2012, which includes a dedicated query plan for nearest neighbor queries as described here: http://download.microsoft.com/download/D/2/0/D20E1C5F72EA45059F26FEF9550EFD44/SQLServer_Denali_Spatial.docx
 If you're only using tables containing points, then you might not need to use geometry/geography at all. The spatial functionality in SQL Server 2008/R2/2012 is ideally suited for dealing with complex geometry operations, such as testing for intersection between abstract geometrical shapes  polygons/curves/linestrings etc. The spatial methods are accurate, but they're not fast. If all you want to do is find approximate distances between points, you will almost certainly find that storing latitude/longitude coordinates as decimal(18,9) columns, and calculating the distance between them using something like the Haversine approximation will be much fast than any of the other methods described above.
Once you've got an efficient query to select the closest town to any point, then you can plug that into your UPDATE query.
twitter: @alastaira blog: http://alastaira.wordpress.com/
 Disarankan sebagai Jawaban oleh Naomi N 08 Februari 2012 13:57

I've just posted a blog post that you might find helpful.
twitter: @alastaira blog: http://alastaira.wordpress.com/
 Disarankan sebagai Jawaban oleh Naomi N 23 Februari 2012 1:10