
I have a large table (~1.5m rows) with spatial location (X and Y and a geometry field). I need to create a smoothed version of one of the values in this table by averaging with nearby points. I want the influence of each point to be the inverse of the square of this distance (this is sometimes called gravity weighting).

The problem is that the query is incredibly slow: I have never got it to finish, even after more than 50 hours of running. Does anyone have any suggestions on how I could improve the performance, please? I've tried with a subset of the data (1,000 rows from the table) and it calculates in under 10 seconds.

The geometry field has a spatial index and the query stats show that it is being used.

 SELECT t1.[X]
      ,t1.[Y]
      ,sum(t2.myvar*t2.persons/(power((t1.eastings-t2.eastings)/100+1,2)+power((t1.northings-t2.northings)/100+1,2)+1)) as wtXfit
      ,sum(t2.persons/(power((t1.eastings-t2.eastings)/100+1,2)+power((t1.northings-t2.northings)/100+1,2)+1)) as wt
  FROM [Faststore].[dbo].[mytable] t1
  full outer join [Faststore].[dbo].[mytable] t2 on t1.geom.STDistance(t2.geom)<=500
  group by t1.[X]
      ,t1.[Y]

Sample of the data:

(screenshot of sample data)

I have a primary key on the ID column and a geometry auto-grid spatial index on GEOM.

I've also tried using the X and Y column differences < 500 instead (with an index on those columns) but got similarly poor performance. The main problem, as AlwaysLearning commented, is that the join is doing 1.5m × 1.5m distance calculations rather than just using the nearby points.
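For reference, the bounding-box variant I tried looks roughly like this (a sketch, not my exact query; it assumes an index on eastings/northings and keeps the same distance-decay expression as above, swapping the circular STDistance test for a square box, which slightly changes which neighbours are included):

```sql
-- Pre-filter candidate pairs with range predicates on eastings/northings,
-- then apply the same inverse-square ("gravity") weighting as before.
SELECT t1.X, t1.Y,
       SUM(t2.myvar * t2.persons /
           (POWER((t1.eastings - t2.eastings) / 100 + 1, 2)
          + POWER((t1.northings - t2.northings) / 100 + 1, 2) + 1)) AS wtXfit,
       SUM(t2.persons /
           (POWER((t1.eastings - t2.eastings) / 100 + 1, 2)
          + POWER((t1.northings - t2.northings) / 100 + 1, 2) + 1)) AS wt
FROM [Faststore].[dbo].[mytable] t1
JOIN [Faststore].[dbo].[mytable] t2
  ON t2.eastings  BETWEEN t1.eastings  - 500 AND t1.eastings  + 500
 AND t2.northings BETWEEN t1.northings - 500 AND t1.northings + 500
GROUP BY t1.X, t1.Y;
```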

1st: Can you add sample data please? 2nd: Can you add your table information (PK, FK, indexes)? 3rd: What did you try so far? Did you, for example, add indexes on the columns? 4th: Is it really necessary to calculate this in your database? - droebi
STDistance() includes a not-inexpensive sqrt() operation in its calculations and you're performing 2,250,000,000,000 STDistance() operations in your join (i.e.: 1.5M * 1.5M rows). Don't do that. Consider alternatives that don't involve STDistance(), how about a lat/long difference < 0.005°? - AlwaysLearning
droebi: I've edited my post to add that information. Thanks - RobShaw_UK
One trick I'd try to amortize the cost of finding the nearest neighbors is to create a new column that is defined as geom.STBuffer(500) and then querying for neighbors with geom.STIntersects(buffer) = 1 or similar. - Ben Thul
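A minimal sketch of the persisted-buffer idea Ben Thul describes, assuming T-SQL; the column name, index name, and bounding-box extents below are all placeholders, not tested values:

```sql
-- Persist a 500-unit buffer around each point once, index it, then join on
-- intersection so the spatial index can prune distant pairs.
ALTER TABLE [Faststore].[dbo].[mytable] ADD geom_buf500 geometry;

UPDATE [Faststore].[dbo].[mytable]
SET geom_buf500 = geom.STBuffer(500);

-- BOUNDING_BOX extents are illustrative; use your data's real extent.
CREATE SPATIAL INDEX six_mytable_buf500
ON [Faststore].[dbo].[mytable] (geom_buf500)
USING GEOMETRY_AUTO_GRID
WITH (BOUNDING_BOX = (0, 0, 700000, 1300000));

-- Neighbour pairs: t2 lies within 500 units of t1.
SELECT t1.ID AS id1, t2.ID AS id2
FROM [Faststore].[dbo].[mytable] t1
JOIN [Faststore].[dbo].[mytable] t2
  ON t1.geom_buf500.STIntersects(t2.geom) = 1;
```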

1 Answer


I found a manageable solution to this in the end. First I created a temporary table, without any calculations, holding the X, Y, persons and myvar from each copy of the table where the X and Y differences were within 500. This took about 90 minutes to run and produced 65m rows. I then added columns to calculate the distance decay and aggregated the result into a new table. That took another hour.
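The two-stage approach could look something like this (a sketch with assumed names; `#pairs` and `mytable_smoothed` are illustrative, and the weighting expression is the one from the original query):

```sql
-- Stage 1: materialise candidate pairs whose X and Y differences are
-- within 500, keeping only the columns the weighting needs.
SELECT t1.X, t1.Y,
       t1.eastings  - t2.eastings  AS dE,
       t1.northings - t2.northings AS dN,
       t2.myvar, t2.persons
INTO #pairs
FROM [Faststore].[dbo].[mytable] t1
JOIN [Faststore].[dbo].[mytable] t2
  ON ABS(t1.eastings  - t2.eastings)  <= 500
 AND ABS(t1.northings - t2.northings) <= 500;

-- Stage 2: apply the distance decay and aggregate into the output table.
SELECT X, Y,
       SUM(myvar * persons /
           (POWER(dE / 100 + 1, 2) + POWER(dN / 100 + 1, 2) + 1)) AS wtXfit,
       SUM(persons /
           (POWER(dE / 100 + 1, 2) + POWER(dN / 100 + 1, 2) + 1)) AS wt
INTO [Faststore].[dbo].[mytable_smoothed]
FROM #pairs
GROUP BY X, Y;
```

Splitting the work this way trades disk space (the 65m-row intermediate table) for a join that never has to touch the full 1.5m × 1.5m cross product in one pass.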

Thanks for your input everyone.