0
votes

We are trying to find a way to use the haversine() function to map some locations to some data. The concept is we have some locations for stores and then we have data for various cities. We want to identify the closest city to each store for which we have data, and then merge the data for that city with the data for the store. Example below. Obviously we could write a python script to find the closest city and add that as a column to one table, but we were hoping to accomplish this with a query/view so that when adding new stores/cities we don’t need to re-run the mapping script. The only thing I could think of to do this would be a correlated subquery within a column which I don’t believe Snowflake supports. Is there any other way to do this?

Thanks, J

Stores Table:

City          State               Store_Num        Lat                     Lon
Buckhead      GA                     1             33.8399734          -84.4701434
Villanova     PA                     2             40.0369415          -75.365584
Boulder       CO                     3             40.0294202          -105.3101889



Data Table:

Date            Value                    City                        Lat                          Lon
1/1/20          10                      Atlanta                33.7678357          -84.4908155
1/2/20          15                      Atlanta                33.7678357          -84.4908155        
1/3/20          13                      Atlanta                33.7678357          -84.4908155
1/1/20          11                      Denver                 39.7645183          -104.9955382
1/2/20          12                      Denver                 39.7645183          -104.9955382
1/3/20          14                      Denver                 39.7645183          -104.9955382
1/1/20          20                      Philadelphia           40.0026763          -75.258455          
1/2/20          25                      Philadelphia           40.0026763          -75.258455                          
1/3/20          22                      Philadelphia           40.0026763          -75.258455          
1/1/20          5                       Atlantic City          39.376672            -74.4879282        
1/2/20          7                       Atlantic City          39.376672            -74.4879282        
1/3/20          10                      Atlantic City          39.376672            -74.4879282


Desired Outcome:

Date          Store_Num        Data_City            Data Value         Data_Distance
1/1/20        1                Atlanta                 10               8,248
1/2/20        1                Atlanta                 15               8,248     
1/3/20        1                Atlanta                 13               8,248
1/1/20        3                Denver                  11               39,864
1/2/20        3                Denver                  12               39,864
1/3/20        3                Denver                  14               39,864
1/1/20        2                Philadelphia            20               9,889
1/2/20        2                Philadelphia            25               9,889     
1/3/20        2                Philadelphia            22               9,889

1

1 Answers

3
votes

I don't know where Atlantic City has gone on your output, but if you have a small data, you can use the following query:

WITH stores (City,State,Store_Num,Lat,Lon) AS (
SELECT * FROM VALUES
('Buckhead','GA',1,33.8399734,-84.4701434),
('Villanova','PA',2,40.0369415,-75.365584),
('Boulder','CO',3,40.0294202,-105.3101889)
)
, data_table (Date,Value,City,Lat,Lon)
AS (
SELECT * FROM VALUES
('1/1/20',10,'Atlanta',33.7678357,-84.4908155),
('1/2/20',15,'Atlanta',33.7678357,-84.4908155),      
('1/3/20',13,'Atlanta',33.7678357,-84.4908155),
('1/1/20',11,'Denver',39.7645183,-104.9955382),
('1/2/20',12,'Denver',39.7645183,-104.9955382),
('1/3/20',14,'Denver',39.7645183,-104.9955382),
('1/1/20',20,'Philadelphia',40.0026763,-75.258455),          
('1/2/20',25,'Philadelphia',40.0026763,-75.258455),                          
('1/3/20',22,'Philadelphia',40.0026763,-75.258455),         
('1/1/20',5,'Atlantic City',39.376672,-74.4879282),        
('1/2/20',7,'Atlantic City',39.376672,-74.4879282),        
('1/3/20',10,'Atlantic City',39.376672,-74.4879282)
)
SELECT d.date, s.Store_Num,d.City, d.value, 
haversine( s.lat, s.lon, d.lat, d.lon) distance
FROM data_table d, stores s
qualify row_number() over (partition by  d.city, d.date order by haversine( s.lat, s.lon, d.lat, d.lon)  ) = 1;

About QUALIFY check the following doc:

https://docs.snowflake.com/en/sql-reference/constructs/qualify.html

The output is:

+--------+-----------+---------------+-------+---------------+
| DATE   | STORE_NUM | CITY          | VALUE | DISTANCE      |
+--------+-----------+---------------+-------+---------------+
| 1/1/20 | 1         | Atlanta       | 10    | 8.245620101   |
| 1/2/20 | 1         | Atlanta       | 15    | 8.245620101   |
| 1/3/20 | 1         | Atlanta       | 13    | 8.245620101   |
| 1/1/20 | 2         | Atlantic City | 5     | 105.009087658 |
| 1/2/20 | 2         | Atlantic City | 7     | 105.009087658 |
| 1/3/20 | 2         | Atlantic City | 10    | 105.009087658 |
| 1/1/20 | 3         | Denver        | 11    | 39.851626235  |
| 1/2/20 | 3         | Denver        | 12    | 39.851626235  |
| 1/3/20 | 3         | Denver        | 14    | 39.851626235  |
| 1/1/20 | 2         | Philadelphia  | 20    | 9.886319193   |
| 1/2/20 | 2         | Philadelphia  | 25    | 9.886319193   |
| 1/3/20 | 2         | Philadelphia  | 22    | 9.886319193   |
+--------+-----------+---------------+-------+---------------+