
I am new to geopy. I work at a transportation company and need to compute the total kilometers a truck has driven.

I have seen some answers here, but they did not work for me.

I have the following DataFrame from a GPS installed on the truck:

    latitude    longitude
0   -25.145439  -54.294871
1   -24.144564  -54.240094
2   -24.142564  -54.198901
3   -24.140093  52.119021

The first step is to make a third column that turns each latitude/longitude pair into a point, but all of my attempts failed.

I wrote:

df['point'] = df['latitude'].astype(float),df['longitude'].astype(float)

It returns an object, but I would like it to return a point. My objective is to have:

    latitude    longitude      Point
0   -25.145439  -54.294871     (-25.145439, -54.294871)
1   -24.144564  -54.240094     (-24.144564, -54.240094)
2   -24.142564  -54.198901     (-24.142564, -54.198901)
3   -24.140093  52.119021      (-24.140093, 52.119021)
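The assignment above puts a tuple of two whole columns on the right-hand side instead of building one pair per row. A minimal sketch of the column shown above, assuming a plain (latitude, longitude) tuple is enough (geopy's distance functions accept such tuples as well as Point objects):

```python
import pandas as pd

# Sample rows copied from the GPS DataFrame in the question.
df = pd.DataFrame(
    {
        "latitude": [-25.145439, -24.144564, -24.142564, -24.140093],
        "longitude": [-54.294871, -54.240094, -54.198901, 52.119021],
    }
)

# zip() pairs each latitude with the longitude in the same row; list() turns
# those pairs into one (lat, lon) tuple per row, stored in an object column.
df["point"] = list(zip(df["latitude"].astype(float), df["longitude"].astype(float)))

print(df)
```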

Then I would like to compute the distance between consecutive points, so I would have something like this:

    latitude    longitude      Point                        Distance KM
0   -25.145439  -54.294871     (-25.145439, -54.294871)     0
1   -24.144564  -54.240094     (-24.144564, -54.240094)     0.2
2   -24.142564  -54.198901     (-24.142564, -54.198901)     0.4
3   -24.140093  52.119021      (-24.140093, 52.119021)      0.2

Note that each distance is measured from the row above (the rows are already in order).

I tried:

df['distance'] = geodesic(df['point'],df['point'].shift(1))

And I get an error saying that it does not work with a tuple.

Does anyone know a solution for this?

Thanks

Comment from user1922364: Got it, thanks. It is in this post: stackoverflow.com/questions/30969282/…

2 Answers


Create a point Series:

import pandas as pd

df = pd.DataFrame(
    [
        (-25.145439,  -54.294871),
        (-24.144564,  -54.240094),
        (-24.142564,  -54.198901),
        (-24.140093,  52.119021),
    ],
    columns=['latitude', 'longitude']
)

from geopy import Point
from geopy.distance import distance

df['point'] = df.apply(lambda row: Point(latitude=row['latitude'], longitude=row['longitude']), axis=1)
In [2]: df
Out[2]:
    latitude  longitude                                point
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E

Add a shifted Series, point_next, holding the point from the previous row:

df['point_next'] = df['point'].shift(1)
df.loc[df['point_next'].isna(), 'point_next'] = None
In [4]: df
Out[4]:
    latitude  longitude                                point                           point_next
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W                                 None
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W  25 8m 43.5804s S, 54 17m 41.5356s W
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W  24 8m 40.4304s S, 54 14m 24.3384s W
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  24 8m 33.2304s S, 54 11m 56.0436s W

Calculate the distances:

df['distance_km'] = df.apply(lambda row: distance(row['point'], row['point_next']).km if row['point_next'] is not None else float('nan'), axis=1)
df = df.drop('point_next', axis=1)
In [6]: df
Out[6]:
    latitude  longitude                                point   distance_km
0 -25.145439 -54.294871  25 8m 43.5804s S, 54 17m 41.5356s W           NaN
1 -24.144564 -54.240094  24 8m 40.4304s S, 54 14m 24.3384s W    111.003172
2 -24.142564 -54.198901  24 8m 33.2304s S, 54 11m 56.0436s W      4.192654
3 -24.140093  52.119021    24 8m 24.3348s S, 52 7m 8.4756s E  10449.661388
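The question's actual goal is the total distance driven. A minimal sketch: sum the distance_km column. Series.sum() skips NaN by default (skipna=True), so the first row's missing leg does not break the total. The values below are copied from the Out[6] listing above:

```python
import pandas as pd

# distance_km values copied from the Out[6] listing above.
distance_km = pd.Series([float("nan"), 111.003172, 4.192654, 10449.661388])

# skipna=True is the default, so the leading NaN is ignored.
total_km = distance_km.sum()
print(round(total_km, 6))
```

Almost all of that total comes from the last leg: row 3's longitude is positive (52.119021) while the others are negative, which is probably a sign error in the GPS data and worth checking before trusting the result.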

Be aware that calling geopy's distance row by row with .apply(..., axis=1) will be very slow on large amounts of data (hundreds of thousands of rows).

One workaround is the haversine formula, which can be vectorized efficiently with pandas/NumPy. It treats the Earth as a sphere, so it is somewhat less precise than geopy's ellipsoidal geodesic distance. Another option is geopandas, if you are OK with an additional external package.
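A minimal vectorized haversine sketch, assuming a spherical Earth with the mean radius 6371.0088 km (an assumption; results typically differ from geopy's geodesic by up to about 0.5%). The function name haversine_km is hypothetical:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius; haversine models a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, computed element-wise on arrays/Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Sample rows copied from the GPS DataFrame in the question.
df = pd.DataFrame(
    {
        "latitude": [-25.145439, -24.144564, -24.142564, -24.140093],
        "longitude": [-54.294871, -54.240094, -54.198901, 52.119021],
    }
)

# shift(1) lines each row up with the previous one in one vectorized call;
# row 0 has no predecessor, so its distance is NaN.
df["distance_km"] = haversine_km(
    df["latitude"].shift(1), df["longitude"].shift(1), df["latitude"], df["longitude"]
)

total_km = df["distance_km"].sum()  # NaN in row 0 is skipped
print(df)
```

Because every step is a NumPy operation on whole columns, there is no per-row Python call, which is where the speedup over .apply comes from.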