Python, numpy, scipy: How to exclude a location error from GPS coordinates? (averaging lon, lat)

Question

I use Python:

I have 2 arrays of GPS points - lon and lat (more than 500 000 points).

I have 1 array of date-time.

lon = numpy.array(lon)
lat = numpy.array(lat)
dt = numpy.array(dt)

I have a location error (GPS sensor error). For example 15 meters.

GPS_sensor_error = 0.015

I need to exclude a GPS_sensor_error from coordinates that there were no asterisks on a track.

enter image description here

(I don't draw a point with identical coordinates)

enter image description here

How I am able to do it?

Now:

I calculate distance between points.
I find the minimum distance, if it less GPS_sensor_error, then I average lon, lat.
repeat 1.
repeat 2.
repeat until all distances won't be more GPS_sensor_error

Update:

     lon = numpy.array()
     lat = numpy.array()

     flag = True
     while flag:
        lon1 = lon[:-1]
        lon2 = lon[1:]
        lat1 = lat[:-1]
        lat2 = lat[1:]

        '''distance'''
        x = (lon2 - lon1)
        y = (lat2 - lat1)
        d = numpy.sqrt(x * x + y * y)

        min = numpy.min(d)
        if min < GPS_sensor_error:
            j = numpy.where(d == min)[0][0]

            lon[j] = (lon[j] + lon[j + 1]) / 2
            lat[j] = (lat[j] + lat[j + 1]) / 2

            lon = numpy.delete(lon, j + 1)
            lat = numpy.delete(lat, j + 1)

        else:
            flag = False

Bypass on all points works at a pure python very long... Prompt please, how to implement it using scipy, numpy?

Thanks

P.s. probably already there is a GPS filter in scipy, numpy?

I think I understand in principle what you want to achieve, however, details are unclear from your question. Could you give a simple example (3 or 4 points), that shows what you mean by distance, what means average lon,lat exactly in this context? — Theodros Zelleke
Why don't you include your slow Python code? The way in which you want to conduct the averaging is not very clear from your deescription. — Jaime
@Olga, in your slow algorithm lon and lat are shrinking with each iteration: lon = numpy.delete(lon, j + 1); lat = numpy.delete(lat, j + 1). Now in a comment on @spam_eggs answer you explicitely state, that the sizes of lon and lat should be preserved???? — Theodros Zelleke
As I said in my answer: Dont use pythagoras for the distance. It gets terribly inaccurate at higher and lower latitudes. — RickyA

RickyA RickyA · Accepted Answer · 2013-01-10T15:08:22

From a data science perspective what you are doing is not correct. You cant just use the average error distance as a cutoff and think your data will be more correct. The two points you are comparing can have an error more or less than 15 m they can shift toward each other or move away of each other. And if you don't have another exact dataset there is no way of telling what would be the correct point. You can't make this dataset more precise.

However I think you goal is to simplify your dataset, not to make it more accurate. For that you can use the Douglas–Peucker algorithm. I would suggest you load your data in an Postgis enabled database (Postgresql + postgis) and then use the simplify function. This will require some db setup time, but then it will speed you up greatly. However if you want it in pure python this SO question has a very nice snippet.

BTW. If your are doing distance calculations with lat,lon do not use Pythagoras. It is not valid since lat,lon is not Euclidean. Use the haversine algorithm.

Python, numpy, scipy: How to exclude a location error from GPS coordinates? (averaging lon, lat)

2 Answers