12
votes

I am a new Linux/Python user. I have .gpx files (output files produced by GPS tracking software) and need to extract values into CSV/TXT for use in a GIS program. I have looked up strings, slicing, etc. in my beginner's Python book, on this website, and online. Using a .gpx-to-.txt converter, I can pull the longitude and latitude out into a text file, but I still need to extract the elevation data. The file has six lines of text at the top, and I only know how to open it in Emacs (aside from uploading it to a website). Here is the file starting at line 7.

Ideally, I would like to know how to extract all values with Python (or Perl) into a CSV or TXT file. If anyone knows of a tutorial website or a sample script, it would be appreciated.

<metadata>
<time>2012-06-13T01:51:08Z</time>
</metadata>
<trk>
<name>Track 2012-06-12 19:51</name>
<trkseg>
<trkpt lat="43.49670697" lon="-112.03380961">
<ele>1403.0</ele>
<time>2012-06-13T01:53:44Z</time>
<extensions>
<ogt10:accuracy>34.0</ogt10:accuracy></extensions>
</trkpt>
<trkpt lat="43.49796612" lon="-112.03970968">
<ele>1410.9000244140625</ele>
<time>2012-06-13T01:57:10Z</time>
<extensions>
<gpx10:speed>3.75</gpx10:speed>
<ogt10:accuracy>13.0</ogt10:accuracy>
<gpx10:course>293.20001220703125</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49450857" lon="-112.04477274">
<ele>1406.5</ele>
<time>2012-06-13T02:02:24Z</time>
<extensions>
<ogt10:accuracy>12.0</ogt10:accuracy></extensions>
</trkpt>
</trkseg>
<trkseg>
<trkpt lat="43.49451057" lon="-112.04480354">
<ele>1398.9000244140625</ele>
<time>2012-06-13T02:54:55Z</time>
<extensions>
<ogt10:accuracy>10.0</ogt10:accuracy></extensions>
</trkpt>
<trkpt lat="43.49464813" lon="-112.04472215">
<ele>1414.9000244140625</ele>
<time>2012-06-13T02:56:06Z</time>
<extensions>
<ogt10:accuracy>7.0</ogt10:accuracy></extensions>
</trkpt>
<trkpt lat="43.49432573" lon="-112.04489684">
<ele>1410.9000244140625</ele>
<time>2012-06-13T02:57:27Z</time>
<extensions>
<gpx10:speed>3.288236618041992</gpx10:speed>
<ogt10:accuracy>21.0</ogt10:accuracy>
<gpx10:course>196.1999969482422</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49397445" lon="-112.04505216">
<ele>1421.699951171875</ele>
<time>2012-06-13T02:57:30Z</time>
<extensions>
<gpx10:speed>3.0</gpx10:speed>
<ogt10:accuracy>17.0</ogt10:accuracy>
<gpx10:course>192.89999389648438</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49428702" lon="-112.04265923">
<ele>1433.0</ele>
<time>2012-06-13T02:58:46Z</time>
<extensions>
<gpx10:speed>4.5</gpx10:speed>
<ogt10:accuracy>18.0</ogt10:accuracy>
<gpx10:course>32.400001525878906</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49444603" lon="-112.04263691">
<ele>1430.199951171875</ele>
<time>2012-06-13T02:58:50Z</time>
<extensions>
<gpx10:speed>4.5</gpx10:speed>
<ogt10:accuracy>11.0</ogt10:accuracy>
<gpx10:course>29.299999237060547</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49456961" lon="-112.04260058">
<ele>1430.4000244140625</ele>
<time>2012-06-13T02:58:52Z</time>
<extensions>
<gpx10:speed>4.5</gpx10:speed>
<ogt10:accuracy>8.0</ogt10:accuracy>
<gpx10:course>28.600000381469727</gpx10:course></extensions>
</trkpt>
<trkpt lat="43.49570131" lon="-112.04001132">
<ele>1418.199951171875</ele>
<time>2012-06-13T03:00:08Z</time>
<extensions>
5
Out of curiosity: Have you ever figured this out? - simbabque

5 Answers

13
votes

You can install GPXpy

sudo pip install gpxpy

Then just use the library:

import gpxpy
import gpxpy.gpx

# Parse the GPX file
with open('input_file.gpx', 'r') as gpx_file:
    gpx = gpxpy.parse(gpx_file)

# Tracks contain segments, which contain the recorded points
for track in gpx.tracks:
    for segment in track.segments:
        for point in segment.points:
            print('Point at ({0},{1}) -> {2}'.format(point.latitude, point.longitude, point.elevation))

for waypoint in gpx.waypoints:
    print('waypoint {0} -> ({1},{2})'.format(waypoint.name, waypoint.latitude, waypoint.longitude))

for route in gpx.routes:
    print('Route:')

For more info: https://pypi.python.org/pypi/gpxpy

Regards

9
votes

GPX is an XML format, so use a suitable module such as lxml or the included ElementTree XML API to parse the data, then write the output to CSV with the Python csv module.

Tutorials covering these concepts:

I also found a Python GPX parsing library called gpxpy, which may give a higher-level interface to the data contained in GPX files.
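A minimal sketch of the ElementTree-plus-csv approach, using only the standard library. The helper names (`local_name`, `trackpoints`, `gpx_to_csv`) and the in-memory sample are made up for illustration. Because the first six lines of the question's file are not shown, the sketch assumes nothing about which GPX namespace is declared and simply matches elements by their local name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit('}', 1)[-1]


def trackpoints(source):
    """Yield (lat, lon, ele, time) strings for every <trkpt> in the file."""
    root = ET.parse(source).getroot()
    for elem in root.iter():
        if local_name(elem.tag) != 'trkpt':
            continue
        # Map each direct child's local name to its text, e.g. 'ele' -> '1403.0'
        children = {local_name(c.tag): (c.text or '').strip() for c in elem}
        yield (elem.get('lat', ''), elem.get('lon', ''),
               children.get('ele', ''), children.get('time', ''))


def gpx_to_csv(source, destination):
    """Write one CSV row per trackpoint and return the number of rows."""
    writer = csv.writer(destination)
    writer.writerow(['lat', 'lon', 'ele', 'time'])
    count = 0
    for row in trackpoints(source):
        writer.writerow(row)
        count += 1
    return count


# Illustrative sample; the namespace declaration is an assumption.
SAMPLE = b"""<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="demo">
<trk><trkseg>
<trkpt lat="43.49670697" lon="-112.03380961">
<ele>1403.0</ele>
<time>2012-06-13T01:53:44Z</time>
</trkpt>
<trkpt lat="43.49796612" lon="-112.03970968">
<ele>1410.9000244140625</ele>
<time>2012-06-13T01:57:10Z</time>
</trkpt>
</trkseg></trk>
</gpx>"""

buffer = io.StringIO()
gpx_to_csv(io.BytesIO(SAMPLE), buffer)
print(buffer.getvalue())

# For a real file (hypothetical names):
# with open('track.csv', 'w', newline='') as out:
#     gpx_to_csv('input_file.gpx', out)
```

Matching by local name trades strictness for convenience: it would also pick up a `trkpt` element from an unrelated namespace, which is unlikely in practice but worth knowing.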

8
votes

Since Martijn posted a Python answer and said Perl would turn to line noise, I felt there was a need for a Perl answer, too.

On CPAN, the Perl module directory, there is a module called Geo::Gpx. As Martijn already said, GPX is an XML format. But fortunately, someone has already made it into a module that handles the parsing for us. All we have to do is load that module.

There are several modules available for CSV handling, but the data in this XML file is rather simple, so we don't really need one. We can do it on our own with the built-in functionality.

Please consider the following script. I'll give an explanation in a minute.

use strict;
use warnings;
use Geo::Gpx;
use DateTime;
# Open the GPX file (stop with the system error message if that fails)
open my $fh_in, '<', 'fells_loop.gpx' or die "Cannot open fells_loop.gpx: $!";
# Parse GPX
my $gpx = Geo::Gpx->new( input => $fh_in );
# Close the GPX file
close $fh_in;

# Open an output file
open my $fh_out, '>', 'fells_loop.csv' or die "Cannot open fells_loop.csv: $!";
# Print the header line to the file
print $fh_out "time,lat,lon,ele,name,sym,type,desc\n";

# The waypoints method of the Geo::Gpx object returns an array ref
# which we can iterate in a foreach loop
foreach my $wp ( @{ $gpx->waypoints() } ) {
  # Some fields seem to be optional so they are missing in the hash.
  # We have to add an empty string by iterating over all the possible
  # hash keys to put '' in them.
  $wp->{$_} ||= '' for qw( time lat lon ele name sym type desc );

  # The time is a unix timestamp, which is hard to read.
  # We can make it an ISO8601 date with the DateTime module.
  # We only do it if there already is a time, though.
  if ($wp->{'time'}) {
    $wp->{'time'} = DateTime->from_epoch( epoch => $wp->{'time'} )
                             ->iso8601();
  }
  # Join the fields with a comma and print them to the output file
  print $fh_out join(',', (
    $wp->{'time'},
    $wp->{'lat'},
    $wp->{'lon'},
    $wp->{'ele'},
    $wp->{'name'},
    $wp->{'sym'},
    $wp->{'type'},
    $wp->{'desc'},
  )), "\n"; # Add a newline at the end
}
# Close the output file
close $fh_out;

Let's take this in steps:

  • use strict and use warnings enforce rules like declaring variables and tell you about common mistakes that are the hardest to find.
  • use Geo::Gpx and use DateTime are the modules we use. Geo::Gpx is going to handle the parsing for us. We need DateTime to make unix timestamps into readable dates and times.
  • The open function opens a file. $fh_in is the variable that holds the filehandle. The GPX file we want to read is fells_loop.gpx which I took the liberty of borrowing from topografix.com. You can find more info on open in perlopentut.
  • We create a new Geo::Gpx object called $gpx and use our filehandle $fh_in to tell it where to read the XML data from. By convention, most Perl modules with an object-oriented interface provide a new method as their constructor.
  • close closes the filehandle.
  • The next open has a > to tell Perl that we want to write to this filehandle.
  • We print to a filehandle by putting it as the first argument to print. Note that there is no comma after the filehandle. The \n is a newline character.
  • The foreach loop takes the return value of the waypoints method of the Geo::Gpx object. This value is an array reference: a scalar that points to an array, which we dereference with @{ ... } (see perlref if you want to know more about references). Each element of that array is a hash reference representing one waypoint in the GPX data. In each iteration of the loop, the next element is put into $wp. If printed with Data::Dumper, it looks like this:

    $VAR1 = {
          'ele' => '64.008000',
          'lat' => '42.455956',
          'time' => 991452424,
          'name' => 'SOAPBOX',
          'sym' => 'Cemetery',
          'desc' => 'Soap Box Derby Track',
          'lon' => '-71.107483',
          'type' => 'Intersection'
        };
    
  • Now the postfix for is a bit tricky. As we just saw, there are 8 keys in the hashref. Unfortunately, some of them are sometimes missing. Because we have use warnings, we will get a warning if we try to access one of these missing values. We have to create these keys and put an empty string '' in there.

    foreach and for are completely interchangeable in Perl, and both can also be used in postfix syntax behind a single expression. We use the qw operator to create the list that for will iterate over. qw is short for quoted words and it does just that: it returns a list of the words in it, each as a quoted string. We could also have said ('time', 'lat', 'lon', ... ).

    In the expression, we access each key of $wp. $_ is the loop variable. In the first iteration it will hold 'time', then 'lat' and so on. Since $wp is a hashref, we need the -> to access its keys. The curly braces indicate that it's a hashref. The ||= operator assigns a value to our hashref element only if the current value is not true.

  • Now, if there is a time value (the empty string we just assigned when the time was not set counts as 'there is none'), we replace the unix timestamp with a proper date. DateTime helps us do that. The from_epoch method takes the unix timestamp as an argument. It returns a DateTime object, on which we can directly call the iso8601 method.

    This is called chaining. Some modules can do it. It is similar to what jQuery's JavaScript objects do. The unix timestamp in our hashref is replaced with the result of the DateTime operation.

  • Now we print to our filehandle again. join is used to put commas between the values. We also put a newline at the end again.
  • Once we're done with the loop, we close the filehandle.
  • Now we're done! :)

All in all, I'd say this is pretty simple and also quite readable, isn't it? I tried to make it a healthy mix of overly verbose syntax with a Perlish flavor.
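For readers comparing this with the Python answers, the per-waypoint steps described above (fill in missing keys, convert the timestamp, join with commas) could be sketched in Python roughly as follows. This is not a port of Geo::Gpx: the function name `waypoint_to_csv_line` is made up for illustration, and the input is simply the hash from the Data::Dumper output written as a plain dict.

```python
from datetime import datetime, timezone

# Same column order as the header line in the Perl script
FIELDS = ('time', 'lat', 'lon', 'ele', 'name', 'sym', 'type', 'desc')


def waypoint_to_csv_line(wp):
    """Build one comma-separated line from a waypoint dict."""
    # Like Perl's ||=, 'or' replaces any false value (None, '', 0) with ''
    row = {key: wp.get(key) or '' for key in FIELDS}
    if row['time']:
        # Unix timestamp -> ISO 8601 date and time in UTC
        moment = datetime.fromtimestamp(row['time'], tz=timezone.utc)
        row['time'] = moment.strftime('%Y-%m-%dT%H:%M:%S')
    # Plain join, as in the Perl script: fields containing commas are not quoted
    return ','.join(str(row[key]) for key in FIELDS)


waypoint = {
    'ele': '64.008000',
    'lat': '42.455956',
    'time': 991452424,
    'name': 'SOAPBOX',
    'sym': 'Cemetery',
    'desc': 'Soap Box Derby Track',
    'lon': '-71.107483',
    'type': 'Intersection',
}
print(waypoint_to_csv_line(waypoint))
```

If descriptions can contain commas, the csv module would be a safer choice than a plain join in either language.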

1
votes

Every time I try to do this, I scour the internet for solutions and end up writing my own regex parser.

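import re
import numpy as np

GPXfile = 'Lunch_Walk.gpx'
with open(GPXfile) as f:
    data = f.read()

# Collect every lat/lon attribute and every <time> element in document order
lat = np.array(re.findall(r'lat="([^"]+)', data), dtype=float)
lon = np.array(re.findall(r'lon="([^"]+)', data), dtype=float)
# Caution: this also matches a <time> inside <metadata>, if the file has one
time = re.findall(r'<time>([^<]+)', data)

# zip stops at the shortest input; mixing floats and strings yields a string array
combined = np.array(list(zip(lat, lon, time)))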

This gives an array of the format:

array([['51.504613', '-0.141894', '2020-12-26T12:43:14Z'],
       ['51.504624', '-0.141901', '2020-12-26T13:10:26Z'],
       ['51.504633', '-0.141906', '2020-12-26T13:10:28Z'],
       ...)

You can then do with this whatever you desire.
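A variation on the same regex idea, sketched under the assumption that every trackpoint is written as a `<trkpt ...> ... </trkpt>` pair, as in the question's file. Matching each trackpoint as a unit keeps latitude, longitude, elevation and time aligned even when a point lacks one of them or when `<metadata>` carries its own `<time>`. The function names are made up for illustration, and self-closing `<trkpt .../>` elements are not handled.

```python
import re

# One match per trackpoint: group 1 is the attribute text, group 2 the body
TRKPT = re.compile(r'<trkpt\b([^>]*)>(.*?)</trkpt>', re.DOTALL)


def attribute(attrs, name):
    """Return the value of name="..." in an attribute string, or None."""
    match = re.search(r'\b' + name + r'="([^"]*)"', attrs)
    return match.group(1) if match else None


def child_text(body, tag):
    """Return the text of the first <tag>...</tag> in body, or None."""
    match = re.search(r'<' + tag + r'>([^<]*)</' + tag + r'>', body)
    return match.group(1).strip() if match else None


def parse_trackpoints(text):
    """Return a list of (lat, lon, ele, time) string tuples."""
    return [
        (attribute(attrs, 'lat'), attribute(attrs, 'lon'),
         child_text(body, 'ele'), child_text(body, 'time'))
        for attrs, body in TRKPT.findall(text)
    ]


sample = '''<metadata>
<time>2012-06-13T01:51:08Z</time>
</metadata>
<trk>
<trkseg>
<trkpt lat="43.49670697" lon="-112.03380961">
<ele>1403.0</ele>
<time>2012-06-13T01:53:44Z</time>
</trkpt>
<trkpt lat="43.49796612" lon="-112.03970968">
<ele>1410.9000244140625</ele>
</trkpt>
</trkseg>
</trk>'''

for row in parse_trackpoints(sample):
    print(row)
```

Regular expressions remain fragile against XML that is formatted differently, so an XML parser is the more robust choice when one is available.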

0
votes

While gpxpy is the popular Python answer, and I tried it myself, I found it frustrating: it was difficult, if not impossible, to get at extension data such as heart rate, and one still has to loop through the various nested XML ancestors and children. So I wrote gpxcsv.

As easy as:

from gpxcsv import gpxtolist
import pandas as pd

df = pd.DataFrame(
    gpxtolist('myfile.gpx'))

for a DataFrame. A command-line tool also exists to create a CSV or JSON file directly, preserving as many columns in the trackpoint as it finds and using the tags as the column names.

The source code of the project is on GitHub.