I am trying to parse a .kml file into Python using the lxml module (after failing to make this work in BeautifulSoup, which I use for HTML).
As this is my first time doing this, I followed the official tutorial, and all goes well until I try to construct an iterator to extract my data by iterating over the root:
from lxml import etree
tree = etree.parse('kmlfile')
Here is the example from the tutorial I am trying to emulate:
If you know you are only interested in a single tag, you can pass its name to getiterator() to have it filter for you:
for element in root.getiterator("child"):
    print element.tag, '-', element.text
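For reference, here is a sketch of the same tutorial loop in current Python. It assumes a small made-up document with no xmlns declaration (like the tutorial's) and uses the standard library's xml.etree.ElementTree so it runs without lxml installed; lxml's etree offers the same iter() method. getiterator() is the older, deprecated name for iter().

```python
import xml.etree.ElementTree as ET

# Made-up document with NO namespace declaration, like the tutorial's example.
root = ET.fromstring("<root><child>one</child><other/><child>two</child></root>")

# iter(tag) walks the whole subtree and yields only elements whose tag equals `tag`.
for element in root.iter("child"):
    print(element.tag, "-", element.text)
```

Because no namespace is declared, each tag is stored literally as "child", so the bare name matches.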
I would like to get all data under 'Placemark', so I tried:
for i in tree.getiterator("Placemark"):
    print i, type(i)
which doesn't give me anything. What does work is:
for i in tree.getiterator("{http://www.opengis.net/kml/2.2}Placemark"):
    print i, type(i)
I don't understand how this comes about. The www.opengis.net URL is listed in the root tag at the beginning of the document (<kml xmlns="http://www.opengis.net/kml/2.2" ...>), but I don't understand:
- how the part in {} relates to my specific example at all
- why it is different from the tutorial
- what I am doing wrong
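The behaviour in the question can be reproduced with a small sketch. It assumes a made-up minimal KML string and uses the standard library's xml.etree.ElementTree, which stores namespaced tags the same "{uri}localname" way (Clark notation) that lxml does. The xmlns on <kml> declares a default namespace, so it applies to every descendant element; the tutorial's document declares none, which is why its bare "child" matches. The helper namespace_of is hypothetical, added only to show how to avoid hard-coding the URI.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Made-up minimal KML: xmlns on <kml> is a default namespace,
# so <Document>, <Placemark> and <name> are all in it too.
KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>A</name></Placemark>
    <Placemark><name>B</name></Placemark>
  </Document>
</kml>"""

root = ET.fromstring(KML)

# The parser stores each tag as "{namespace-uri}localname".
print(root.tag)

# No element's tag is literally "Placemark", so this finds nothing.
bare = list(root.iter("Placemark"))

# The fully qualified name matches every Placemark.
qualified = list(root.iter("{%s}Placemark" % KML_NS))


def namespace_of(tag):
    """Hypothetical helper: return the URI part of a "{uri}local" tag, or ''."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


# Read the URI from the root element instead of typing it out.
ns = namespace_of(root.tag)
names = [pm.find("{%s}name" % ns).text for pm in root.iter("{%s}Placemark" % ns)]
print(len(bare), len(qualified), names)  # 0 2 ['A', 'B']
```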
Any help is much appreciated!
{http://www.opengis.net/kml/2.2}Placemark notation, or implicitly, by assigning a handle ("prefix") to a URI and then using that handle, like XPath does it. You are free to choose whatever handle you like; you don't have to use the same handle that was in the XML. Go ahead and register kml as http://www.opengis.net/kml/2.2 and use kml in your XPath queries. - Tomalak
red to the color code #FF0000. XML even has a facility to define a default color for all elements that don't have their own color defined. When querying the XML through XPath you must specify the color. XPath knows nothing about the "default color" mechanism that XML provides; you must either query explicitly car[namespace-uri() = '#FF0000'] or tell XPath up-front that #FF0000 shall be known as red so that you can query red:car. - Tomalak
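A sketch of the "register a prefix" approach from the comment, under the same assumptions: a made-up minimal KML string and the standard library's xml.etree.ElementTree, whose findall()/findtext() accept a namespaces mapping and support a limited XPath subset. With lxml, the analogous full-XPath call would be tree.xpath('//kml:Placemark', namespaces={'kml': 'http://www.opengis.net/kml/2.2'}). The prefix names kml and k are arbitrary choices; only the URI they map to matters.

```python
import xml.etree.ElementTree as ET

KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>A</name></Placemark>
    <Placemark><name>B</name></Placemark>
  </Document>
</kml>"""

root = ET.fromstring(KML)

# Our own prefix -> URI mapping; the file itself declares no "kml" prefix.
ns = {"kml": "http://www.opengis.net/kml/2.2"}

# The prefix in the path is expanded to "{uri}" using the mapping.
placemarks = root.findall(".//kml:Placemark", ns)
names = [pm.findtext("kml:name", namespaces=ns) for pm in placemarks]
print(names)  # ['A', 'B']

# A different prefix bound to the same URI selects the same elements.
same = root.findall(".//k:Placemark", {"k": "http://www.opengis.net/kml/2.2"})
print(len(same))  # 2
```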