0
votes

I am trying to parse an XML file and then write selected values from it to a CSV file.

Here is my basic XML file:

<?xml version="1.0"?>
<library owner="John Q. Reader">
    <book>
        <title>Sandman Volume 1: Preludes and Nocturnes</title>
        <author>Neil Gaiman</author>
    </book>
    <book>
        <title>Good Omens</title>
        <author>Neil Gamain</author>
        <author>Terry Pratchett</author>
    </book>
    <book>
        <title>"Repent, Harlequin!" Said the Tick-Tock Man</title>
        <author>Harlan Ellison</author>
    </book>
</library>

I have written a basic script with Python 2.7 and minidom. Here it is:


# Test Parser

from xml.dom.minidom import parse
import xml.dom.minidom

def printLibrary(myLibrary):
    books = myLibrary.getElementsByTagName("book")
    for book in books:
        print "*****Book*****"
        print "Title: %s" % book.getElementsByTagName("title")[0].childNodes[0].data
        for author in book.getElementsByTagName("author"):
            print "Author: %s" % author.childNodes[0].data
            # This is where I tried (and failed) to write the author out with csv.writer()
doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]

# Get book elements in library
books = myLibrary.getElementsByTagName("book")

# Print each book's title
printLibrary(myLibrary)

So far, this script, when run from the command line on Windows 7, displays the book title and author(s) for each book.

What I want to do is output these results to a CSV file so it looks something like this:

title, author
title, author
title, author
etc.

However, I can't get it to work. I'm fairly new to Python; I work in IT, and SQL and basic programming are about my level.

Any help would be much appreciated!!

Can you also show the sample output? It's not clear from the question. - Vinod Sharma
Have you tried using the csv module? - Clarus

1 Answer

0
votes

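Use the csv module. csv.DictWriter writes one row per dictionary, so emit one dictionary for each (title, author) pair: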

# Test Parser

from xml.dom.minidom import parse
import csv 


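def writeToCSV(myLibrary):
    # Python 2.7: open in binary mode so the csv module controls line endings
    # (text mode on Windows leaves a blank line after every row).
    # On Python 3, use open('output.csv', 'w', newline='') instead.
    with open('output.csv', 'wb') as csvfile:
        fieldnames = ['title', 'author']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        # One row per (title, author) pair, so a book with two authors
        # appears on two rows.
        books = myLibrary.getElementsByTagName("book")
        for book in books:
            titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
            for author in book.getElementsByTagName("author"):
                authorValue = author.childNodes[0].data
                writer.writerow({'title': titleValue, 'author': authorValue})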

doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]

# Write each book's title and author(s) to output.csv
writeToCSV(myLibrary)
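
If you are on Python 3 rather than 2.7, the same approach should work as long as the output file is opened in text mode with newline=''. The following is only a minimal sketch under that assumption: the helper names book_rows and write_rows and the inline sample XML are illustrative and not part of the script above, and an in-memory buffer stands in for the real file.

```python
import csv
import io
from xml.dom.minidom import parseString

# Illustrative sample data; the real script would call parse('library.xml').
SAMPLE_XML = """<?xml version="1.0"?>
<library owner="John Q. Reader">
    <book>
        <title>Good Omens</title>
        <author>Neil Gaiman</author>
        <author>Terry Pratchett</author>
    </book>
</library>"""


def book_rows(library):
    """Yield one (title, author) tuple per author of each book."""
    for book in library.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0].childNodes[0].data
        for author in book.getElementsByTagName("author"):
            yield (title, author.childNodes[0].data)


def write_rows(rows, csvfile):
    """Write a header plus the given rows to an open text-mode file object."""
    writer = csv.writer(csvfile)
    writer.writerow(["title", "author"])
    writer.writerows(rows)


doc = parseString(SAMPLE_XML)
library = doc.getElementsByTagName("library")[0]

# With a real file, use: open('output.csv', 'w', newline='')
buffer = io.StringIO()
write_rows(book_rows(library), buffer)
print(buffer.getvalue())
```

Separating the row generation from the writing keeps the XML traversal testable on its own; csv.writer is used here instead of csv.DictWriter only because the rows are already plain tuples.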

Output file:

title,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
Good Omens,Neil Gamain
Good Omens,Terry Pratchett
"""Repent, Harlequin!"" Said the Tick-Tock Man",Harlan Ellison
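
The doubled quotes in the last row are not a bug. That title contains both a comma and double quotes, so the csv module wraps the field in quotes and doubles each embedded quote. A small Python 3 sketch (the variable names are illustrative) shows that csv.reader recovers the original title when the file is read back:

```python
import csv
import io

# The last line of the output file, exactly as the csv module wrote it.
line = '"""Repent, Harlequin!"" Said the Tick-Tock Man",Harlan Ellison\r\n'

# csv.reader reverses the quoting and splits only on the unquoted comma.
title, author = next(csv.reader(io.StringIO(line, newline="")))
print(title)
print(author)
```

Spreadsheet programs that follow the same quoting convention should display the title with its original single pair of quotes.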