Java - Android SDK 8 - XML Parsing with DocumentBuilderFactory terminating strings after entities

Question

I am writing an application in Java for Android (SDK v8) that parses XML and puts the entries into a ListView. This part works fine. I am parsing the XML with a DocumentBuilder, which cuts each string it outputs off at the first entity, so neither the entity nor anything after it appears. The entities I am using are the standard ones (&quot;, &amp;, &apos;, &lt;, &gt;). I have also tried using numeric entities in my source XML (e.g. &# 38; without the space, written that way here only so you can see what I'm outputting), and this crashes my app, with logcat reporting "unterminated entity ref".
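For reference, here is a minimal desktop-Java sketch of what a conforming parser should do with these inputs. It is not the app's code: the class EntityDemo and its rootText helper are hypothetical, and it assumes the JDK's built-in DocumentBuilderFactory (Java 7+) rather than Android's parser. The predefined entity &amp; and the numeric reference &#38; should both decode to "&", while a bare "&" (or "&# 38;" with the space) is not well-formed XML and is rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EntityDemo {
    // Parses xml and returns the text content of its root element.
    // Malformed input is rethrown as an unchecked IllegalArgumentException.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Predefined and numeric references both decode to "&"
        System.out.println(rootText("<name>blah &amp; blah.txt</name>")); // blah & blah.txt
        System.out.println(rootText("<name>blah &#38; blah.txt</name>")); // blah & blah.txt

        // A bare "&" is not well-formed, so the parser rejects the document
        try {
            rootText("<name>blah & blah.txt</name>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getCause().getMessage());
        }
    }
}
```

If a parser accepts the first two inputs intact but the app still shows only "blah", the truncation is happening when the text is read out of the DOM, not in the XML itself.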

To check that my XML is valid, I have viewed it in Google Chrome, which displays it correctly. In my app, however, the entry blah & blah.txt is truncated to blah. The XML I am parsing is below:

EDIT: Much shorter XML sample

<?xml version="1.1"?>
<root>
  <object>
    <id>ROOT</id>
    <type>directory</type>
    <name>../</name>
  </object>
  <object>
    <id>09F010C143B84573A36C50F3EF7E0708</id>
    <type>file</type>
    <name>blah &amp; blah.txt</name>
  </object>
  <object>
    <id>85CF028B838D4E0096C081B987C97045</id>
    <type>file</type>
    <name>Epilist.m3u</name>
  </object>
</root>

EDIT: Added my XML parsing class.

EDIT2: Below is a complete class that, with the help of others, should now be bug-free. Anyone is welcome to use it; I am releasing it as public domain code, and you do not need to credit me. It is designed for Android, but as far as I know it can be used on any Java platform by replacing the calls to Log.e (getXmlFromUrl also relies on the Apache HttpClient classes that ship with Android, so outside Android that library has to be added).

package tk.dtechsoftware.mpclient;

import java.io.IOException;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import android.util.Log;

public class XMLParser {
    public String getXmlFromUrl(String url) {
        String xml = null;

        try {
            // Fetch the document with Apache HttpClient (bundled with Android)
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(url);

            HttpResponse httpResponse = httpClient.execute(httpGet);
            HttpEntity httpEntity = httpResponse.getEntity();
            // Fall back to UTF-8 if the server sends no charset
            // (EntityUtils otherwise defaults to ISO-8859-1)
            xml = EntityUtils.toString(httpEntity, "UTF-8");

        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // return XML
        return xml;
    }

    public Document getDomElement(String xml) {
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setExpandEntityReferences(false);
        try {

            DocumentBuilder db = dbf.newDocumentBuilder();

            InputSource is = new InputSource();
            is.setCharacterStream(new StringReader(xml));
            doc = db.parse(is);

        } catch (ParserConfigurationException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (SAXException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (IOException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        }
        // return DOM
        return doc;
    }

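    public String getValue(Element item, String str) {
        NodeList n = item.getElementsByTagName(str);
        // Return "" instead of throwing a NullPointerException when the tag is missing
        if (n.getLength() == 0) {
            return "";
        }
        // getTextContent() joins all text children, so entities such as &amp;
        // no longer cut the value short
        return n.item(0).getTextContent();
    }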

}
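The note that the class works elsewhere once Log.e is replaced can be sketched roughly as follows. This assumes desktop Java 7+ and covers only the parsing methods (getXmlFromUrl is left out because it needs Apache HttpClient outside Android). PortableXmlParser is a hypothetical name, java.util.logging stands in for android.util.Log, and getValue returns "" when the tag is missing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical desktop port of the parsing half of XMLParser:
// android.util.Log.e is replaced by java.util.logging.
class PortableXmlParser {
    private static final Logger LOG = Logger.getLogger(PortableXmlParser.class.getName());

    // Parses an XML string into a DOM, or logs and returns null on failure.
    public Document getDomElement(String xml) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setExpandEntityReferences(false); // kept from the original class
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            LOG.log(Level.SEVERE, "XML parsing failed", e);
            return null;
        }
    }

    // Returns the full text of the first <tag> below item, or "" if there is none.
    public String getValue(Element item, String tag) {
        NodeList n = item.getElementsByTagName(tag);
        return n.getLength() > 0 ? n.item(0).getTextContent() : "";
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<object><id>ROOT</id><type>directory</type><name>../</name></object>"
                + "<object><id>09F010C143B84573A36C50F3EF7E0708</id><type>file</type>"
                + "<name>blah &amp; blah.txt</name></object>"
                + "</root>";
        PortableXmlParser parser = new PortableXmlParser();
        Document doc = parser.getDomElement(xml);
        NodeList objects = doc.getElementsByTagName("object");
        for (int i = 0; i < objects.getLength(); i++) {
            Element obj = (Element) objects.item(i);
            System.out.println(parser.getValue(obj, "type") + ": " + parser.getValue(obj, "name"));
        }
    }
}
```

Run as is, main should print "directory: ../" followed by "file: blah & blah.txt", with the entity decoded and the file name intact.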

We really don't need 100 lines of the same type of node. Please consider shortening your code down to a small sample that conveys what you are trying to do. msmvps.com/blogs/jon_skeet/archive/2010/08/29/… — A--C
Sorry guys, just realized I've provided far too much XML data there. The important element is near the bottom, and has the <name> of blah & blah.txt — StereoRocker
I have edited my post so that it contains much less XML data, and now also contains my XMLParser class — StereoRocker

Jörn Horstmann · Accepted Answer · 2012-12-09T23:39:15

I don't think it's guaranteed that an element node has only one child node containing its text content. The content can just as well be split across multiple child nodes.

Your getElementValue method can probably be replaced by a simple call to elem.getTextContent().
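To illustrate this, here is a small desktop-Java sketch (SplitTextDemo, concatText and parseRoot are hypothetical names). Whether an entity reference splits a text value into several nodes depends on the parser, so the sketch uses a CDATA section instead, which the JDK's default non-coalescing DocumentBuilderFactory keeps as a separate node. Reading only the first child returns "blah ", while concatenating every text/CDATA child, or calling getTextContent() (available on Android from API level 8), returns the whole value.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SplitTextDemo {
    // Manual equivalent of getTextContent() for an element whose children are
    // only text and CDATA nodes: append every piece, not just the first one.
    static String concatText(Element elem) {
        StringBuilder sb = new StringBuilder();
        for (Node child = elem.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Parses xml and returns its root element; failures become unchecked exceptions.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The CDATA section stays a separate node, so <name> has three
        // children: "blah ", "&" and " blah.txt"
        Element name = parseRoot("<name>blah <![CDATA[&]]> blah.txt</name>");
        System.out.println(name.getChildNodes().getLength());   // 3
        System.out.println(name.getFirstChild().getNodeValue()); // "blah " only
        System.out.println(concatText(name));                    // blah & blah.txt
        System.out.println(name.getTextContent());               // blah & blah.txt
    }
}
```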
