2
votes

I am trying to escape the HTML characters in a string and use that string to build a DOM document with the parseXml method shown below. Next, I try to insert this DOM document into a database, but when I do, I get the following error:

org.xml.sax.SAXParseException: Reference is not allowed in prolog.

I have three questions: 1) I am not sure how to escape double quotes. I tried replaceAll("\"", "&quot;") and am not sure if this is right.

2) Suppose I want a string starting and ending with double quotes (eg: "sony"), how do I code it? I tried something like:

String sony = "\"sony\"";

Is this right? Will the above string contain "sony" along with double quotes or is there another way of doing it?

3) I am not sure what the "org.xml.sax.SAXParseException: Reference is not allowed in prolog." error means. Can someone help me fix this?

Thanks, Sony

Steps in my code:

  1. Utils.java

    public static String escapeHtmlEntities(String s) { return s.replaceAll("&", "&").replaceAll("<", "<").replaceAll(">", ">").replaceAll("\"", """). replaceAll(":", ":").replaceAll("/", "/"); }

    public static Document parseXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        doc.setXmlStandalone(false);
        return doc;
    }
    
  2. TreeController.java

    protected void notifyNewEntryCreated(String entryType) throws Exception {
        for (Listener l : treeControlListeners)
            l.newEntryCreated();

        final DomNodeTreeModel domModel = (DomNodeTreeModel) getModel();
        Element parent_item = getSelectedEntry();
        String xml = Utils.escapeHtmlEntities("<entry xmlns=" + "\"http://www.w3.org/2005/atom\"" + "xmlns:libx=" +
                "\"http://libx.org/xml/libx2\">" + "<title>" + "New" + entryType + "</title>" +
                "<updated>2010-71-22T11:08:43z</updated>" + "<author> <name>LibX Team</name>" +
                "<uri>http://libx.org</uri>" + "<email>[email protected]</email></author>" +
                "<libx:" + entryType + "></libx:" + entryType + ">" + "</entry>");
        xmlModel.insertNewEntry(xml, getSelectedId());
    }

  3. XMLDataModel.java

public void insertNewEntry (String xml, String parent_id) throws Exception {
    insertNewEntry(Utils.parseXml(xml).getDocumentElement(), parent_id);
}

public void insertNewEntry (Element elem, String parent_id) throws Exception {

    // inserting an entry with no libx: tag will create a storage leak
    if (elem.getElementsByTagName("libx:package").getLength() +
        elem.getElementsByTagName("libx:libapp").getLength() +
        elem.getElementsByTagName("libx:module").getLength() < 1) {
        // TODO: throw exception here instead of return
        return;
    }

    XQPreparedExpression xqp = Q.get("insert_new_entry.xq");
    xqp.bindNode(new QName("entry"), elem.getOwnerDocument(), null);
    xqp.bindString(new QName("parent_id"), parent_id, null);
    xqp.executeQuery();
    xqp.close();

    updateRoots();
}
  4. insert_new_entry.xq

    declare namespace libx='http://libx.org/xml/libx2';
    declare namespace atom='http://www.w3.org/2005/atom';
    declare variable $entry as xs:anyAtomicType external;
    declare variable $parent_id as xs:string external;
    declare variable $feed as xs:anyAtomicType := doc('libx2_feed')/atom:feed;
    declare variable $metadata as xs:anyAtomicType := doc('libx2_meta')/metadata;

    let $curid := $metadata/curid
    return replace value of node $curid with data($curid) + 1,

    let $newid := data($metadata/curid) + 1
    return insert node {$newid}{ $entry// } into $feed,

    let $newid := data($metadata/curid) + 1
    return if ($parent_id = 'root') then ()
           else insert node http://libx.org/xml/libx2' /> into $feed/atom:entry[atom:id=$parent_id]//(libx:module|libx:libapp|libx:package)

2
Hi, I learned that the error "org.xml.sax.SAXParseException: Reference is not allowed in prolog." is thrown when the XML being loaded is not valid XML. So the whole issue boils down to converting the string "xml" used in the example above into valid XML. I am guessing something is wrong with the way I am escaping and using the HTML double quote characters. I am still confused about questions 1 and 2 in the post above. Thanks for the help. -Sony sony

2 Answers

1
votes

To escape a double quote, use the &quot; entity, which is predefined in XML.

So, your example string, say an attribute value, will look like

   <person name="&quot;sony&quot;"/>

There is also &apos; for apostrophe/single quote.
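
On the Java side of the question: the literal "\"sony\"" does contain the surrounding double quotes. Here is a minimal sketch that checks this and escapes the quotes for use in an attribute value. The class and method names (QuoteDemo, escapeQuotes) are made up for illustration and are not from the question's code.

```java
class QuoteDemo {
    // Hypothetical helper: escapes only the double quote, which is what a
    // double-quoted attribute value needs (on top of & and <).
    static String escapeQuotes(String s) {
        // String.replace is a literal replacement, so no regex is involved.
        return s.replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String sony = "\"sony\"";          // backslash-escaped quotes in a Java literal
        System.out.println(sony);           // the printed text includes both quotes
        System.out.println(sony.length());  // 4 letters + 2 quote characters

        // Escape the value before placing it inside the attribute's own quotes.
        String xml = "<person name=\"" + escapeQuotes(sony) + "\"/>";
        System.out.println(xml);
    }
}
```

Note that replaceAll would also work here, but it treats its first argument as a regular expression, so replace is the simpler choice for fixed characters.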

I see you have lots of replaceAll calls, but the replacements seem to be the same? There are some other characters that cannot be used literally, but should be escaped:

  &  --> &amp;
  >  --> &gt;
  <  --> &lt;
  "  --> &quot;
  '  --> &apos;

(EDIT: OK, I see this is just formatting. The entities are being turned into their actual values when presented by SO.)
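
A rough sketch of the table above as a Java helper, in case it is useful. The names XmlEscaper and escapeXml are my own, and the element name is only an example. The two things to note are that & has to be replaced first (otherwise the ampersands introduced by the other replacements get escaped again), and that only the data is escaped, never the markup around it.

```java
class XmlEscaper {
    // Hypothetical helper covering the five predefined XML entities.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")   // must come first
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        String title = "Tom & \"Jerry\" <live>";
        // The tags are written literally; only the text between them is escaped.
        String xml = "<title>" + escapeXml(title) + "</title>";
        System.out.println(xml);
    }
}
```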

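The SAX exception is the parser grumbling because of the invalid XML. In your code the whole markup string, tags included, is passed through escapeHtmlEntities, so what reaches the parser begins with an entity reference (&lt;) rather than a root element. That is what "Reference is not allowed in prolog" is reporting.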
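
To see the parser's behaviour in isolation, something like the following should reproduce it with the JDK's built-in parser. This is only a sketch: PrologErrorDemo and parses are illustrative names, and the exact message text can vary between parser implementations, so it is printed rather than assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class PrologErrorDemo {
    // Returns true if the string parses as XML, false if the parser rejects it.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not expected in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<entry/>"));          // real markup
        System.out.println(parses("&lt;entry/&gt;"));    // markup that was escaped
    }
}
```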

As well as escaping the text, you will need to ensure it adheres to the well-formedness rules of XML. There's quite a bit to get right, so it's often simpler to use a 3rd party library to write out the XML. For example, the XMLWriter in dom4j.
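
If you would rather not pull in dom4j, the same idea can be sketched with what ships in the JDK: build the document through the DOM API and let a Transformer do the serializing and escaping. This swaps dom4j's XMLWriter for javax.xml.transform. The class name DomBuildDemo and the method buildEntryXml are made up for the example, and only the entry and title elements from your string are reproduced.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBuildDemo {
    // Namespace URI copied from the question's string.
    static final String ATOM_NS = "http://www.w3.org/2005/atom";

    static String buildEntryXml(String titleText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element entry = doc.createElementNS(ATOM_NS, "entry");
            doc.appendChild(entry);

            Element title = doc.createElementNS(ATOM_NS, "title");
            title.setTextContent(titleText);   // raw text, no manual escaping
            entry.appendChild(title);

            // Serialize the DOM; special characters are escaped on output.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildEntryXml("New <module> & friends"));
    }
}
```

With this approach the Document can also be handed on directly, so the round trip through a string and parseXml may not be needed at all.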

0
votes

You can check out HTML Tidy, a tool that originated at the W3C. Most popular languages have their own implementation.

Rather than calling replace yourself and caring only about <, > and &, configure the JTidy options (for Java) and parse. This abstracts away all the complication of XML escaping.

I have used Python, Java and MarkLogic based Tidy implementations. All of them served my purposes.