How to get XML comments using Java's DocumentBuilder

Question

I have an application that uses SAML authentication, acts as an SP, and therefore parses SAMLResponses. I received notification that an IdP that communicates with my application will now start signing their SAMLResponses with http://www.w3.org/2001/10/xml-exc-c14n#WithComments, which means comments matter when calculating the validity of the SAML signature.

Here lies the problem - the library I use for XML parsing strips these comment nodes by default. See this example program:

import org.apache.commons.io.IOUtils;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class Main {

    public static void main(String[] args) {
        try {
            String xml = "<NameID>test@email<!---->.com</NameID>";
            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
            documentBuilderFactory.setNamespaceAware(true);
            DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
            Document doc = documentBuilder.parse(IOUtils.toInputStream(xml));
            NodeList nodes = doc.getElementsByTagName("NameID");

            if (nodes == null || nodes.getLength() == 0)
            {
                throw new RuntimeException("No NameID in document");
            }

            System.out.println(nodes.item(0).getTextContent());

        } catch(Exception e) {
            System.err.println(e.getMessage());
        }
    }
}

So, this program will print test@email.com (which means that's what my SAML code will get, too). This is a problem, as I'm pretty sure it will cause signature validation to fail without the comment included, since the XML document was signed with the #WithComments canonicalization method.

Is there any way to get DocumentBuilder/getTextContent() to leave in comment nodes so my signature is not invalidated by the missing comment?

Documentation for getTextContent() is here: https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html#getTextContent()

lexicore · Accepted Answer · 2018-03-01T20:49:48

Your code actually retains the comment.

Here, slightly modified:

// Imports needed beyond the question's: java.io.ByteArrayInputStream,
// java.nio.charset.StandardCharsets, org.w3c.dom.Node
public static void main(String[] args) throws Exception {
    String xml = "<NameID>test@email<!--foobar-->.com</NameID>";
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    documentBuilderFactory.setNamespaceAware(true);
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document doc = documentBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    // Children of <NameID>: text "test@email", comment "foobar", text ".com"
    NodeList childNodes = doc.getDocumentElement().getChildNodes();
    Node[] nodes = new Node[childNodes.getLength()];
    for (int index = 0; index < childNodes.getLength(); index++) {
        nodes[index] = childNodes.item(index);
    }
    // Index 1 is the comment node; its text content is the comment body
    System.out.println(nodes[1].getTextContent());
}

Prints foobar.

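The root element has three child nodes, and one of them is the comment node, so the comment is actually retained. Your original program does not show it because getTextContent() on an element concatenates the text of its children and skips comment nodes; the comment itself is still present in the DOM tree.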
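
To check this yourself, here is a minimal sketch that lists the children of the root element by node type and then prints the element's getTextContent(). The names CommentNodeDemo, describeChildren and rootTextContent are made up for illustration, and the sketch assumes the JDK's default parser; the call to setIgnoringComments(false) only spells out what is already the factory default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not part of any SAML library.
public class CommentNodeDemo {

    // Parses the XML string into a DOM document, keeping comment nodes.
    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // false is already the default; set here only to make the intent explicit.
        factory.setIgnoringComments(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Describes each direct child of the root element as "TYPE:value".
    public static List<String> describeChildren(String xml) throws Exception {
        NodeList children = parse(xml).getDocumentElement().getChildNodes();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String type;
            if (child.getNodeType() == Node.COMMENT_NODE) {
                type = "COMMENT";
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                type = "TEXT";
            } else {
                type = "OTHER";
            }
            result.add(type + ":" + child.getNodeValue());
        }
        return result;
    }

    // Returns getTextContent() of the root element, which skips comment nodes.
    public static String rootTextContent(String xml) throws Exception {
        return parse(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<NameID>test@email<!--foobar-->.com</NameID>";
        // prints [TEXT:test@email, COMMENT:foobar, TEXT:.com]
        System.out.println(describeChildren(xml));
        // prints test@email.com
        System.out.println(rootTextContent(xml));
    }
}
```

The comment shows up as its own node between the two text nodes, while getTextContent() on the element returns only the joined text. Code that needs the comment has to walk the child nodes rather than rely on getTextContent().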
