How to get XML root element attributes using Groovy XmlParser without “double parsing” of XML

Question

This simplified code works correctly:

    class RootAttributesDemo {   // class name is illustrative; the original snippet omitted the declaration
        static stringXML = '''<?xml version='1.0' encoding='UTF-8'?>
    <ftc:FATCA_OECD xsi:schemaLocation='urn:oecd:ties:fatca:v1 FatcaXML_v1.1.xsd' version='1.1' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:sfa='urn:oecd:ties:stffatcatypes:v1' xmlns:ftc='urn:oecd:ties:fatca:v1'>
        <ftc:MessageSpec>Spec</ftc:MessageSpec>
    </ftc:FATCA_OECD>'''

        static void main(args) {
            // Namespace-unaware parsing: xmlns:* declarations appear as ordinary attributes
            def rep = new XmlParser(false, false).parseText(stringXML)
            def attrMap = rep.attributes()
            attrMap.each { k, v ->
                println "$k, $v"
            }
            // Namespace-aware parsing: elements are addressed through a Namespace
            rep = new XmlParser().parseText(stringXML)
            def ftc = new groovy.xml.Namespace(attrMap['xmlns:ftc'])
            println rep[ftc.MessageSpec].text()
        }
    }

It produces the following correct output:

    xsi:schemaLocation, urn:oecd:ties:fatca:v1 FatcaXML_v1.1.xsd
    version, 1.1
    xmlns:xsi, http://www.w3.org/2001/XMLSchema-instance
    xmlns:sfa, urn:oecd:ties:stffatcatypes:v1
    xmlns:ftc, urn:oecd:ties:fatca:v1
    Spec

The problem is that my existing, fairly extensive code already uses namespace-aware parsing, and I would like to keep it.

As a result, I would have to use both namespace-unaware and namespace-aware parsing, as in the code above.

Is there a way to produce the same result without parsing the whole XML twice (the file is quite large), or by extracting just the root element and then using namespace-aware parsing?

daggett daggett · Accepted Answer · 2018-07-12T15:23:23

Normally you know the namespace of your XML in advance, so the code to fetch the required element could look like this:

    def stringXML = """<?xml version='1.0' encoding='UTF-8'?>
    <ftc:FATCA_OECD xsi:schemaLocation='urn:oecd:ties:fatca:v1 FatcaXML_v1.1.xsd' version='1.1' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:sfa='urn:oecd:ties:stffatcatypes:v1' xmlns:ftc='urn:oecd:ties:fatca:v1'>
        <ftc:MessageSpec>Spec</ftc:MessageSpec>
    </ftc:FATCA_OECD>"""
    def rep = new XmlParser().parseText(stringXML)

    def ftc_v1 = new groovy.xml.Namespace('urn:oecd:ties:fatca:v1')
    println rep[ftc_v1.MessageSpec].text()

Alternatively, you can take the namespace from the root element and use it to access the MessageSpec subelement, which is in the same namespace:

    def ftc_vx = new groovy.xml.Namespace( rep.name().getNamespaceURI() )
    println rep[ftc_vx.MessageSpec].text()
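
The root attributes the question asks about can be read from that same namespace-aware parse. Here is a minimal sketch, assuming the parser's default settings, under which `xmlns:*` declarations are not reported as attributes (which is presumably why the question fell back to a second, namespace-unaware parse). Prefixed attributes are keyed by `groovy.xml.QName`, unprefixed ones by a plain `String`. The variable names `attrs` and `xsi` are illustrative.

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser   // Groovy 3+; on Groovy 2.x drop this line (XmlParser lives in groovy.util)

def stringXML = '''<?xml version='1.0' encoding='UTF-8'?>
<ftc:FATCA_OECD xsi:schemaLocation='urn:oecd:ties:fatca:v1 FatcaXML_v1.1.xsd' version='1.1' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:sfa='urn:oecd:ties:stffatcatypes:v1' xmlns:ftc='urn:oecd:ties:fatca:v1'>
    <ftc:MessageSpec>Spec</ftc:MessageSpec>
</ftc:FATCA_OECD>'''

// One namespace-aware parse, as in the answer's examples
def rep = new XmlParser().parseText(stringXML)
def attrs = rep.attributes()

// Unprefixed attributes keep plain String keys
println attrs['version']

// Prefixed attributes are keyed by QName, so look them up through a Namespace
def xsi = new Namespace('http://www.w3.org/2001/XMLSchema-instance')
println attrs[xsi.schemaLocation]

// The root element's own namespace URI comes from its QName
println rep.name().namespaceURI
```

This covers `version`, `xsi:schemaLocation`, and the namespace of the root element itself. It does not recover declarations for prefixes the root does not use, such as `xmlns:sfa`.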

Finally, if you do not care about namespaces at all, you can use XmlSlurper, which lets you access elements while ignoring namespaces:

    rep = new XmlSlurper().parseText(stringXML)
    println rep.MessageSpec.text()
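
If the full list of `xmlns:*` declarations on the root is needed, as in the question's output, one option not mentioned in the answer is to peek at the root start tag with the JDK's StAX API (`javax.xml.stream`) and stop there, so the large document is still fully parsed only once. The sketch below rests on that assumption. `peekRoot` is a hypothetical helper name, and the `prefix:name` map keys are chosen to mirror the output of the namespace-unaware parse.

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser   // Groovy 3+; on Groovy 2.x drop this line (XmlParser lives in groovy.util)
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

// Hypothetical helper: reads only as far as the root start tag and returns
// its attributes and namespace declarations keyed by prefix-qualified name.
Map<String, String> peekRoot(Reader source) {
    def reader = XMLInputFactory.newInstance().createXMLStreamReader(source)
    try {
        // Advance past the prolog until the first start tag (the root element)
        while (reader.hasNext() && reader.next() != XMLStreamConstants.START_ELEMENT) { }
        def result = [:]
        for (int i = 0; i < reader.attributeCount; i++) {
            def prefix = reader.getAttributePrefix(i)
            def name = reader.getAttributeLocalName(i)
            result[prefix ? "$prefix:$name".toString() : name] = reader.getAttributeValue(i)
        }
        // Namespace declarations are reported separately from attributes
        for (int i = 0; i < reader.namespaceCount; i++) {
            def prefix = reader.getNamespacePrefix(i)   // null for a default namespace
            result[prefix ? "xmlns:$prefix".toString() : 'xmlns'] = reader.getNamespaceURI(i)
        }
        return result
    } finally {
        reader.close()
    }
}

def stringXML = '''<?xml version='1.0' encoding='UTF-8'?>
<ftc:FATCA_OECD xsi:schemaLocation='urn:oecd:ties:fatca:v1 FatcaXML_v1.1.xsd' version='1.1' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:sfa='urn:oecd:ties:stffatcatypes:v1' xmlns:ftc='urn:oecd:ties:fatca:v1'>
    <ftc:MessageSpec>Spec</ftc:MessageSpec>
</ftc:FATCA_OECD>'''

// Cheap peek at the root element only
def rootInfo = peekRoot(new StringReader(stringXML))
rootInfo.each { k, v -> println "$k, $v" }

// Single full, namespace-aware parse, using the namespace found above
def ftc = new Namespace(rootInfo['xmlns:ftc'])
def doc = new XmlParser().parseText(stringXML)
println doc[ftc.MessageSpec].text()
```

For a file on disk, the peek would open its own `Reader` or `InputStream` and read only up to the root tag. The full parse then opens the file a second time.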
