2
votes

I have a fairly large set of large XML files, from each of which I want to extract some data. I'm using an evaluation version of Altova XMLSpy, in which I managed to get an XPATH to work. However, I need the data in CSV or text format, so I can use it in R or Excel for further evaluation and I cannot copy the results of the XPATH to a file. I found that using XQUERY I would be able to, but I fail to get the XQUERY to work for at least one file.

The XML is structured like:

<d2LogicalModel xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datex2.eu/schema/2_0/2_0" modelBaseVersion="2.0" xsi:schemaLocation="http://datex2.eu/schema/2_0/2_0 D:\NDW\CSS\DataGenerator\DATEXIISchema_2_0_2_0.xsd">
<payloadPublication xmlns="http://datex2.eu/schema/2_0/2_0" xsi:type="MeasuredDataPublication" lang="nl">
    <publicationTime>2011-04-21T05:58:34Z</publicationTime>
    <publicationCreator>
        <country>nl</country>
        <nationalIdentifier>NDW-CNS</nationalIdentifier>
    </publicationCreator>
    <measurementSiteTableReference>NDW01_MT_321</measurementSiteTableReference>
    <headerInformation>
        <areaOfInterest>national</areaOfInterest>
        <confidentiality>restrictedToAuthorities</confidentiality>
        <informationStatus>real</informationStatus>
    </headerInformation>
    <siteMeasurements>
        <measurementSiteReference>GRT01_MORO_1002_2</measurementSiteReference>
        <measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault>
        <measuredValue index="1">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="2">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="3">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="4">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="5">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="6">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
    </siteMeasurements>
    <siteMeasurements>
        <measurementSiteReference>RWS01_MONIBAS_0021hrr2131ra</measurementSiteReference>
        <measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault>
        <measuredValue index="1">
            <basicDataValue xsi:type="TrafficFlow">
                <time>2011-04-21T05:56:00Z</time>
                <vehicleFlow>900</vehicleFlow>
            </basicDataValue>
        </measuredValue>
        <measuredValue index="2">
            <basicDataValue xsi:type="TrafficSpeed">
                <numberOfInputValuesUsed>60</numberOfInputValuesUsed>
                <standardDeviation>0</standardDeviation>
                <time>2011-04-21T05:56:00Z</time>
                <averageVehicleSpeed>115</averageVehicleSpeed>
            </basicDataValue>
        </measuredValue>
        <measuredValue index="3">
            <basicDataValue xsi:type="TrafficFlow">
                <time>2011-04-21T05:56:00Z</time>
                <vehicleFlow>1020</vehicleFlow>
            </basicDataValue>
        </measuredValue>
        <measuredValue index="4">
            <basicDataValue xsi:type="TrafficSpeed">
                <numberOfInputValuesUsed>60</numberOfInputValuesUsed>
                <standardDeviation>0</standardDeviation>
                <time>2011-04-21T05:56:00Z</time>
                <averageVehicleSpeed>104</averageVehicleSpeed>
            </basicDataValue>
        </measuredValue>
    </siteMeasurements>

I want to filter on a specific value for measurementSiteReference and get the results of all measuredValue with a basicDataValue for TrafficFlow, preferably in the format of:

index, value, timestamp
1, 900, 05:56:00
3, 1020, 05:56:00

I have the following XPATH:

//text()[contains(.,"GEO01_Z_RWSTI1011")]/parent::*/parent::*/descendant::measuredValue[(@index)]/basicDataValue/vehicleFlow

This gives me the results for one file, but I cannot find a way to convert that XPATH to an XQUERY. The current XQUERY returns no results:

let $nl := "&#10;"
for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/d2LogicalModel/payloadPublication/siteMeasurements
where $x/measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return concat($x/measurementSiteReference/measuredValue,$nl)

How could I get the return I want using XQUERY?

3

3 Answers

0
votes

Try - http://www.stylusstudio.com/xquery_primer.html or http://xmlbeans.apache.org/docs/2.0.0/guide/conSelectingXMLwithXQueryPathXPath.html

In addition there are excellent examples from Oracle , IBM and Microsoft - if you really need advanced help.

0
votes

Your elements are bound to a namespace xmlns="http://datex2.eu/schema/2_0/2_0", but you are not namespace qualifying your elements in your XPATH statements. So, your XPATH statements don't select the elements that you want.

You would want to do something like this to declare a namespace and use it in your XPath statements:

declare namespace datex = "http://datex2.eu/schema/2_0/2_0";

let $nl := "&#10;"

for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return concat($x/datex:measurementSiteReference/datex:measuredValue,$nl)

However, you will then likely run into a problem using concat() on a sequence, and that the current code doesn't produce your desired output.

0
votes

I managed to get the answer, though not complete as I'd like:

declare namespace datex = "http://datex2.eu/schema/2_0/2_0";
declare variable  $sep := ',';
declare variable  $eol := '&#10;';

for $x in collection("0900_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
let $site := $x/datex:measurementSiteReference/text()
let $time := $x/datex:measurementTimeDefault/text()
let $index := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/@index
let $flow := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/datex:basicDataValue/datex:vehicleFlow/text()
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return string(concat(string-join(($site,$time,$flow),$sep),$eol))