Hi MarkLoggers out there,
I have another question for you! I have a collection of 400,000 documents containing postal code information. There is one postal code per doc, and each doc contains 400 features, organized into categories and variables like so:
<postcode id="9728" xmlns="http://www.nvsp.nl/p4">
  <meta-data>
    <!--
    Generated by DIKW for NetwerkVSP ST!P
    -->
    <version>0.3</version>
    <dateCreated>2014-06-28+02:00</dateCreated>
  </meta-data>
  <category name="Oplages">
    <variable name="Oplage" updated="2014-08-12+02:00">
      <segment name="Bruto">1234</segment>
      <segment name="Stickers">234</segment>
      <segment name="Netto">1000</segment>
      <segment name="Aktief">J</segment>
    </variable>
  </category>
  <category name="Automotive">
    <variable name="Leaseauto">
      <segment name="Leaseauto">2.68822210725987</segment>
    </variable>
    <variable name="Autotype">
      <segment name="De Oudere Stadsrijder">4.61734781858941</segment>
      <segment name="De Dure Tweedehandsrijder">6.02534919813761</segment>
      <segment name="De Autoloze">41.187790998448</segment>
      <segment name="De Leasende Veelrijder">0.608035868253147</segment>
      <segment name="De Modale Middenklasser">13.1996896016555</segment>
      <segment name="De Vermogende Autoliefhebber">4.45283669598206</segment>
      <segment name="De Vermogende Kilometervreter">2.07690981203656</segment>
      <segment name="De Doelmatige Budgetrijder">17.2048629073978</segment>
      <segment name="De Doorsnee Nieuw Kopende Automob">10.1595102603897</segment>
    </variable>
...
400 more cat/var/segment elements
...
</postcode>
I need to find a subset of docs based on the id attribute of the postcode element and return only specific elements from them.
The elements to return are in category "Oplages", variable "Oplage", and I need the segments "Bruto" and "Netto".
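To make the requirement concrete, here is a minimal sketch of the result I am after, written as plain XPath with no attention to performance. It is only meant to show the shape of the output, not a proposed solution. The <result> wrapper element and the $ids variable are made up for illustration, and the ids are just example values.

```xquery
xquery version "1.0-ml";
declare namespace p4ns = "http://www.nvsp.nl/p4";

(: Illustration only: $ids holds example postal codes. :)
let $ids := ("2311", "2312", "2313")
for $pc in fn:collection()/p4ns:postcode[@id = $ids]
(: Narrow to category "Oplages", variable "Oplage" first. :)
let $oplage :=
  $pc/p4ns:category[@name = "Oplages"]/p4ns:variable[@name = "Oplage"]
return
  (: <result> is a hypothetical wrapper, not part of the real data. :)
  <result id="{$pc/@id}">{
    $oplage/p4ns:segment[@name = ("Bruto", "Netto")]
  }</result>
```

For each matching postal code this would return one wrapper holding just the Bruto and Netto segment elements.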
We currently have a REST API extension that does this, but it is not fast enough.
Example query:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
declare namespace p4ns = "http://www.nvsp.nl/p4";
declare namespace wijkns = "http://www.nvsp.nl/wijk";
let $segment := "Bruto"
let $zoeker0 := cts:search(fn:doc(), cts:element-attribute-range-query(xs:QName("p4ns:postcode"), xs:QName("id"), "=", ("2311","2312","2313")))
let $zoeker1 := cts:search(/p4ns:postcode, cts:element-attribute-range-query(xs:QName("p4ns:postcode"), xs:QName("id"), "=", ("2311","2312","2313")))
let $zoeker2 := cts:search(/p4ns:postcode, cts:element-attribute-value-query(xs:QName("p4ns:postcode"), xs:QName("id"), ("2311","2312","2313")))
let $inhoud1 := $zoeker0//p4ns:segment[@name=$segment]
let $inhoud2 := $zoeker1//p4ns:segment[@name=$segment]/text()
let $r1 := cts:search(/p4ns:postcode, cts:element-attribute-range-query(xs:QName("p4ns:segment"), xs:QName("name"), "=", $segment))
return $inhoud2
When I profile this test query, the slow part is looking up the "Bruto" segment in the docs returned by the cts:search. I know I should avoid looking up elements in docs via XPath, but I do not know how to combine the two steps so that only indexes are hit...
Profiler outcome:
Module:Line:Col  Count  Shallow %  Shallow µs  Deep %  Deep µs  Expression
.main:13:44      1446   27         7127        30      7938     @name = "Bruto"
.main:12:44      1446   27         6956        30      7793     @name = "Bruto"
.main:17:11      1      9.3        2431        9.4     2458     cts:search(fn:collection()/p4ns:postcode, cts:element-attribute-range-query(xs:QName("p4ns:segment"), fn:QName("", "name"), "=", $segment))
.main:10:16      1      7.2        1874        7.2     1885     cts:search(fn:collection()/p4ns:postcode, cts:element-attribute-value-query(xs:QName("p4ns:postcode"), fn:QName("", "id"), ("2311", "2312", "2313")))
Query result:
1234
4567
3456
Now my questions:
1) What does the "@name = "Bruto"" entry in the profile mean, and why is it slow?
2) Ideally I would combine the document search and the XPath lookup of the segment element into a single step, but if I put $zoeker into a cts:search it is unsearchable... What is the best approach to get my result back in one go?
Thanks in advance!
Hugo