2
votes

I am trying to pull multiple elements from an XML file using string-join function which works fine for a single element. However, when I try to add another one to my code I am seeing incorrect data. I suspect I'm missing a simple thing somewhere but can't seem to locate it..

sample XML data:-

<books>
  <book id="6636551">
    <master_information>
      <book_xref>
        <xref type="Fiction" type_id="1">72771KAM3</xref>
        <xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
      </book_xref>
    </master_information>
    <book_details>
      <price>24.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
    </book_details>
    <global_information>
      <ratings>
        <rating agency="ABC Agency" type="Author Rating">A++</rating>
        <rating agency="DEF Agency" type="Author Rating">A+</rating>
        <rating agency="DEF Agency" type="Book Rating">A</rating>
      </ratings>
    </global_information>
    <country_info>
      <country_code>US</country_code>
    </country_info>
  </book>
  <book id="119818569">
    <master_information>
      <book_xref>
        <xref type="Fiction" type_id="1">070185UL5</xref>
        <xref type="Non_Fiction" type_id="2">US070185UL50</xref>
      </book_xref>
    </master_information>
    <book_details>
      <price>19.25</price>
      <publish_date>2002-11-01</publish_date>
      <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
    </book_details>
    <global_information>
      <ratings>
        <rating agency="ABC Agency" type="Author Rating">A+</rating>
        <rating agency="ABC Agency" type="Book Rating">A</rating>
        <rating agency="DEF Agency" type="Author Rating">A</rating>
        <rating agency="DEF Agency" type="Book Rating">B+</rating>
      </ratings>
    </global_information>
    <country_info>
      <country_code>CA</country_code>
    </country_info>
  </book>
  </book>
</books>

XQuery for pulling a single element:-

for $x in string-join(('book_id,book_price', //book/book_details/price/string-join((ancestor::book/@id, .), ',')), '&#10;')
return $x

Which works fine, and spits sample output as follows:

book_id,book_price
6636551,24.95
119818569,19.25

Question is how do I pull multiple elements or a combination of elements and attributes, still using string-join probably, from a single XML file?

I tried using the following (which works OK for the most part) but I noticed for a larger data set the values seem to be populating in the wrong column randomly. E.g. in the code below if ./publish_date is blank in the data I noticed ./description data would get populated in the ./publish_date column.

for $x in string-join(('book_id,book_price,book_pub_date,book_desc', //book/book_details/string-join((ancestor::book/@id, ./price, ./publish_date, ./description), ',')), '&#10;')
return $x

FYI, I'm still learning XQuery as you can tell. I appreciate your insights/opinions/help!

1

1 Answers

4
votes

Sequences in XQuery are flattened: The expressions (1, (2, 3), ((4)), (), 5) and (1, 2, 3, 4, 5) are equivalent. This means that the length of the sequence (ancestor::book/@id, ./price, ./publish_date, ./description) varies if some of the XPath subqueries return no results. Since the function fn:string-join($strings, $sep) just puts the separatur between every pair of adjacent items in $strings (flattened), the resulting string can have a varying number of commas in it.

In order to preserve the alignment of your CSV table you can insert empty strings whenever a value is missing. An easy way to do that is using flattening to its advantage: ($possibly-empty, '')[1]

  • If $possibly-empty contains an item (e.g. 'foo') then this evaluates to ('foo', '')[1] -> 'foo'.
  • If it is the empty sequence () instead, the expression evaluates to ((), '')[1] -> ('')[1] (flattening) -> ''.

Working example (your enclosing FLWOR expression (for/return) is completely redundant because you only iterate over a single string element, so I omit it):

string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    //book/book_details/string-join(
      (
        (ancestor::book/@id, '')[1],
        (./price, '')[1],
        (./publish_date, '')[1],
        (./description, '')[1]
      ),
      ','
    )
  ),
  '&#10;'
)

You can also abstract that functionality out into its own function:

declare function local:non-empty($possibly-empty) {
  ($possibly-empty, '')[1]
};

string-join(
  (
    'book_id,book_price,book_pub_date,book_desc',
    //book/book_details/string-join(
      (
        local:non-empty(ancestor::book/@id),
        local:non-empty(./price),
        local:non-empty(./publish_date),
        local:non-empty(./description)
      ),
      ','
    )
  ),
  '&#10;'
)