I have some MARC21-XML documents about books. I want to extract the names of the translators of the book.

Here is a snippet from one MARC21-XML document of a book:

<?xml version="1.0" encoding="UTF-8"?>
  <record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Wasel, Ulrike</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Timmermann, Klaus</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2="2">
      <subfield code="a">Eggers, Dave</subfield>
    </datafield>
  </record>

Dave Eggers is the author of the book, and Klaus Timmermann and Ulrike Wasel translated it.

In this scenario the following "simple" XPath 2.0 expression would work to extract the "translators":

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']/subfield[@code='a']/text()

The result of this XPath 2.0 expression would be the following:

Text='Wasel, Ulrike'
Text='Timmermann, Klaus'

This seems to work nicely. However, I can imagine a scenario I have not yet encountered in which there are additional datafield elements whose contributor type is something other than translator (that is, subfield[@code='4'] is not 'trl').

I would like to have the following selection logic implemented as XPath 2.0 but struggle to construct one:

  • /record/datafield attribute tag has value "700"
  • /record/datafield attribute ind1 has value "1"
  • /record/datafield attribute ind2 has value " "
  • /record/datafield contains subfield with attribute code equals "4" and its text() is "trl"

To mock up the scenario:

<?xml version="1.0" encoding="UTF-8"?>
  <record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Wasel, Ulrike</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Timmermann, Klaus</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Doe, John</subfield>
      <subfield code="4">oth</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2="2">
      <subfield code="a">Eggers, Dave</subfield>
    </datafield>
  </record>

In this scenario the same "simple" XPath 2.0 expression no longer extracts only the "translators":

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']/subfield[@code='a']/text()

The result of this XPath 2.0 expression would be the following:

Text='Wasel, Ulrike'
Text='Timmermann, Klaus'
Text='Doe, John'

And there is the error: John Doe is not a translator (trl) but some other (oth) contributor to the book. I do not want him ;)

I am not that familiar with the MARC21-XML specification. The specifications for MARC21-XML that I have read are in a very strange tabular format that is hard to understand. It is possible that datafields with @ind1='1' and @ind2=' ' contain only translators, but then the "type" subfield with "trl" would make no sense.

How can I construct an XPath 2.0 expression that selects only the translators from the mocked-up scenario?

1 Answer

To further restrict this XPath,

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']
       /subfield[@code='a']/text()

to select only those datafield elements whose subfield child element with code of 4 has a string value of "trl", add another predicate, [subfield[@code='4']='trl']:

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']
                 [subfield[@code='4']='trl']
       /subfield[@code='a']/text()
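
To check the selection logic against the mocked-up record, here is a minimal Python sketch. It is not an XPath 2.0 evaluation: the standard library's xml.etree.ElementTree supports only a subset of XPath and cannot express the nested predicate directly, so the attribute and subfield checks are done in Python instead. The helper name translator_names and the prefix "marc" are made up for this illustration. Note that the sample record declares a default namespace, so the sketch binds a prefix to it explicitly; whether an unprefixed /record matches in a given XPath 2.0 processor depends on how that processor's default element namespace is configured.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix bound to the namespace declared in the sample record.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# The mocked-up record from the question (XML declaration omitted for brevity).
SAMPLE = """\
<record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Wasel, Ulrike</subfield>
    <subfield code="4">trl</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Timmermann, Klaus</subfield>
    <subfield code="4">trl</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Doe, John</subfield>
    <subfield code="4">oth</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2="2">
    <subfield code="a">Eggers, Dave</subfield>
  </datafield>
</record>
"""


def translator_names(record):
    """Return the code "a" subfield text of every translator datafield."""
    names = []
    for field in record.findall("marc:datafield", NS):
        # Mirrors [@tag='700'][@ind1='1'][@ind2=' ']
        attrs = (field.get("tag"), field.get("ind1"), field.get("ind2"))
        if attrs != ("700", "1", " "):
            continue
        # Mirrors [subfield[@code='4']='trl']: at least one code "4"
        # subfield must have the string value "trl".
        roles = [s.text for s in field.findall("marc:subfield[@code='4']", NS)]
        if "trl" not in roles:
            continue
        # Mirrors /subfield[@code='a']/text()
        names.extend(s.text for s in field.findall("marc:subfield[@code='a']", NS))
    return names


record = ET.fromstring(SAMPLE)
print(translator_names(record))
```

With the mocked-up record this prints only the two translators; "Doe, John" is skipped because his code "4" subfield is "oth", and "Eggers, Dave" is skipped because ind2 is "2".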