I want to write a Schematron rule that tests whether the value within the year element is the current year, last year, or next year.

The problem is that the value could contain a year range, for instance 2013-2014. I only want to test the first four digits of the text node of year (2013).

This is what I wrote, but it is not correct. Can you return the position of a text node?

XML file:

<article>
<front>
    <article-meta>
        <pub-date pub-type="epub-ppub">
            <year>2013-2014</year>
        </pub-date>
    </article-meta>
</front></article>

Schematron rule:

<pattern abstract="false" id="year">
    <rule context="/article/front/article-meta">
        <assert
            test="number(pub-date[@pub-type='epub-ppub']/year/text()[position() = 4]) = year-from-date(current-date()) or number(pub-date/year) = (year-from-date(current-date()) + 1) or number(pub-date/year) = (year-from-date(current-date()) - 1) or number(pub-date/year) = (year-from-date(current-date()) - 2)"
            >The publication year (pub-date[@pub-type='epub-ppub']/year) is "<value-of
                select="pub-date[@pub-type='epub-ppub']/year"/>". It should be the current year
                (<value-of select="year-from-date(current-date())"/>), last year, or next
            year.</assert>
    </rule>
</pattern>

When validating the XML file, the rule fires but 2013 is a valid year: The publication year (pub-date[@pub-type='epub-ppub']/year) is "2013-2014". It should be the current year (2013), last year, or next year.

2 Answers

3
votes

It seems you're using XPath 2.0 as XPath 1.0 does not support date functions.

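Your mistake is that a positional predicate such as text()[position() = 4] selects the fourth text node, not the fourth character, so you cannot use it to pick out part of a string. Use substring(string, start, length) instead.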

I stripped your problem down to finding a year element within that range and added some whitespace for readability. It should be easy for you to extend.

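//year[
  (: number() is needed: substring() returns a string, which cannot be compared with an integer :)
  number(substring(., 1, 4))[
    year-from-dateTime(current-dateTime()) - 1 <= .
    and year-from-dateTime(current-dateTime()) + 1 >= .
  ]
]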
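
For completeness, here is a minimal sketch of how the same check could be plugged back into the Schematron pattern from the question. It assumes an ISO Schematron schema with queryBinding="xslt2" (needed for current-date() and for let); the variable names firstYear and thisYear are my own, not anything from the original schema. Using the value comparisons ge and le also avoids having to escape < as &lt; inside the test attribute.

```xml
<pattern id="year">
    <rule context="/article/front/article-meta/pub-date[@pub-type='epub-ppub']/year">
        <!-- First four characters as a number; NaN if they are not digits -->
        <let name="firstYear" value="number(substring(normalize-space(.), 1, 4))"/>
        <let name="thisYear" value="year-from-date(current-date())"/>
        <!-- Passes only for last year, the current year, or next year -->
        <assert test="$firstYear ge $thisYear - 1 and $firstYear le $thisYear + 1"
            >The publication year is "<value-of select="."/>". It should start with the current
            year (<value-of select="$thisYear"/>), last year, or next year.</assert>
    </rule>
</pattern>
```

With the sample document, firstYear is 2013, so the assertion passes whenever the current year is 2012, 2013, or 2014. A non-numeric value yields NaN, every comparison is then false, and the assertion fails as intended.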
2
votes

You can use substring to only return a part of the text-content:

substring(//year/text(), 1, 4)

Alternatively, use substring-before to get the content before a certain string or character:

substring-before(//year/text(), '-')

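Both return 2013 for your example XML. Note that substring-before returns an empty string when the value contains no hyphen (a plain 2013, for example), so it only works for ranges.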
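
If you prefer a hyphen-based split that also accepts a single year, one possible sketch (assuming XPath 2.0 and an ISO Schematron schema with queryBinding="xslt2") is to use tokenize and take the first piece. The variable name firstYear and the four-digit check are only illustrative.

```xml
<rule context="pub-date[@pub-type='epub-ppub']/year">
    <!-- tokenize() returns the whole string as one item when there is no hyphen -->
    <let name="firstYear" value="tokenize(normalize-space(.), '-')[1]"/>
    <assert test="matches($firstYear, '^[0-9]{4}$')"
        >The first year in "<value-of select="."/>" should be four digits.</assert>
</rule>
```

Here firstYear is 2013 for both 2013-2014 and 2013, so the same variable can then be passed to number() and compared against the current year.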