8
votes

I have an XML file that contains authors and editors.

<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="file:textbook.rnc" type="compact"?>
<books xmlns="books">

    <book ISBN="i0321165810" publishername="OReilly">
        <title>XPath</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <year>2007</year>
        <field>Databases</field>
    </book>

    <book ISBN="i0321165812" publishername="OReilly">
        <title>XQuery</title>
        <author>
           <name>
               <fname>Priscilla</fname>
               <lname>Walmsley</lname>
            </name>
        </author>
        <editor>
            <name>
                <fname>Lisa</fname>
                <lname>Williams</lname>
            </name>
        </editor>
        <year>2003</year>
        <field>Databases</field>
    </book>

    <publisher publishername="OReilly">
        <web-site>www.oreilly.com</web-site>
        <address>
            <street_address>hill park</street_address>
            <zip>90210</zip>
            <state>california</state>
        </address>
        <phone>400400400</phone>
        <e-mail>[email protected]</e-mail>
        <contact>
            <field>Databases</field>
            <name>
                <fname>Anna</fname>
                <lname>Smith</lname>
            </name>
        </contact>
    </publisher>
</books>

I'm looking for a way to return the person who has been listed the most times as an author and/or editor. The solution should be XQuery 1.0 (XPath 2.0) compatible.

I was thinking about using a FLWOR query to iterate through all authors and editors, then doing a count of unique authors/editors, then returning the author(s)/editor(s) that match the highest count. But I haven't been able to find the proper solution.

Does anyone have any suggestion as to how such a FLWOR query would be written? Could this be done in a simpler way, using XPath?

4

4 Answers

16
votes

This may help:

declare default element namespace 'books';
(for $name in distinct-values($doc/books/*/*/name)
 let $entries := $doc/books/*[data(*/name) = $name]
 order by count($entries) descending
 return $entries/*/name)[1]
7
votes

Here is a pure XPath 2.0 expression, admittedly not for the faint-hearted:

(for $m in max(for $n in distinct-values(/*/b:book/(b:author | b:editor)
                                        /b:name/concat(b:fname, '|', b:lname)),
               $cnt in count(/*/b:book/(b:author | b:editor)
                             /b:name[$n eq concat(b:fname, '|', b:lname) ])
               return $cnt
               ),
     $name in /*/b:book/(b:author | b:editor)/b:name,
     $fullName in $name/concat(b:fname, '|',  b:lname),
     $count in count( /*/b:book/(b:author | b:editor)
                   /b:name[$fullName eq concat(b:fname, '|',  b:lname)])
  return
     if($count eq $m)
       then $name
       else ()
   )[1]

where the prefix "b:" is associated with the namespace "books".

XSLT 2.0 - based verification:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:b="books">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
   <xsl:sequence select=
   "(for $m in max(for $n in distinct-values(/*/b:book/(b:author | b:editor)
                                            /b:name/concat(b:fname, '|', b:lname)),
                   $cnt in count(/*/b:book/(b:author | b:editor)
                                 /b:name[$n eq concat(b:fname, '|', b:lname) ])
                   return $cnt
                   ),
         $name in /*/b:book/(b:author | b:editor)/b:name,
         $fullName in $name/concat(b:fname, '|',  b:lname),
         $count in count( /*/b:book/(b:author | b:editor)
                       /b:name[$fullName eq concat(b:fname, '|',  b:lname)])
      return
         if($count eq $m)
           then $name
           else ()
       )[1]
   "/>
 </xsl:template>
</xsl:stylesheet>

when this transformation is applied on the provided XML document:

<books xmlns="books">
    <book ISBN="i0321165810" publishername="OReilly">
        <title>XPath</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <year>2007</year>
        <field>Databases</field>
    </book>
    <book ISBN="i0321165812" publishername="OReilly">
        <title>XQuery</title>
        <author>
            <name>
                <fname>Priscilla</fname>
                <lname>Walmsley</lname>
            </name>
        </author>
        <editor>
            <name>
                <fname>Lisa</fname>
                <lname>Williams</lname>
            </name>
        </editor>
        <year>2003</year>
        <field>Databases</field>
    </book>
    <publisher publishername="OReilly">
        <web-site>www.oreilly.com</web-site>
        <address>
            <street_address>hill park</street_address>
            <zip>90210</zip>
            <state>california</state>
        </address>
        <phone>400400400</phone>
        <e-mail>[email protected]</e-mail>
        <contact>
            <field>Databases</field>
            <name>
                <fname>Anna</fname>
                <lname>Smith</lname>
            </name>
        </contact>
    </publisher>
</books>

the wanted, correct name element is selected and output:

<name xmlns="books">
   <fname>Priscilla</fname>
   <lname>Walmsley</lname>
</name>
4
votes

I've always felt this was an omission in XPath: the max() and min() functions return the highest/lowest value, whereas what you usually want is the object(s) in a collection that have the highest/lowest value for some expression. One solution is to sort the objects on that value and take the first/last from the list, which seems inelegant. Computing the min/max and then selecting the items whose value matches this seems equally unappealing. In Saxon there has long been a pair of higher-order extension functions saxon:highest() and saxon:lowest() which take a sequence and a function, and return the item(s) from the sequence having the lowest or highest values of the function result. The good news is that in XPath 3.0 you can write these functions yourself (in fact, they are given as example user-written functions in the spec).

2
votes

You are on the right track. The simplest way is to convert the names into strings (separated with a space, for example) and use these: (Note that the following code is untested)

let $names := (//editor | //author)/concat(fname, ' ', lname)
let $distinct-names := distinct-values($names)
let $name-count := for $name in $distinct-names return count($names[. = $name])
for $name at $pos in $distinct-names
where $name-count[$pos] = max($name-count)
return $name

Or, another approach:

(
  let $people := (//editor | //author)
  for $person in $people
  order by count($people[fname = $person/fname and
                         lname = $person/lname])
  return $person
)[last()]