0
votes

Is there a way to modify the elements a sequence so only collated versions of the items are returned?

let $currencies := ('dollar', 'Dollar', 'dollar ')
return fn:collated-only($currencies, "http://marklogic.com/collation/en/S1/T00BB/AS")

=> ('dollar', 'dollar', 'dollar')
3

3 Answers

3
votes

The values that are stored in the range index (that feeds the facets) are literally the first value that was encountered that compared equal to the others. (Because, the collation says you don't care...)

You can get a long way by calling fn:replace(fn:lower-case(xdmp:diacritic-less(fn:normalize-unicode($str,"NFKC"))),"\p{P}","")

This won't be exactly the same in that it overfolds some things and underfolds others, but it may be good for your purposes.

2
votes

Is this the expected output? There is no fn:collated-only function, so I'm assuming you're asking how to write such a function or whether there is such a function.

The thing is, there isn't a mapping from one string to another in collation comparisons, there is only a comparison algorithm (the Unicode Collation Algorithm) so there really is no canonical kind of string to return to you, and therefore no API to do so.

Stepping back, what is the problem you are actually trying to solve? By the rules of that collation, "dollar" and "Dollar" are equivalent, and by using it you declare you don't care which form you use, so you could use either one.

1
votes

If these values are in XML elements and you have a range index using http://marklogic.com/collation/en/S1/T00BB/AS, you can do something like this:

let $ref := cts:element-reference(xs:QName("currency"), "collation=http://marklogic.com/collation/en/S1/T00BB/AS")
for $curr in cts:values($ref, (), "frequency-order")
return $curr || ": " || cts:frequency($curr)

This will produce results like:

"dollar: 15",
"euro: 12"

... and so on. The collation will disregard the differences among your sample inputs. These results could be formatted however you want. Is that what you're looking to do?