TL;DR: How can I get Solr 4 to ignore diacritics when sorting facet values?
I've added the following four documents to the "collection1" Solr core in the default Solr example:
<doc>
<field name="id">1</field>
<field name="cat">manuka</field>
<field name="cat">mystery</field>
</doc>
<doc>
<field name="id">2</field>
<field name="cat">mānuka</field>
<field name="cat">stuff</field>
</doc>
<doc>
<field name="id">3</field>
<field name="cat">management</field>
<field name="cat">stuff</field>
</doc>
<doc>
<field name="id">4</field>
<field name="cat">abc</field>
<field name="cat">stuff</field>
</doc>
The "cat" field is defined as:
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
and the "string" type is defined as:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
When I do a facet query on the "cat" field, sorted by value (http://localhost:8983/solr/collection1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=cat&facet.sort=index), I get:
....
"facet_fields":{
"cat":[
"abc",1,
"management",1,
"manuka",1,
"mystery",1,
"mānuka",1,
"stuff",3]},
....
Note that mānuka comes after mystery. I'd like mānuka to come immediately after manuka (and therefore before mystery and stuff); that is, I'd like the sort to ignore diacritics, including the macron.
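To make the desired ordering concrete, here is a small Python sketch (not Solr code). It assumes that the facet.sort=index order is a plain code-point comparison of the indexed terms, which matches the output above, and it emulates the order I'm after by sorting on a diacritic-free key. The function name fold_diacritics is just something I made up for this illustration.

```python
import unicodedata


def fold_diacritics(value: str) -> str:
    """Return value with combining marks (e.g. the macron) removed."""
    # NFD splits "ā" (U+0101) into "a" followed by U+0304 COMBINING MACRON.
    decomposed = unicodedata.normalize("NFD", value)
    # Drop every combining character, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The facet values from the example documents; "\u0101" is the precomposed "ā".
facet_values = ["abc", "management", "manuka", "mystery", "m\u0101nuka", "stuff"]

# Plain code-point order: "ā" (U+0101) sorts after "y" (U+0079),
# so "mānuka" lands after "mystery", as in the facet output.
index_order = sorted(facet_values)

# Desired order: compare on the folded key first, then on the original
# value so that "manuka" and "mānuka" have a stable relative order.
folded_order = sorted(facet_values, key=lambda v: (fold_diacritics(v), v))

print(index_order)
print(folded_order)
```

The second list is the order I would like the facet values to come back in, while still displaying the original value (with the macron) for each entry.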
If this were a non-facet search, it looks like I could achieve what I want by setting up collation on a separate copy field and sorting on that (I can't set up collation on the field itself, because the stored data would then be a binary representation of the collation key). However, this approach doesn't seem to be possible for facet queries, since facet values can only be sorted by index or by count.
Am I overlooking something? Is there some trick to get this working in an environment where I do need to display the value of the "cat" field?