Question
Where can I find a complete example that shows how hierarchical faceted search works from indexing the documents to retrieving search results?
My research so far
Stack Overflow has a few posts, but each of them addresses only certain aspects of hierarchical faceted search, so I wouldn't consider this question a duplicate. I'm looking for a complete example to understand it. I keep missing the final query, the one where the aggregations work.
- This would be pretty much exactly what I am looking for, but again, not a complete walkthrough: Solr Hierarchical Faceting. Example needed
There is documentation on the Solr webpage, but I didn't understand the example given there.
Example (conceptually)
I'd like to create a complete walkthrough example here and hope you can provide the missing final piece.
Test data
Input
Let's say we have 3 documents with each document being a person.
Alice (document 1)
- Blond
- Europe
Jane (document 2)
- Brown
- Europe/Norway
Bob (document 3)
- Black
- Europe/Norway
- Europe/Sweden
Output
The expected output for this (currently wrong) query
http://server:8983/solr/my_core/select?q=*%3A*&wt=json&indent=true&facet=true&facet.field=tags_ss
should be
Hair_color (3)
- blond (1)
- brown (1)
- black (1)
Location (3)
- Europe (4) // This should be 4 not 3, i.e. the sum of the leaves, because Alice is tagged with "Europe" only, without a country
  - Norway (2)
  - Sweden (1)
because all documents are found.
Example (programmatically)
This is where I require help. How do I implement the above conceptual example?
Here is how far I've gotten.
1. Create the test data XML
This is the content of the documents.xml file in the solr-5.1.0/testdata subfolder:
<add>
  <doc>
    <field name="id">Alice</field>
    <field name="tags_ss">hair_color/blond</field>
    <field name="tags_ss">location/europe</field>
  </doc>
  <doc>
    <field name="id">Jane</field>
    <field name="tags_ss">hair_color/brown</field>
    <field name="tags_ss">location/europe/Norway</field>
  </doc>
  <doc>
    <field name="id">Bob</field>
    <field name="tags_ss">hair_color/black</field>
    <field name="tags_ss">location/europe/Norway</field>
    <field name="tags_ss">location/europe/Sweden</field>
  </doc>
</add>
The _ss suffix is defined in schema.xml as
<dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
Note that all tags, e.g. hair_color and location and any tags that will be added in the future, are stored in the same tags_ss field.
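In case it helps anyone reproduce the setup, here is a minimal sketch of how the same test data could be generated instead of written by hand. The PEOPLE dictionary and the build_add_xml helper are names made up for this illustration; the tag values are spelled the way they appear in the query results further down.

```python
import xml.etree.ElementTree as ET

# Test data from the conceptual example; every tag is a slash-separated path.
# PEOPLE and build_add_xml are hypothetical names used only in this sketch.
PEOPLE = {
    "Alice": ["hair_color/blond", "location/europe"],
    "Jane": ["hair_color/brown", "location/europe/Norway"],
    "Bob": ["hair_color/black", "location/europe/Norway", "location/europe/Sweden"],
}


def build_add_xml(people):
    """Return an <add> element containing one <doc> per person."""
    add = ET.Element("add")
    for person_id, tags in people.items():
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = person_id
        for tag in tags:
            # Repeating the field name gives a multiValued field several values.
            ET.SubElement(doc, "field", name="tags_ss").text = tag
    return add


xml_text = ET.tostring(build_add_xml(PEOPLE), encoding="unicode")
print(xml_text)
```

The printed string is unindented, but it carries the same elements and values as documents.xml.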
2. Index the test data with Solr
c:\solr-5.1.0>java -classpath dist/solr-core-5.1.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files -Drecursive=yes -Durl=http://server:8983/solr/my_core/update org.apache.solr.util.SimplePostTool .\testdata
3. Retrieve all data with a Solr query (without faceting)
Query
http://server:8983/solr/my_core/select?q=*%3A*&wt=json&indent=true
Result
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"indent": "true",
"q": "*:*",
"_": "1430830360536",
"wt": "json"
}
},
"response": {
"numFound": 3,
"start": 0,
"docs": [
{
"id": "Alice",
"tags_ss": [
"hair_color/blond",
"location/europe"
],
"_version_": 1500334369469890600
},
{
"id": "Jane",
"tags_ss": [
"hair_color/brown",
"location/europe/Norway"
],
"_version_": 1500334369469890600
},
{
"id": "Bob",
"tags_ss": [
"hair_color/black",
"location/europe/Norway",
"location/europe/Sweden"
],
"_version_": 1500334369469890600
}
]
}
}
4. Retrieve all data with a Solr query (with faceting)
Query
http://server:8983/solr/my_core/select?q=*%3A*&wt=json&indent=true&facet=true&facet.field=tags_ss
Result
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"facet": "true",
"indent": "true",
"q": "*:*",
"_": "1430830432389",
"facet.field": "tags_ss",
"wt": "json"
}
},
"response": {
"numFound": 3,
"start": 0,
"docs": [
{
"id": "Alice",
"tags_ss": [
"hair_color/blond",
"location/europe"
],
"_version_": 1500334369469890600
},
{
"id": "Jane",
"tags_ss": [
"hair_color/brown",
"location/europe/Norway"
],
"_version_": 1500334369469890600
},
{
"id": "Bob",
"tags_ss": [
"hair_color/black",
"location/europe/Norway",
"location/europe/Sweden"
],
"_version_": 1500334369469890600
}
]
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"tags_ss": [
"location/europe/Norway",
2,
"hair_color/black",
1,
"hair_color/blond",
1,
"hair_color/brown",
1,
"location/europe",
1,
"location/europe/Sweden",
1
]
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
}
}
Note this section at the bottom of the result:
"facet_fields": {
"tags_ss": [
"location/europe/Norway",
2,
"hair_color/black",
1,
"hair_color/blond",
1,
"hair_color/brown",
1,
"location/europe",
1,
"location/europe/Sweden",
1
]
},
It shows all tags as a flat list (not hierarchical).
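To make the flat list easier to work with on the client, the alternating value/count entries can be paired up. This is only a sketch of client-side post-processing; pair_facets is a hypothetical helper name, and the list is copied from the response above.

```python
# Copied from the "facet_fields" section of the response above.
flat = [
    "location/europe/Norway", 2,
    "hair_color/black", 1,
    "hair_color/blond", 1,
    "hair_color/brown", 1,
    "location/europe", 1,
    "location/europe/Sweden", 1,
]


def pair_facets(flat_list):
    """Turn the alternating [value, count, value, count, ...] list into a dict."""
    values = flat_list[0::2]  # even positions hold the facet values
    counts = flat_list[1::2]  # odd positions hold the matching counts
    return dict(zip(values, counts))


facet_counts = pair_facets(flat)
for value in sorted(facet_counts):
    print(value, facet_counts[value])
```

This confirms that location/europe is counted once, for Alice only, and that no entry exists for hair_color or location on their own.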
5. Retrieve all data with a Solr query (with hierarchical faceting)
Query
Here is my problem. I don't know how to construct the query which returns the following result (the result already shown in the conceptual example above).
Result (fictitious, created by hand for illustration)
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"facet":"true",
"indent":"true",
"q":"*:*",
"facet.field":"tags_ss",
"wt":"json",
"rows":"0"}},
"response":{"numFound":3,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"tags_ss":[
"hair_color",3, // This aggregation is missing
"hair_color/black",1,
"hair_color/blond",1,
"hair_color/brown",1,
"location/europe",4, // This aggregation should be 4 but is 1
"location/europe/Norway",2,
"location/europe/Sweden",1]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}
This tag list is still flat, but at least location/europe = 4 would be correctly aggregated. Currently it is not: I keep getting location/europe = 1, because that value is only set for Alice, and Jane's and Bob's Norway and Sweden values are not aggregated to also count towards europe.
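To pin down the aggregation I mean, here is a sketch that computes it on the client from the flat counts Solr currently returns. It is not a Solr feature, just the arithmetic; roll_up is a hypothetical helper name.

```python
from collections import Counter

# Flat facet counts as currently returned by Solr for tags_ss.
facet_counts = {
    "location/europe/Norway": 2,
    "hair_color/black": 1,
    "hair_color/blond": 1,
    "hair_color/brown": 1,
    "location/europe": 1,
    "location/europe/Sweden": 1,
}


def roll_up(counts, sep="/"):
    """Add each path's count to the path itself and to every ancestor path."""
    totals = Counter()
    for path, count in counts.items():
        parts = path.split(sep)
        for depth in range(1, len(parts) + 1):
            totals[sep.join(parts[:depth])] += count
    return dict(totals)


rolled = roll_up(facet_counts)
for path in sorted(rolled):
    print(path, rolled[path])
```

With this rule location/europe becomes 1 + 2 + 1 = 4 and hair_color becomes 3. Note that these are sums of tag occurrences rather than distinct documents (Bob contributes twice), so the top-level location also comes out as 4.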
Ideas
- I might need to use facet.pivot, but I don't know how.
- I might need to use facet.prefix, but I don't know how.
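For the facet.prefix idea, one pattern I have seen described is to index every ancestor of a path as its own token, prefixed with its depth, and then facet one level at a time. The sketch below only simulates that locally in Python to show which counts it would produce. It assumes tags_ss would hold the expanded tokens, which is not how the data above is indexed; depth_tokens and facet_with_prefix are hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlencode

PEOPLE = {
    "Alice": ["hair_color/blond", "location/europe"],
    "Jane": ["hair_color/brown", "location/europe/Norway"],
    "Bob": ["hair_color/black", "location/europe/Norway", "location/europe/Sweden"],
}


def depth_tokens(path, sep="/"):
    """Expand 'a/b/c' into ['0/a', '1/a/b', '2/a/b/c']."""
    parts = path.split(sep)
    return [f"{depth}{sep}{sep.join(parts[:depth + 1])}" for depth in range(len(parts))]


def facet_with_prefix(people, prefix):
    """Count documents per token, keeping only tokens that start with prefix."""
    counts = Counter()
    for tags in people.values():
        # A set, so a document counts at most once per token.
        tokens = {token for tag in tags for token in depth_tokens(tag)}
        counts.update(token for token in tokens if token.startswith(prefix))
    return dict(counts)


print(facet_with_prefix(PEOPLE, "0/"))
print(facet_with_prefix(PEOPLE, "1/location"))
print(facet_with_prefix(PEOPLE, "2/location/europe"))

# Request parameters for the deepest level, assuming (hypothetically) that
# tags_ss contained the depth-prefixed tokens.
query = urlencode({
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "tags_ss",
    "facet.prefix": "2/location/europe",
})
print(query)
```

Because this counts documents rather than tag occurrences, location/europe would come out as 3 here, not the 4 from my hand-made result, while hair_color and location would both be 3.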
Versions
- Solr 5.1.0
- Windows 7