I'm new to Solr, and I don't know if this is the best way to do it:
I have some products that are classified into several categories. The categories are organized in a hierarchical structure like this:
- Electronics
  - Computer
    - Apple
      - iPads
      - Macbooks
    - Samsung
    - Notebooks
  - Photo
- Fashion
  - Women
  - Men
    - Shirts
Every product can have multiple categories. For example, a product could be in Electronics > Computer > Apple > Macbooks and in Electronics > Computer > Notebooks. Listing the products of Electronics should return all products underneath it, including those in all subcategories. Listing the products in Electronics > Computer should only return products from that subcategory.
My shop is in Rails and it uses sunspot as a DSL for Solr. In sunspot, I have a field called category_names, which has multiple: true and stored: true. In this field, I store multiple categories, from root to the deepest subcategory, that are stored in Solr like this:
<arr name="category_names_sms">
  <str>Electronics</str>
  <str>Electronics#Computer</str>
  <str>Electronics#Computer#Notebooks</str>
  <str>Electronics#Computer#Apple</str>
  <str>Electronics#Computer#Apple#Macbooks</str>
</arr>
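For illustration, here is a minimal Ruby sketch of how values like these could be generated from a product's category paths. The method names path_prefixes and category_names_for are hypothetical, and the # separator is simply taken from the data above; this is not necessarily how the values are produced in my app.

```ruby
# Separator between path segments (taken from the stored values above).
SEPARATOR = "#"

# Hypothetical helper: returns every ancestor path of a category, from the
# root down to the category itself.
def path_prefixes(segments, separator = SEPARATOR)
  (1..segments.length).map { |n| segments.first(n).join(separator) }
end

# Hypothetical helper: collects the de-duplicated prefixes for all category
# paths of a single product.
def category_names_for(paths)
  paths.flat_map { |segments| path_prefixes(segments) }.uniq
end

product_paths = [
  %w[Electronics Computer Apple Macbooks],
  %w[Electronics Computer Notebooks]
]

# Prints one path per line; shared ancestors appear only once.
puts category_names_for(product_paths)
```

The result is one flat list per product, so a filter on any ancestor path matches the product without Solr having to know about the hierarchy.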
When I want to retrieve all categories as a facet search, I just call Solr with facet=true&facet.field=category_names, and it returns something like
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="taxon_names_sms">
      <int name="Electronics">2831</int>
      <int name="Electronics#Computer">1988</int>
      <int name="Electronics#Computer#Apple">543</int>
      ...
    </lst>
  </lst>
</lst>
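The client-side parsing I mention amounts to roughly the following sketch. It assumes the facet values and counts have already been extracted from the response into a plain Ruby hash; build_category_tree and print_tree are hypothetical helper names, not part of sunspot or Solr.

```ruby
# Facet values and counts as they appear in the response above.
facet_counts = {
  "Electronics" => 2831,
  "Electronics#Computer" => 1988,
  "Electronics#Computer#Apple" => 543
}

# Hypothetical helper: turns the flat "A#B#C" facet values into a nested
# hash in which every node holds its count and its children.
def build_category_tree(facet_counts, separator = "#")
  root = { count: nil, children: {} }
  facet_counts.each do |path, count|
    node = root
    path.split(separator).each do |segment|
      node = (node[:children][segment] ||= { count: nil, children: {} })
    end
    node[:count] = count
  end
  root[:children]
end

# Prints the tree with two spaces of indentation per level.
def print_tree(nodes, depth = 0)
  nodes.each do |name, node|
    puts "#{'  ' * depth}#{name} (#{node[:count]})"
    print_tree(node[:children], depth + 1)
  end
end

print_tree(build_category_tree(facet_counts))
```

This works only as long as no category name contains the separator itself, which is part of why I'm unhappy with it.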
When I only want to fetch products from a certain category, I call Solr with fq=category_names:Electronics and it returns all the products from that category. Because every product also stores the full path down from the root category, I also get the products from the subcategories.
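For a deeper category the filter value contains the # separator, which has to be URL-encoded (an unencoded # starts the URL fragment) and is safest quoted. A rough sketch of building such a parameter by hand, outside of sunspot, could look like this. The helpers solr_quote and category_filter_params are hypothetical, and using the raw field name category_names_sms as the default is an assumption based on the stored document above.

```ruby
require "uri"

# Hypothetical helper: wraps a value in double quotes for a Solr query.
# Inside the quotes, backslashes and double quotes are escaped.
def solr_quote(value)
  '"' + value.gsub(/["\\]/) { |char| "\\" + char } + '"'
end

# Hypothetical helper: builds a URL-encoded fq parameter for one category
# path. The default field name is an assumption, see the note above.
def category_filter_params(segments, field: "category_names_sms", separator: "#")
  path = segments.join(separator)
  URI.encode_www_form("fq" => "#{field}:#{solr_quote(path)}")
end

# The separator is percent-encoded in the resulting query string.
puts category_filter_params(%w[Electronics Computer])
```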
I've read some articles about pivot faceting and hierarchical faceting, and I'm a bit confused about whether I'm using the Solr features correctly. My questions are:
- Is this a good approach, or are there drawbacks you can think of? I'm using the # character to split and parse the categories on the client side, and that's something I don't like.
- Another problem is that when fetching the categories from Solr, I only get the category names. But I also need the ID or the permalink of each category. Is there a way to store such information in Solr? I don't want to hit the database for it.
- Is there a better, maybe built-in, Solr solution that handles this whole hierarchical category thing?
- I only use the default Solr XML configs from sunspot right now. I've read about defining fields and the like. Can someone explain to me how to do that with sunspot?
Thanks a lot, I hope someone can point me in the right direction.