0
votes

I'm new to Solr, and I don't know if this is the best way to do it:

I have some products that are classified into several categories. The categories are organized in a hierarchical structure like this:

- Electronics
  - Computer
    - Apple
      - iPads
      - Macbooks
    - Samsung
    - Notebooks
  - Photo
- Fashion
  - Women
  - Men
    - Shirts

Every product can have multiple categories. For example, a product could be in Electronics > Computer > Apple > Macbooks and Electronics > Computer > Notebooks. Listing products of Electronics should return all underlying products, including all subcategories. Listing products in Electronics > Computer should only return products from that subcategory.

My shop is in Rails and it uses sunspot as a DSL for Solr. In sunspot, I have a field called category_names, which has multiple: true and stored: true. In this field, I store multiple categories, from root to the deepest subcategory, that are stored in Solr like this:

<arr name="category_names_sms">
  <str>Electronics</str>
  <str>Electronics#Computer</str>
  <str>Electronics#Computer#Notebooks</str>
  <str>Electronics#Computer#Apple</str>
  <str>Electronics#Computer#Apple#Macbooks</str>
</arr>
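To make the storage scheme concrete, here is a minimal, language-agnostic sketch in Python of how those values could be generated before indexing. The helper names `ancestor_paths` and `category_names_for` are made up for illustration and are not part of sunspot or Solr; the only thing taken from my setup is the `#` separator.

```python
SEPARATOR = "#"  # same separator as in the stored values above


def ancestor_paths(names):
    """Expand one root-to-leaf category path into all of its prefixes, root first."""
    return [SEPARATOR.join(names[:depth]) for depth in range(1, len(names) + 1)]


def category_names_for(paths):
    """Merge the expansions of every category a product belongs to, keeping order."""
    values = []
    for names in paths:
        for value in ancestor_paths(names):
            if value not in values:  # a shared ancestor is stored only once
                values.append(value)
    return values


# The example product that sits in two categories.
values = category_names_for([
    ["Electronics", "Computer", "Notebooks"],
    ["Electronics", "Computer", "Apple", "Macbooks"],
])
print(values)
```

Shared ancestors such as Electronics#Computer end up in the field only once, which is why the stored array has five entries rather than seven.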

When I want to retrieve all categories as a facet, I just call Solr with facet=true&facet.field=category_names, and it returns something like

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="category_names_sms">
      <int name="Electronics">2831</int>
      <int name="Electronics#Computer">1988</int>
      <int name="Electronics#Computer#Apple">543</int>
      ...
    </lst>
  </lst>
</lst>
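The client-side parsing I mentioned looks roughly like the following. This is a simplified Python sketch, not my actual Rails code: it assumes the facet counts have already been read from the response into a plain dictionary, and `build_category_tree` and `print_tree` are hypothetical helper names.

```python
def build_category_tree(facet_counts, separator="#"):
    """Turn flat 'A#B#C' -> count pairs into a nested dict keyed by category name."""
    root = {}
    for path, count in facet_counts.items():
        children = root
        node = None
        for name in path.split(separator):
            # Create intermediate nodes on demand, so the input order does not matter.
            node = children.setdefault(name, {"count": 0, "children": {}})
            children = node["children"]
        node["count"] = count  # the count belongs to the last segment of the path
    return root


def print_tree(children, depth=0):
    """Print one line per category, indented by its level."""
    for name, node in children.items():
        print("  " * depth + f"{name} ({node['count']})")
        print_tree(node["children"], depth + 1)


# The three facet values shown in the response above.
facets = {
    "Electronics": 2831,
    "Electronics#Computer": 1988,
    "Electronics#Computer#Apple": 543,
}
tree = build_category_tree(facets)
print_tree(tree)
```

This works, but it only gives me names, and it breaks if a category name ever contains the separator.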

When I want to fetch only the products from a certain category, I call Solr with fq=category_names:Electronics, and it returns all the products from that category. Because every product also contains the path to the root category, I also get products from the subcategories.

I've read some articles about pivot faceting and hierarchical faceting, and I'm a little confused about whether I'm using the Solr features correctly. My questions are:

  • Is this approach a good one, or are there any drawbacks you can think of? I'm using the # character to split and parse the categories on the client side, and that's a point I don't like.
  • Another problem is that when fetching the categories from Solr, I only have the names of the categories, but I also need the ID or the permalink of each category. Is there a way to store such information in Solr? I don't want to hit the database for this information.
  • Is there a better, maybe built-in, solution in Solr that handles this whole hierarchical category thing?
  • I only use the default Solr XML configs from sunspot right now. I've read about defining fields and things like that. Can someone explain how to do that with sunspot?

Thanks a lot. I hope someone can point me in the right direction.


2 Answers

0
votes

I can see that the structure you have is quite complex, and I would suggest not going that way with Solr.

Although Solr 4.0+ has limited join functionality, that is not its strong point. Have a look at this article (especially the part "Hierarchy and Relations makes Solr sad"): http://bibwild.wordpress.com/2011/01/24/thinking-like-solr-its-not-an-rdbms/

And this one for help on how to denormalize your database to work best with Solr: http://mysolr.com/tips/denormalized-data-structure/

0
votes
  1. I also don't like that solution.

  2. What will you do when a category name changes? You'll have to reindex all products in that category. I think it is better to do one DB query.

  3. Solr supports pivot facets, so you can use them:

    If the number of category levels is unlimited, you should use a dynamic field:

    <field name="categories" type="int" indexed="true" stored="true" multiValued="true"/>

    <dynamicField name="category_*" type="int" indexed="true" stored="true" multiValued="true"/>

    If you want to fetch products only from Electronics itself (for example, its ID is 20 and its level is 1):

    fq=categories:20&fq={!tag=no_subcat}NOT category_2:[* TO *]

    And you can build facets for the child and grandchild categories of Electronics:

    facet.pivot={!ex=no_subcat}category_2,category_3

  4. I've never used Ruby.
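To illustrate point 3, here is a rough sketch in Python of how the per-level fields could be filled at index time and how the request could be assembled. Only the ID 20 for Electronics comes from the example above; every other ID, the `category_fields` helper, and the `q=*:*` parameter are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def category_fields(id_paths):
    """Build the Solr fields for one product from its root-to-leaf category ID paths."""
    doc = {"categories": []}
    for path in id_paths:
        for level, category_id in enumerate(path, start=1):
            # `categories` holds every category ID on any path of the product.
            if category_id not in doc["categories"]:
                doc["categories"].append(category_id)
            # `category_<level>` holds only the IDs found at that depth.
            level_values = doc.setdefault(f"category_{level}", [])
            if category_id not in level_values:
                level_values.append(category_id)
    return doc


# Hypothetical IDs: Electronics=20, Computer=31, Notebooks=45, Apple=52, Macbooks=60.
doc = category_fields([[20, 31, 45], [20, 31, 52, 60]])
print(doc)

# Request parameters from point 3; q=*:* is an assumed match-all query.
params = [
    ("q", "*:*"),
    ("fq", "categories:20"),
    ("fq", "{!tag=no_subcat}NOT category_2:[* TO *]"),
    ("facet", "true"),
    ("facet.pivot", "{!ex=no_subcat}category_2,category_3"),
]
query_string = urlencode(params)
print(query_string)
```

Because the fields hold IDs rather than names, renaming a category does not require reindexing; the names for display come from the single DB query mentioned in point 2.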