We run an online store that uses Solr for product search. The basic setup works fine, but it currently lacks some features. I looked at online shops like Amazon and liked the search features they offer, so I'm wondering how I could configure Solr to offer some of them to our end users.
Our product data is fairly standard for products, for example:
- title of a product
- description
- a product is in multiple categories and sub-categories
- a product can have multiple variants with options, like a T-Shirt in red, blue, green, S, M, L, XL... or an iPad with 16GB, 32GB...
- a product has a brand
- a product has a retailer
For now, we are using this field type from our schema file to index and perform queries on Solr:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
- EdgeNGramFilterFactory indexes a word like shirt into sh, shi, shir, shirt
- WordDelimiterFilterFactory breaks up words like wi-fi into wi, fi, wifi
- PorterStemFilterFactory works well for stemming
- PhoneticFilterFactory provides a kind of fuzzy search
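To illustrate the edge n-gram behaviour described above, here is a minimal Python sketch. It is not Solr code; the function name `edge_ngrams` is made up for illustration, and the defaults simply mirror the minGramSize="2" and maxGramSize="15" settings from the field type.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the leading-edge prefixes of a token, shortest first.

    Illustrative only: this mimics what an edge n-gram filter emits
    for a single token, it does not call Solr.
    """
    upper = min(len(token), max_gram)
    # Tokens shorter than min_gram yield no grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("shirt"))
```

With these settings, a prefix such as shi typed by a user can match the indexed token shirt, which is the effect we rely on for search-as-you-type.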
One problem is that the fuzzy search doesn't work very well. If I search for the book Inferno but misspell it as Infenro, the search doesn't return any results. I've read about the SpellCheckComponent (http://wiki.apache.org/solr/SpellCheckComponent), but I'm not sure whether that's the best way to do fuzzy search or a "Did you mean?" feature.
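To make the misspelling concrete: Infenro differs from Inferno only by two swapped adjacent letters. The sketch below is plain Python, not Solr, and the function name `osa_distance` is hypothetical. It computes an edit distance that counts such a swap as a single edit (the optimal string alignment variant of Damerau-Levenshtein), which is the kind of distance an edit-distance-based fuzzy match would have to tolerate here.

```python
def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute and
    adjacent transposition as one edit each (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "rn" vs "nr".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("inferno", "infenro"))
```

So the two spellings are only one edit apart, yet the phonetic filter in the query analyzer does not bridge the gap in our setup.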
The second problem is that it should be possible to search for Shirts red to find red T-shirts (where red is an option value of the option type color), or to search for woman shoes or adidas shoes woman. Is it possible to do this with Solr?
The third problem is that I'm not sure which of the tokenizers and filters in schema.xml are a good choice for achieving such features.
I hope someone has used such features with Solr and can help me with this. Thanks!
EDIT
Here is some of the data that we store in Solr:
<doc>
  <str name="id">572</str>
  <arr name="taxons">
    <str>cat1</str>
    <str>cat1/cat2</str>
    <str>cat1/cat2/cat3</str>
    <str>cat1/cat4</str>
  </arr>
  <arr name="options">
    <str>color_blue</str>
    <str>color_red</str>
    <str>size_39</str>
    <str>size_40</str>
  </arr>
  <int name="count_on_hand">321</int>
  <arr name="name_text">
    <str>Riddle-Shirt Tech</str>
  </arr>
  <arr name="description_text">
    <str>The Riddle Shirt Tech Men's Hoodie features signature details, along with ultra-lightweight fleece for optimum warmth.</str>
  </arr>
  <arr name="brand_text">
    <str>Riddle</str>
  </arr>
  <arr name="retailer_text">
    <str>Supershop</str>
  </arr>
</doc>
I'm not sure whether the options key-value pairs are stored in a proper way, but that's the first approach I came up with.
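For reference, this is how the combined option strings decompose into option type and value. It is a small Python sketch, not part of our indexing code; the function name `group_options` is made up, and it assumes the option type itself never contains an underscore.

```python
from collections import defaultdict


def group_options(options):
    """Split "type_value" strings and group the values by option type."""
    grouped = defaultdict(list)
    for entry in options:
        # Split on the first underscore only, so values may contain "_".
        option_type, _, value = entry.partition("_")
        grouped[option_type].append(value)
    return dict(grouped)


options = ["color_blue", "color_red", "size_39", "size_40"]
print(group_options(options))
```

In the current layout the word red only exists inside the token color_red, which may be part of why a query like Shirts red does not match the option value.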