To enable partial word searching, you must edit your local schema.xml file, usually under solr/config, and add one of these filters:
- NGramFilterFactory
- EdgeNGramFilterFactory
Here's what mine looks like: sample solr schema.xml
Here's the line to paste:
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
EdgeNGram
I went with the EdgeNGram option. It doesn't match in the middle of words, but it does match partial words starting from the beginning of the word. This cuts way down on false positives, performs better, and users usually don't miss the mid-word matching. I also like minGramSize="2", which means you must enter at least 2 characters before anything matches. Some folks set this to 3.
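To make the trade-off concrete, here is a rough Ruby sketch of what front edge n-grams look like for a single term. The edge_ngrams helper is hypothetical (it is not part of Solr or Sunspot); it only mimics the minGramSize and maxGramSize settings from the filter line, and it assumes the term has already been lowercased and split into a single token.

```ruby
# Hypothetical helper, NOT a Solr or Sunspot API: lists the prefixes a
# front edge n-gram filter would index for one already-tokenized term.
def edge_ngrams(term, min_gram: 2, max_gram: 15)
  # Grams never exceed the term length or max_gram, whichever is smaller.
  longest = [term.length, max_gram].min
  # An empty range (term shorter than min_gram) yields no grams at all.
  (min_gram..longest).map { |len| term[0, len] }
end

grams = edge_ngrams("solr")
puts grams.inspect            # ["so", "sol", "solr"]
puts grams.include?("sol")    # true: prefix typed by the user matches
puts grams.include?("ol")     # false: mid-word fragments are not indexed
```

A one-character term produces no grams under these settings, which is why a single typed letter returns nothing when minGramSize is 2.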
Once your local setup is working, you must also edit the schema.xml used by websolr. Otherwise you will get the default behavior, which requires the full word to be entered even if you have full-text searching configured for your models.
Special instructions for editing the websolr schema.xml if you are using Heroku
- Go to the Heroku online dashboard for your app
- Go to the resources tab, then click on the Websolr add-on
- Click the default link under Indexes
- Click on the Advanced Configuration link
- Paste in the schema.xml from your local setup, including the config for your NGram filter of choice (mentioned above). Save.
- Copy the command in the "Configure your Heroku application" box, then paste it into your terminal to set WEBSOLR_URL in your Heroku config.
- Click the Index Status link to get nifty stats and see if you are running fast or slow.
- Reindex everything
heroku run rake sunspot:reindex[5000]
- Don't use heroku run rake sunspot:solr:reindex - it is deprecated, accepts no parameters and is WAY slower
- The default batch size is 50. Most people suggest 1000, but I've seen significantly faster results (about 1000 rows per second as opposed to around 500) by bumping it up to 5000+
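A back-of-the-envelope sketch of why batch size can matter, assuming each batch costs one round trip to websolr (that is my assumption, not something measured here). The batch_count helper and the 100,000-row table size are hypothetical, picked only to make the arithmetic visible.

```ruby
# Hypothetical helper: number of batches needed to reindex a table,
# using integer ceiling division.
def batch_count(total_rows, batch_size)
  (total_rows + batch_size - 1) / batch_size
end

total_rows = 100_000 # hypothetical table size

[50, 1000, 5000].each do |size|
  puts "batch size #{size}: #{batch_count(total_rows, size)} batches"
end
# batch size 50: 2000 batches
# batch size 1000: 100 batches
# batch size 5000: 20 batches
```

Fewer batches means less per-request overhead, but larger batches also use more memory per request, so it is worth trying a few sizes against your own data.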