8 votes

I have to implement a Solr index for Sitecore and I would like to know what the best approach is.

I looked at the following approaches:

  1. Capture the publish:end event (or other events) and then push the item to the Solr index (a rough sketch of this follows below)
  2. Implement a custom database crawler, get all changes from the history table, and then push the data to Solr using a custom index

The second approach sounds like the way to go (in my opinion). In this case, do I need to create a new search index, or a search manager?

If anyone has done this before, can you point me in the right direction? It would also help if you could post some links to articles about Sitecore-Solr implementation.
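For reference, here is a rough, hypothetical sketch of what approach 1 could look like with SolrNet. The handler class, the SolrSitecoreDocument type, the field names, and the config registration are all assumptions rather than anything prescribed by Sitecore, and the event parameters may differ between Sitecore versions.

// Hypothetical sketch only. Assumes SolrNet has been initialised elsewhere, e.g.
// Startup.Init<SolrSitecoreDocument>("http://localhost:8983/solr/sitecore").
//
// Registered in an include file roughly like:
// <event name="publish:end">
//   <handler type="MySite.Search.PublishEndIndexer, MySite" method="OnPublishEnd" />
// </event>

using System;
using Microsoft.Practices.ServiceLocation;
using SolrNet;
using SolrNet.Attributes;
using Sitecore.Data.Items;
using Sitecore.Events;
using Sitecore.Publishing;

namespace MySite.Search
{
    // Hypothetical document type matching a custom Solr schema.
    public class SolrSitecoreDocument
    {
        [SolrUniqueKey("id")]
        public string Id { get; set; }

        [SolrField("title")]
        public string Title { get; set; }
    }

    public class PublishEndIndexer
    {
        public void OnPublishEnd(object sender, EventArgs args)
        {
            // The publisher exposes the publish options (root item, target database);
            // depending on the Sitecore version the event parameters may differ.
            var publisher = Event.ExtractParameter(args, 0) as Publisher;
            if (publisher == null || publisher.Options.RootItem == null)
                return;

            Item root = publisher.Options.RootItem;

            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<SolrSitecoreDocument>>();
            solr.Add(new SolrSitecoreDocument
            {
                Id = root.ID.Guid.ToString(),
                Title = root["Title"] // assumes a "Title" field on the template
            });
            solr.Commit();
        }
    }
}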

UPDATE: OK, after reading the Sitecore documentation, this is what I came up with:

  1. Create a custom SolrConfiguration class where you can set properties like the Solr service URL and add your indexes and their definitions (custom Solr indexes).

  2. Create a SolrIndex and add it (in the config file) to your SolrConfiguration. When instantiated, the SolrIndex should subscribe to the AddEntry event of the Sitecore HistoryManager and communicate with the Solr crawlers.

  3. Create a custom processor and hook it into the Sitecore initialize pipeline. The processor should initialize the SolrConfiguration (from step 1).

  4. Since everything in your config file will be built using reflection, you can get an instance of your configuration based on your config file (see the sketch after this list).
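Loosely, steps 3 and 4 might look something like this. The <solrConfiguration> node, its location under <sitecore>, and all class names are assumptions for illustration; Sitecore's Factory.CreateObject does the reflection-based construction.

// Hypothetical sketch of steps 3-4. Assumes an include file along these lines
// (the <solrConfiguration> node sits directly under <sitecore>):
// <pipelines>
//   <initialize>
//     <processor type="MySite.Search.InitializeSolrConfiguration, MySite" />
//   </initialize>
// </pipelines>
// <solrConfiguration type="MySite.Search.SolrConfiguration, MySite">
//   <solrServiceUrl>http://localhost:8983/solr/sitecore</solrServiceUrl>
// </solrConfiguration>

using Sitecore.Configuration;
using Sitecore.Pipelines;

namespace MySite.Search
{
    // Step 1: the configuration object that the Sitecore factory builds from config.
    public class SolrConfiguration
    {
        // Populated from the <solrServiceUrl> child node by the config factory.
        public string SolrServiceUrl { get; set; }

        public void Initialize()
        {
            // This is where SolrNet would be initialised against SolrServiceUrl and the
            // SolrIndex instances created and subscribed to the history events (steps 1-2),
            // e.g. SolrNet.Startup.Init<SolrSitecoreDocument>(SolrServiceUrl) using the
            // hypothetical document type from the earlier sketch.
        }
    }

    // Step 3: the initialize-pipeline processor that bootstraps the configuration.
    public class InitializeSolrConfiguration
    {
        public void Process(PipelineArgs args)
        {
            // Step 4: Factory.CreateObject instantiates the type declared on the
            // <solrConfiguration> node via reflection and assigns its child nodes
            // to matching properties.
            var configuration = (SolrConfiguration)Factory.CreateObject("solrConfiguration", true);
            configuration.Initialize();
        }
    }
}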

How does that sound? Any comments would be appreciated.


2 Answers

2 votes

We've done this on a few sites and tend to have a "published" Solr index and an "unpublished" index.

We hook into the following events:

OnItemSaving

We use this event to push things into the unpublished index (you may not need this; it depends on whether you want things available in preview mode).

OnPublishItemProcessed

We process additions and updates to the published index here. I'm not sure what we do about deletions here without digging right into the code, but we certainly deal with deletions on OnItemDelete (mentioned below).

OnItemDelete

We hook in here to remove things from the published and unpublished indexes (I think we remove from the published index here because Sitecore makes you publish the parent node in order to publish out deletions to the web database).
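As an illustration only (this is not the poster's code), a bare-bones sketch of a deletion hook on Sitecore's item:deleted event might look like the following, assuming SolrNet, documents keyed by the item GUID, and the hypothetical SolrSitecoreDocument type from the question's sketch.

// Hypothetical sketch. Registered roughly like:
// <event name="item:deleted">
//   <handler type="MySite.Search.ItemDeletedIndexHandler, MySite" method="OnItemDeleted" />
// </event>

using System;
using Microsoft.Practices.ServiceLocation;
using SolrNet;
using Sitecore.Data.Items;
using Sitecore.Events;

namespace MySite.Search
{
    public class ItemDeletedIndexHandler
    {
        public void OnItemDeleted(object sender, EventArgs args)
        {
            // item:deleted passes the deleted item as the first event parameter.
            var item = Event.ExtractParameter(args, 0) as Item;
            if (item == null)
                return;

            // Remove the document from the index; a second ISolrOperations instance
            // (one per core) would be needed to cover both the published and unpublished indexes.
            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<SolrSitecoreDocument>>();
            solr.Delete(item.ID.Guid.ToString());
            solr.Commit();
        }
    }
}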

I hope that helps. I'd post the code if I could (but I'd be scowled at).

2 votes

In addition to the answer already posted (which I think is a good way to do things), I'll share how we do it.

We basically took a look at the Sitecore database crawler and decided to do things in much the same way it does.

We utilize a significantly modified version of the Custom Item Generator to facilitate mapping between strongly typed objects and an object that has properties that correspond to our Solr schema. For actual communication with Solr we use SolrNet.

The general idea is that we loop through all the items (starting with the site root) recursively and map them to the appropriate type based on their templates. Then we run each item through an indexing process (some items need to index multiple documents to Solr in our implementation).
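A very rough sketch of that recursive loop with SolrNet might look like the following. The mapping here is trivial compared to the Custom Item Generator approach described above, and the database name, site root path, and document type are the hypothetical ones from the question's sketch.

using System.Collections.Generic;
using Microsoft.Practices.ServiceLocation;
using SolrNet;
using Sitecore.Data;
using Sitecore.Data.Items;

namespace MySite.Search
{
    public class SiteCrawler
    {
        public void RebuildIndex()
        {
            // Hypothetical database and site root; a real implementation would read these from config.
            Database web = Database.GetDatabase("web");
            Item siteRoot = web.GetItem("/sitecore/content/home");

            var documents = new List<SolrSitecoreDocument>();
            Crawl(siteRoot, documents);

            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<SolrSitecoreDocument>>();
            solr.AddRange(documents);
            solr.Commit();
        }

        private void Crawl(Item item, List<SolrSitecoreDocument> documents)
        {
            // Map the item to a Solr document; a real implementation would choose a mapping
            // per template and may emit several documents per item.
            documents.Add(new SolrSitecoreDocument
            {
                Id = item.ID.Guid.ToString(),
                Title = item["Title"]
            });

            foreach (Item child in item.Children)
            {
                Crawl(child, documents);
            }
        }
    }
}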

This approach is working very well for us, although I will note that because we index everything at once, it introduces a bit of lag between publishing and the site reflecting the changes in the index. One oversight we made in the beginning, which we will be working to fix soon, is that we don't have an "unpublished" index (meaning we need to publish the site to see updates). It doesn't impact our solution much, but I can definitely see where it would impact others, so keep that in mind.

We didn't particularly want to get into deleting items from the index, so we do the indexing in a publish:end event handler.

I hope this additional insight helps you. As far as I know there's not a whole lot of information out there about this specific combination of products, but I can tell you it's definitely possible and quite useful.