3
votes

Sitecore.NET 6.6.0 (rev. 130404)

Our production website is very search-heavy, and our Lucene indexes are queried heavily throughout the day. As a result, a considerable amount of CPU power is spent on Lucene query processing. Are there industry practices for offloading Lucene indexes and queries to a different machine? Or are there any hardware mechanisms that can be used to boost Lucene query performance?

(Our most heavily used Lucene index contains fewer than 10,000 entries.)

Update (more info):

Although our index contains fewer than 10,000 entries, could the CPU usage be caused by the high number of Lucene queries executed in parallel? We have a very complex faceted search. Initially, when users tried out various search criteria, we displayed result-count breakdowns alongside all the search options (resulting in 50-60 count queries with each search request). This caused CPU usage to reach 90-95% during high traffic. When we removed the counts, CPU usage stabilized at around 20-30%.

Here are the two methods we use for querying:

    public static Document[] GetLuceneDocuments(ACIndex acIndex, Query query, Sort sort = null, int maxResults = 999, bool trackScores = false, bool fillFields = true)
    {
        Index index = SearchManager.GetIndex(GetIndexName(acIndex));

        // Fall back to relevance (score) ordering when no sort is supplied
        if (sort == null)
        {
            sort = new Sort(new SortField(null, SortField.SCORE));
        }

        using (IndexSearchContext searchContext = index.CreateSearchContext())
        {
            Lucene.Net.Search.IndexSearcher searcher = searchContext.Searcher;

            // Collect the top maxResults hits in the requested sort order
            TopFieldCollector collector = TopFieldCollector.create(sort, maxResults, fillFields, trackScores, false, false);
            searcher.Search(query, collector);
            TopDocs topdocs = collector.TopDocs();

            // Load the stored fields of each hit
            Document[] documents = new Document[topdocs.ScoreDocs.Length];
            for (int i = 0; i < topdocs.ScoreDocs.Length; i++)
            {
                documents[i] = searcher.Doc(topdocs.ScoreDocs[i].doc);
            }

            return documents;
        }
    }

    public static int GetSearchResultCount(ACIndex acIndex, Query query)
    {
        Index index = SearchManager.GetIndex(GetIndexName(acIndex));

        using (IndexSearchContext searchContext = index.CreateSearchContext())
        {
            Lucene.Net.Search.IndexSearcher searcher = searchContext.Searcher;

            // Only the total number of hits is needed, so keep a single top document
            TopScoreDocCollector collector = TopScoreDocCollector.create(1, false);
            searcher.Search(query, collector);
            return collector.GetTotalHits();
        }
    }
Queries executed on a Lucene index that contains fewer than 10,000 entries should not consume much CPU. Can you post the code of your most common queries and show how you retrieve items from the query results? – Marek Musielak
+1 for @MarasMusielak. As an aside, may I also put in a plug for ElasticSearch: blog.navigationarts.com/… – Mark J Miller
You need to debug your code and see whether there is a bottleneck. As @Maras Musielak said, 10,000 entries should not be CPU-intensive at all. – Ahmed Okour
Thanks for all of your insights. I've updated my question with some more explanation. – ravinsp

5 Answers

2
votes

You should look into implementing Solr for your searches. While I am not an expert on the subject, Solr is Lucene-based (making the transition easier) and runs on a central server or servers that handle all of your search requirements.

Solr isn't officially supported out of the box in versions prior to Sitecore 7, but I have worked on a number of Sitecore 6 solutions that did use Solr.

This article should give you a head start: How to implement Solr into Sitecore

As far as industry practices go with Sitecore, Solr is the solution to this particular problem. Depending on your implementation, however, it could take some work to get it up and running.
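As an illustration of why a central Solr server addresses the faceting load described in the question: Solr's standard faceting parameters (`facet=true`, `facet.field`, `facet.mincount`) let one request return counts for several fields at once, instead of one Lucene count query per search option. The sketch below only builds such a request URL. The server address, the core name `products`, the field names, and the `SolrFacetQuery` class are hypothetical; sending the request and parsing the JSON response are omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SolrFacetQuery
{
    // Builds a Solr select URL that asks for facet counts on several fields in one request.
    // rows=0 tells Solr not to return any documents, because only the counts are wanted.
    public static string Build(string coreUrl, string query, IEnumerable<string> facetFields)
    {
        var parameters = new List<string>
        {
            "q=" + Uri.EscapeDataString(query),
            "rows=0",
            "facet=true",
            "facet.mincount=1", // omit facet values with zero hits
            "wt=json"
        };

        // One facet.field parameter per field to count
        parameters.AddRange(facetFields.Select(f => "facet.field=" + Uri.EscapeDataString(f)));

        return coreUrl.TrimEnd('/') + "/select?" + string.Join("&", parameters);
    }
}
```

For example, `SolrFacetQuery.Build("http://search-server:8983/solr/products", "title:lucene", new[] { "manufacturer", "size" })` produces a single request whose response would contain counts for every manufacturer and size value, so the counting work runs on the search server rather than on the web servers.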

0
votes

You might look at www.alpha-solutions.dk/sitecore-search-solution for an approach to Solr on Sitecore 6. Note: I am affiliated with Alpha Solutions.

0
votes

Your index is small. I know others have recommended that you re-architect the whole solution; however, I recommend something I have done in the past that has worked well for me and does not require you to provision another server or install another indexing tool such as Elasticsearch or Solr.

First, store the fields you facet on in the index (either in configuration or using a custom crawler), for example:

  • _group
  • _path
  • _creator
  • Manufacturer
  • Size
  • Year
  • ... [other fields]

Next, create a class that represents a result:

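    using Lucene.Net.Documents;

    // Lightweight POCO built from the stored fields of a single hit
    public class MyThing
    {
        public string Manufacturer { get; set; }
        public string Size { get; set; }
        public int Year { get; set; }

        public MyThing(Document doc)
        {
            // Document.Get returns the stored string value, or null if the field is missing
            Manufacturer = doc.Get("Manufacturer");
            Size = doc.Get("Size");

            // Avoid an exception when Year is missing or not a number
            int year;
            Year = int.TryParse(doc.Get("Year"), out year) ? year : 0;
        }
    }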

Then take the hits from your main search, instantiate your lightweight POCOs, and compute the counts from them. Voilà, one query!

    // results: the MyThing objects built from the main search hits (requires System.Linq)
    int countForSomething = results.Count(result => result.Size == "XL");

NOTE: I wrote this code off the top of my head, but you get the idea. I have used this approach on Lucene indexes with 700K+ results in Sitecore without much issue. Good luck, sir!
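The counting idea above can be generalized so that every facet is counted from the same list of hits. This is a minimal sketch, assuming the stored fields have already been copied into objects like `MyThing`; `Hit` and `FacetCounter` are hypothetical names used only for this illustration, and `Hit` is a plain stand-in so the example compiles without Lucene.Net.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for MyThing without the Lucene Document constructor
public class Hit
{
    public string Manufacturer { get; set; }
    public string Size { get; set; }
    public int Year { get; set; }
}

public static class FacetCounter
{
    // Returns facet name -> (facet value -> number of hits with that value),
    // computed in memory from ONE result list instead of one count query per option.
    public static Dictionary<string, Dictionary<string, int>> CountAll<T>(
        IEnumerable<T> hits,
        IDictionary<string, Func<T, string>> facets)
    {
        // Materialize the hits once so they can be scanned for each facet
        var list = hits.ToList();
        var result = new Dictionary<string, Dictionary<string, int>>();

        foreach (var facet in facets)
        {
            result[facet.Key] = list
                .Select(facet.Value)
                .Where(value => value != null) // skip hits that have no value for this facet
                .GroupBy(value => value)
                .ToDictionary(g => g.Key, g => g.Count());
        }

        return result;
    }
}
```

In the question's scenario, this replaces the 50-60 count queries per request with one search plus an in-memory pass. The trade-off is that the main search must return all matching hits, not just one page; with a cap such as the `maxResults` default of 999 in `GetLuceneDocuments`, the counts would be incomplete for larger result sets.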

0
votes

Ah! I just tackled the issue of faceted search and CPU usage myself. It involves some borderline black-magic coding and some really creative caching.

We found a way to implement Solr-style faceted querying in Lucene, and boy oh boy, are the results stunningly fast.

Short version:

  • Build a static class that holds a dictionary. The key is a unique representation of an individual filter; the value is the BitArray produced by a Lucene QueryFilter object for that filter.

    var queryFilter = new QueryFilter(filterBooleanQuery);
    var bits = queryFilter.Bits(indexReader);
    result[filter.ID.ToString()] = bits;

  • Rebuild this dictionary periodically and asynchronously in the background. For my index of about 80k documents, a build took only about 15 seconds, but that is enough to make a lot of users angry, so doing it in a non-blocking manner is crucial.

  • Query this dictionary using bitwise logic to find the resulting BitArray representing the hits you're looking for.

    // BitArray.And modifies the array it is called on, so copy the cached entry first
    var combo = new BitArray(facetDictionary[thisFilter.ID.ToString()])
        .And(facetDictionary[selectedFilter.ID.ToString()]);
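The steps above can be sketched as a small self-contained class. This is an illustration under stated assumptions, not the answerer's code: the Lucene `QueryFilter.Bits(indexReader)` step is replaced by a hypothetical `buildFilters` callback, and filter IDs such as `size:XL` are made up. Two details matter: the rebuild publishes a brand-new dictionary in one reference swap so readers never see a half-built cache, and `BitArray.And` modifies the array it is called on, so the intersection copies the cached entry first (calling `.And` directly on a cached entry would overwrite it).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class FacetBitCache
{
    // Current snapshot: filter ID -> one bit per document (bit i set = document i matches).
    // A published dictionary is never modified; rebuilds replace the whole reference.
    private static Dictionary<string, BitArray> _filters = new Dictionary<string, BitArray>();

    public static IReadOnlyDictionary<string, BitArray> Current => Volatile.Read(ref _filters);

    // Builds all filter bitsets on a background thread, then publishes them atomically.
    // buildFilters is hypothetical: in the answer it would run one Lucene QueryFilter
    // per facet option against the IndexReader.
    public static Task RebuildAsync(Func<Dictionary<string, BitArray>> buildFilters)
    {
        return Task.Run(() =>
        {
            var fresh = buildFilters();
            Volatile.Write(ref _filters, fresh);
        });
    }

    // Number of documents matching both filters; the cached arrays are left untouched.
    public static int CountIntersection(string filterId, string selectedId)
    {
        var filters = Current;
        var combo = new BitArray(filters[filterId]).And(filters[selectedId]);

        int count = 0;
        for (int i = 0; i < combo.Length; i++)
        {
            if (combo[i]) count++;
        }
        return count;
    }
}
```

Requests only call `CountIntersection`, which is a few bitwise operations per facet option, while the expensive Lucene work happens in `RebuildAsync` on a schedule of your choosing.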

Long Version: http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/

Our implementation only needed the cardinality of these result sets, but in theory you could also use these bit arrays to retrieve the actual documents from the index.
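Turning a cached bit array back into documents could look roughly like the following. It is a sketch under the assumption that the bits were computed against the same `IndexReader` the searcher uses, because Lucene document numbers are only meaningful for that reader; `BitArrayHits` and its `maxResults` parameter are hypothetical names.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BitArrayHits
{
    // Yields the index of each set bit, in ascending order, up to maxResults of them.
    // For a bit array produced by a Lucene filter, each index is a document number.
    public static IEnumerable<int> SetBits(BitArray bits, int maxResults)
    {
        int yielded = 0;
        for (int i = 0; i < bits.Length && yielded < maxResults; i++)
        {
            if (bits[i])
            {
                yielded++;
                yield return i;
            }
        }
    }
}
```

Each returned number would then be passed to `searcher.Doc(...)`, as the question's `GetLuceneDocuments` method does with `ScoreDocs`. Note that this yields documents in index order, not by relevance, and the cache must be rebuilt whenever the index reader changes.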

Good luck!

0
votes

Upgrading to Sitecore 7 would give you facets out of the box, abstracted behind a nice LINQ API that lets you switch between Lucene and Solr (other providers, such as Elasticsearch, are coming).