4
votes

Version Details: I am working with Sitecore 7.5 build 141003, using Solr v4.7 as the search engine/indexing server. I am also using the standard Sitecore Solr provider with no custom indexers.

Target Goal: I am using Sitecore ContentSearch LINQ with PredicateBuilder to compile some flexible and nested queries. Currently, I need to search within a specific "Root item", while excluding templates with "folder" in their name and also excluding items with "/testing" in their path. At some point the "Root item" could be more than one item, and so could the path exclusions (currently just "/testing"). In those cases, the idea is to use PredicateBuilder to build an outer "AND" predicate with inner "OR"s for the multiple "Root item"s and path exclusions.

Problem: At the moment, I am dealing with an issue regarding the order of nesting and priorities for these predicates/conditions. I have been testing several approaches and combinations, but the issue I keep running into is the !TemplateName.Contains and Item["_fullpath"].Contains being prioritized over the Paths.Contains, which ends up resulting in 0 results each time.

I am using the Search.log to check the query output, and I have been manually testing against the Solr admin, running queries against it to compare results. Below, you will find examples of the combinations I have tried using Sitecore Linq, and the queries they produce for Solr.

Original Code Sample:

Original test with List for root items

// sometimes will be 1, sometimes will be multiple
var rootItems = new List<ID> { pathID };  // simplified to 1 item for now
var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.False<SearchResultItem>();
pathFilter = rootItems.Aggregate(pathFilter, (current, id) => current.Or(i => i.Paths.Contains(id)));
folderFilter = folderFilter.And(pathFilter);
query.Filter(folderFilter).GetResults();

Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)

As you can see in the above output, there is an inner set of parentheses around the two "not contains" filters, which groups them separately from the Path one. When I run this exact query in the Solr admin, it returns 0 results. However, if I remove the inner parentheses so it's all a single "AND" set, it returns the expected results.

I tested this further with different combinations and approaches to the PredicateBuilder, and each combination results in the same query. I even tried adding two individual filters ("query.Filter(pred1).Filter(pred2)") to my main query object, and it results in the same output.

Additional Code Samples:

Alt. 1 - Adding "Paths.Contains" to folder filter directly

var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
folderFilter = folderFilter.And(i => i.Paths.Contains(pathID));
query.Filter(folderFilter).GetResults();

Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)

Alt. 2 - Two predicates joined to first

var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.False<SearchResultItem>().Or(i => i.Paths.Contains(pathID));
folderFilter = folderFilter.And(pathFilter);
query.Filter(folderFilter).GetResults();

Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)

Alt. 3 - Two "inner" predicates, one for "Not"s and one for "Paths" joined to an outer predicate

var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.False<SearchResultItem>().Or(i => i.Paths.Contains(pathID));
var finalPredicate = PredicateBuilder.True<SearchResultItem>().And(folderFilter).And(pathFilter);
query.Filter(finalPredicate).GetResults();

Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)

Conclusion: Ultimately, what I am looking for is a way to control the prioritization of these nested queries/conditions, or how I can build them to put the paths first, and the "Not" filters after. As mentioned, there are conditions where we will have multiple "Root items" and multiple path exclusions where I need to query something more like:

(-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND (_path:(730c169987a44ca7a9ce294ad7151f13) OR _path:(12c1aa7f60fa4e8d9f0a983bbbb40d8b)))

OR

(-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND (_path:(730c169987a44ca7a9ce294ad7151f13)))

Both of these queries return the results I expect/need when I run them directly in the Solr admin. However, I cannot seem to come up with an approach or order of operations using Sitecore ContentSearch Linq to output a query this way.

Does anyone else have experience with how I can accomplish this? Depending on the suggestion, I am also willing to assemble this piece of the query without Sitecore Linq, if I can marry it back to the IQueryable for calling "GetFacets" and "GetResults".

Update: I didn't include all the revisions I have done because SO would probably kill me for how long this would get. That said, I did try one other slight variation on my original example (top) with a similar result as the others:

var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder")).And(i => !i["_fullpath"].Contains("/testing"));
var rootItems = new List<ID> { pathID, path2 };
// or paths separately
var pathFilter = PredicateBuilder.False<SearchResultItem>();
pathFilter = rootItems.Aggregate(pathFilter, (current, id) => current.Or(i => i.Paths.Contains(id)));   
var finalPredicate = folderFilter.And(pathFilter);
var query = context.GetQueryable<SearchResultItem>();
query.Filter(finalPredicate).GetResults();

Query Output: ((-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND (_path:(730c169987a44ca7a9ce294ad7151f13) OR _path:(12c1aa7f60fa4e8d9f0a983bbbb40d8b)))

And it's still those inner parentheses around the "_templatename" and "_fullpath" conditions that cause problems.

Thanks.

3
Quick question: Why are you testing both _fullpath and _path? What is the ID of the 'testing' node? _path.Contains(<testing node GUID>) will return the same as '*/testing*'; _path is just an array of the full path made up of GUIDs. - Stephen Pope
Index field "_path" holds the GUIDs, while index field "_fullpath" is the actual string path to the item. There are instances where several items in a hierarchy contain the same folder structure: Item1 -> [Folder1, Folder2, Folder3], Item2 -> [Folder1, Folder2, Folder3]. We need everything in Item1 and Item2, except for stuff in Folder3, for instance. - Daved
So "_path", or SearchResultItem.Paths.Contains(ID), lets me specify the parent that contains Item1 & Item2, while !SearchResultItem["_fullpath"].Contains("Folder3") lets me exclude "Folder3" for any items from the results. Since we could have 50+ items containing a "Folder3", it allows an "easy" (in theory) way to exclude Folder3 in each case. - Daved

3 Answers

2
votes

Alright, I raised this question here and posted the situation to Sitecore support as well, and I just received a response and some additional information.

According to the Solr wiki (http://wiki.apache.org/solr/FAQ), in the "Searching" section, the question Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't ? answers why the results are coming back 0.

Boolean queries must have at least one "positive" expression (ie; MUST or SHOULD) in order to match. Solr tries to help with this, and if asked to execute a BooleanQuery that contains only negated clauses at the topmost level, it adds a match all docs query (ie: *:*)

If the top level BooleanQuery contains somewhere inside of it a nested BooleanQuery which contains only negated clauses, that nested query will not be modified, and it (by definition) can't match any documents -- if it is required, that means the outer query will not match.

I am not sure exactly how the Sitecore Solr provider constructs the query, or why it groups the negatives together in a nested query, but a nested query containing only negatives returns 0 results, exactly as the Solr FAQ describes. The trick, then, is to add a "match all" query (*:*) to the sub-query.
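To make that rule concrete, here is a minimal sketch that applies it to raw query strings. SolrGroup and its And method are hypothetical helpers made up for this illustration; they are not part of Sitecore or Solr, and this is not how the support patch works internally. It only shows the idea: when a group contains nothing but negated clauses, give it a positive *:* clause.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not a Sitecore or Solr API.
public static class SolrGroup
{
    // Joins the clauses with AND and wraps them in parentheses.
    // If every clause is negated (starts with '-'), the match-all
    // query *:* is appended so the group has a positive clause.
    public static string And(IEnumerable<string> clauses)
    {
        var list = clauses.ToList();
        bool onlyNegations = list.Count > 0 &&
            list.All(c => c.StartsWith("-", StringComparison.Ordinal));

        if (onlyNegations)
        {
            list.Add("*:*");
        }

        return "(" + string.Join(" AND ", list) + ")";
    }
}
```

Calling SolrGroup.And on the two exclusion clauses gives a group ending in "AND *:*", and passing that group to SolrGroup.And together with the _path clause leaves the outer group untouched, because the nested group starts with a parenthesis rather than a minus sign.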

Instead of having to do this manually for any query that I think might encounter this situation, the support rep provided a patch DLL to replace the provider, that will automatically modify the nested query to remedy this.

They also logged this as a bug and provided reference number 398622 for the issue.

Now, the resulting query looks like this:

((-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND *:*) AND _path:(730c169987a44ca7a9ce294ad7151f13))

or, for multiple root paths:

((-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND *:*) AND (_path:(730c169987a44ca7a9ce294ad7151f13) OR _path:(12c1aa7f60fa4e8d9f0a983bbbb40d8b)))

And the results return as expected. If anyone else comes across this, I would use the reference number with Sitecore support and see if they can provide the patch. You will also have to update the provider used in your Solr.Index and Solr.Indexes.Analytics config files.

0
votes

If the 2 working samples at the end are correct, then you need to AND together the parts of your query separately instead of including 2 statements in a single call, which is what causes the nesting of the initial part of your statement:

// the path part of the query. OR together all the locations
var pathFilter = PredicateBuilder.False<SearchResultItem>();
pathFilter = pathFilter.Or(i => i.Paths.Contains(pathID));
pathFilter = pathFilter.Or(i => i.Paths.Contains(pathID2));
...

// the exclusions, build them up separately
var query = PredicateBuilder.True<SearchResultItem>();
query = query.And(i => !i.TemplateName.Contains("folder"));
query = query.And(i => !i["_fullpath"].Contains("/testing"));

// join both parts together
query = query.And(pathFilter);

This should give you (pseudo):

!templateName.Contains("folder") 
AND !_fullpath.Contains("/testing") 
AND (path.Contains(pathID1) || path.Contains(pathID2))

If you are trying to exclude certain templates, then you could exclude them from your index in the first place by updating the ExcludeTemplate settings in Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config. You then won't need to exclude them specifically in the query:

<exclude hint="list:ExcludeTemplate">
  <MyTemplateId>{11111111-1111-1111-1111-111111111111}</MyTemplateId>
  <MyTemplateId>{22222222-2222-2222-2222-222222222222}</MyTemplateId>
</exclude>
0
votes

I have tried the following code and it did produce the output query you need. The trick was to use PredicateBuilder.True() when creating the path filter query. I am not sure whether that is normal behavior for the Content Search API or a bug.

var query = context.GetQueryable<Sitecore.ContentSearch.SearchTypes.SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.True<SearchResultItem>();
pathFilter = pathFilter.Or(i => i.Paths.Contains(Path1) || i.Paths.Contains(Path2));

folderFilter = folderFilter.And(pathFilter);
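
As background on the seed choice, here is a small in-memory sketch of how the two seeds behave when combined with Or. DemoPredicateBuilder is a hypothetical stand-in written in the common LINQKit style and works on plain strings; it is not the Sitecore class, and the sketch says nothing about how the Solr provider translates the constant seed into a query. It only shows that, evaluated in memory, a True seed combined with Or accepts every input, while a False seed accepts only what the Or'ed conditions accept.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in in the LINQKit style; NOT the Sitecore PredicateBuilder.
public static class DemoPredicateBuilder
{
    public static Expression<Func<T, bool>> True<T>() { return f => true; }
    public static Expression<Func<T, bool>> False<T>() { return f => false; }

    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        // Invoke the right lambda with the left lambda's parameter so
        // both bodies are evaluated against the same input.
        var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, invoked), left.Parameters);
    }
}

public static class SeedDemo
{
    // OR chain seeded with False: only paths under /root1 are accepted.
    public static bool FalseSeededAccepts(string path)
    {
        var filter = DemoPredicateBuilder.False<string>()
            .Or(p => p.StartsWith("/root1"));
        return filter.Compile()(path);
    }

    // OR chain seeded with True: the seed alone already accepts everything.
    public static bool TrueSeededAccepts(string path)
    {
        var filter = DemoPredicateBuilder.True<string>()
            .Or(p => p.StartsWith("/root1"));
        return filter.Compile()(path);
    }
}
```

If the generated Solr query still restricts results to the given paths, the provider is presumably discarding the constant seed during translation, so it is worth confirming the output in Search.log before relying on it.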