According to http://nest.azurewebsites.net/concepts/writing-queries.html, the && and || operators can be used to combine two queries when using the NEST library to communicate with Elasticsearch.

I have the following query set up:

var ssnQuery = Query<NameOnRecordDTO>.Match(
                q => q.OnField(f => f.SocialSecurityNumber).QueryString(nameOnRecord.SocialSecurityNumber).Fuzziness(0)
            );

which is then combined with a Bool query as shown below:

var result = client.Search<NameOnRecordDTO>(
     body => body.Query(
          query => query.Bool(
              bq => bq.Should(
                  q => q.Match(
                     p => p.OnField(f => f.Name.First)
                         .QueryString(nameOnRecord.Name.First).Fuzziness(fuzziness)
                  ),
                  q => q.Match(p => p.OnField(f => f.Name.Last)
                         .QueryString(nameOnRecord.Name.Last).Fuzziness(fuzziness)
                  )
              ).MinimumNumberShouldMatch(2)
          ) || ssnQuery
     )
);

What I think this query means is that if the SocialSecurityNumber matches, or both the Name.First and Name.Last fields match, then the record should be included in the results.

When I execute this query with the following data for the nameOnRecord object used in the calls to QueryString:

{
    "socialSecurityNumber":"123456789",
    "name" : {
      "first":"ryan"
    }
}

the results are the person with SSN 123456789, along with anyone with first name ryan.

If I remove the || ssnQuery from the query above, I get everyone whose first name is 'ryan'.

With the || ssnQuery in place and the following query:

{
    "socialSecurityNumber":"123456789",
    "name" : {
      "first":"ryan",
      "last": "smith"
    }        
}

I appear to get the person with SSN 123456789 along with people whose first name is 'ryan' or last name is 'smith'.

So it does not appear that adding || ssnQuery is having the effect that I expected, and I don't know why.

Here is the index mapping for the object in question:

"nameonrecord" : {
    "properties": {      
        "name": {
            "properties": {
                "name.first": {
                    "type": "string"
                 },
                 "name.last": {
                    "type": "string"
                 }
             }   
        },
        "address" : {
            "properties": {
                "address.address1": {
                    "type": "string",
                     "index_analyzer": "address",
                     "search_analyzer": "address"
                 },
                "address.address2": {
                    "type": "string",
                    "analyzer": "address"
                 },
                 "address.city" : {
                    "type": "string", 
                    "analyzer": "standard"
                 },
                 "address.state" : {
                    "type": "string",
                    "analyzer": "standard"
                 },
                 "address.zip" : {
                    "type" : "string",
                    "analyzer": "standard"
                 }
            }   
        },                
        "otherName": {
           "type": "string"
        },
        "socialSecurityNumber" : {
           "type": "string"   
        },
        "contactInfo" : {
           "properties": {
                "contactInfo.phone": {
                    "type": "string"
                },
                "contactInfo.email": {
                    "type": "string"
                }
            }
        }                
     }   
}

I don't think the definition of the address analyzer is important, since the address fields are not used in the query, but I can include it if someone wants to see it.

This is a bug and it will be fixed in the next release. I will post an answer with the details later. - Martijn Laarman

1 Answer

This was in fact a bug in NEST.

First, some background on how NEST translates boolean queries:

NEST lets you use operator overloading to build otherwise verbose bool queries/filters easily, e.g.:

term && term will result in:

bool
    must
        term
        term

A naive implementation of this would rewrite

term && term && term to

bool
    must
        term
        bool
            must
                term
                term

As you can imagine, this becomes unwieldy quite fast as a query grows more complex. NEST can spot these nested queries and join them together to become

bool
    must 
        term
        term
        term

Likewise term && term && term && !term simply becomes:

bool
    must 
        term
        term
        term
    must_not
        term
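The flattening described above can be sketched with a small, self-contained type. This is only an illustration of the idea: `Q`, `Term`, `Must`, and `MustNot` are hypothetical names invented here, not NEST types, and NEST's real implementation is more involved.

```csharp
using System.Collections.Generic;

// Hypothetical miniature query type; it mimics the flattening idea only
// and is NOT how NEST is actually implemented.
public sealed class Q
{
    public List<string> Must { get; } = new List<string>();
    public List<string> MustNot { get; } = new List<string>();

    // A single term starts out as a bool with one must clause.
    public static Q Term(string name)
    {
        var q = new Q();
        q.Must.Add(name);
        return q;
    }

    // a & b: merge both sides into one flat bool instead of nesting them.
    public static Q operator &(Q left, Q right)
    {
        var merged = new Q();
        merged.Must.AddRange(left.Must);
        merged.Must.AddRange(right.Must);
        merged.MustNot.AddRange(left.MustNot);
        merged.MustNot.AddRange(right.MustNot);
        return merged;
    }

    // !a: turn must clauses into must_not clauses.
    // Simplification: only plain terms are handled; existing must_not clauses are ignored.
    public static Q operator !(Q q)
    {
        var negated = new Q();
        negated.MustNot.AddRange(q.Must);
        return negated;
    }

    // C# requires both of these before && can be used on a custom type.
    // Because operator false always returns false, a && b is evaluated as a & b.
    public static bool operator true(Q q) => false;
    public static bool operator false(Q q) => false;
}
```

With these definitions, `Q.Term("a") && Q.Term("b") && Q.Term("c") && !Q.Term("d")` produces a single object whose `Must` list holds a, b, c and whose `MustNot` list holds d, with no nesting.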

Now, if in the previous example you pass in a bool query directly, like so:

bool(must=term, term, term) && !term

it would still generate the same query. NEST does the same with should clauses when it sees that the boolean descriptors in play consist ONLY of should clauses. This is because the bool query does not quite follow the boolean logic you would expect from a programming language.

To summarize the latter:

term || term || term

becomes

bool
    should
        term
        term
        term

but

term1 && (term2 || term3 || term4) will NOT become

bool
    must 
        term1
    should
        term2
        term3
        term4

This is because as soon as a bool query has a must clause, its should clauses start acting as a boosting factor rather than a requirement. So with the previous query you could get back results that contain ONLY term1, which is clearly not what you want in the strict boolean sense of the input.

NEST therefore rewrites this query to

bool 
    must 
        term1
        bool
            should
                term2
                term3
                term4
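The difference between the two shapes can be checked with a tiny in-memory model. This is a sketch under a simplifying assumption taken from the explanation above: should clauses are only required when the bool has no must clause. `MustShouldSketch`, `Clause`, `Bool`, and `Doc` are hypothetical names, not NEST or Elasticsearch APIs, and scoring is ignored entirely.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MustShouldSketch
{
    // A clause is modeled as a predicate over the set of terms a document contains.
    public delegate bool Clause(ISet<string> terms);

    public static Clause Term(string term) => doc => doc.Contains(term);

    // Simplified matching rule (assumption): every must clause has to match;
    // should clauses are required (at least one) only when there is no must clause.
    public static Clause Bool(Clause[] must, Clause[] should) =>
        doc => must.All(c => c(doc))
            && (must.Length > 0 || should.Length == 0 || should.Any(c => c(doc)));

    static readonly Clause[] None = new Clause[0];
    static readonly Clause[] Alternatives = { Term("term2"), Term("term3"), Term("term4") };

    // Naive shape: must and should side by side in the same bool.
    public static readonly Clause Naive =
        Bool(new[] { Term("term1") }, Alternatives);

    // Rewritten shape: the should clauses live in their own bool inside must.
    public static readonly Clause Rewritten =
        Bool(new[] { Term("term1"), Bool(None, Alternatives) }, None);

    public static ISet<string> Doc(params string[] terms) => new HashSet<string>(terms);
}
```

In this model a document containing only term1 matches `Naive` but not `Rewritten`, while a document containing term1 and term3 matches both.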

Where the bug came into play is that in your situation you have this:

bool(should=term1, term2, minimum_should_match=2) || term3

NEST identified that both sides of the OR operation contain only should clauses and joined them together, which gives a different meaning to the minimum_should_match parameter of the first bool query.
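To see why joining the two sides changes the meaning, here is a small model of should-only bool queries. It is a sketch, not NEST code: `BoolMergeSketch`, `Clause`, `BoolShould`, and `Doc` are hypothetical names, and documents are reduced to the set of fields that matched ("first", "last", "ssn"). The answer does not say which minimum_should_match value the merged query ended up with, so the flattened variant takes it as a parameter.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BoolMergeSketch
{
    // A clause is modeled as a predicate over the set of field names that matched.
    public delegate bool Clause(ISet<string> matchedFields);

    public static Clause Term(string field) => doc => doc.Contains(field);

    // A bool query with only should clauses: at least
    // minimumShouldMatch of them have to match.
    public static Clause BoolShould(int minimumShouldMatch, params Clause[] should) =>
        doc => should.Count(c => c(doc)) >= minimumShouldMatch;

    // Intended meaning: (first AND last) OR ssn, with the name bool kept nested.
    public static readonly Clause Intended =
        BoolShould(1, BoolShould(2, Term("first"), Term("last")), Term("ssn"));

    // Flattened shape: all three clauses share one should list, so
    // minimum_should_match now counts the ssn clause as well.
    public static Clause Flattened(int minimumShouldMatch) =>
        BoolShould(minimumShouldMatch, Term("first"), Term("last"), Term("ssn"));

    public static ISet<string> Doc(params string[] fields) => new HashSet<string>(fields);
}
```

In this model no value of the parameter reproduces the intended query: `Flattened(1)` accepts a document that matched only "first", which is consistent with the results reported in the question, while `Flattened(2)` rejects a document that matched only "ssn". `Intended` rejects the first document and accepts the second.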

I just pushed a fix for this; it will be included in the next release, 0.11.8.0.

Thanks for catching this one!