EDIT: Added my current query to the end

I have a large database of human names and am using Elasticsearch (via Symfony2's FOSElasticaBundle and Elastica) to do smarter searching of the names.

I have a full name field, and I want to index the people's names with standard, ngram, and phonetic analyzers.

I've got the analyzers set up in Elasticsearch, and I can begin dumping data into the index. I'm wondering whether the way I'm doing it here is the best way, or whether I can apply all the analyzers to a single field. The reason I ask is that when I do a GET /website/person/:id, I see all three fields in plain text. I was expecting to see the analyzed data here, although I guess it must exist only in the inverted index rather than on the document. The examples I've seen use multiple fields, but is it possible to add multiple analyzers to a single field?

My config.yml:

fos_elastica:
    clients:
        default: { host: %elastica_host%, port: %elastica_port% }
    indexes:
        website:
            settings:
                index:
                    analysis:
                        analyzer:
                            phonetic_analyzer:
                                type: "custom"
                                tokenizer: "lowercase"
                                filter: ["name_metaphone", "lowercase", "standard"]

                            ngram_analyzer:
                                type: "custom"
                                tokenizer: "lowercase"
                                filter   : [ "name_ngram" ]

                        filter:
                            name_metaphone:
                                encoder: "metaphone"
                                replace: false
                                type: "phonetic"

                            name_ngram:
                                type: "nGram"
                                min_gram: 2
                                max_gram: 4


            client: default
            finder: ~

            types:
                person:
                    mappings:
                        name: ~
                        nameNGram:
                            analyzer: ngram_analyzer
                        namePhonetic:
                            analyzer: phonetic_analyzer

When I check the mapping it looks good:

{
  "website" : {
    "mappings" : {
      "person" : {
        "_meta" : {
          "model" : "acme\\websiteBundle\\Entity\\Person"
        },
        "properties" : {
          "name" : {
            "type" : "string",
            "store" : true
          },
          "nameNGram" : {
            "type" : "string",
            "store" : true,
            "analyzer" : "ngram_analyzer"
          },
          "namePhonetic" : {
            "type" : "string",
            "store" : true,
            "analyzer" : "phonetic_analyzer"
          }
        }
      }
    }
  }
}

When I GET the document, I see that all three fields are stored in plain text. Do I need to set "store": false for the extra fields, or are they not being analyzed properly?

{
  "_index" : "website",
  "_type" : "person",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{
    "name":"John Doe",
    "namePhonetic":"John Doe",
    "nameNGram":"John Doe"
  }
}

EDIT: The solution I'm currently using, which still requires some refinement but tests well for most names

    //Create the query object
    $boolQuery = new \Elastica\Query\Bool();

    //Boost exact name matches
    $exactMatchQuery = new \Elastica\Query\Match();
    $exactMatchQuery->setFieldParam('name', 'query', $name);
    $exactMatchQuery->setFieldParam('name', 'boost', 10);
    $boolQuery->addShould($exactMatchQuery);

    //Create a basic Levenshtein distance query
    $levenshteinMatchQuery = new \Elastica\Query\Match();
    $levenshteinMatchQuery->setFieldParam('name', 'query', $name);
    $levenshteinMatchQuery->setFieldParam('name', 'fuzziness', 1);
    $boolQuery->addShould($levenshteinMatchQuery);

    //Create a phonetic query, seeing if the name SOUNDS LIKE the name that was searched
    $phoneticMatchQuery = new \Elastica\Query\Match();
    $phoneticMatchQuery->setFieldParam('namePhonetic', 'query', $name);
    $boolQuery->addShould($phoneticMatchQuery);

    //Create an NGRAM query
    $nGramMatchQuery = new \Elastica\Query\Match();
    $nGramMatchQuery->setFieldParam('nameNGram', 'query', $name);
    $nGramMatchQuery->setFieldParam('nameNGram', 'boost', 2);
    $boolQuery->addMust($nGramMatchQuery);

    return $boolQuery;
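
For readers not using Elastica, here is a rough sketch of the request body the query above should serialize to, written as a plain Python dict rather than PHP. The function name build_name_query is hypothetical, and the exact JSON Elastica emits may differ between versions, so treat this as an approximation and not as Elastica's actual output.

```python
import json


def build_name_query(name):
    """Build a bool query mirroring the Elastica code above (illustrative only)."""
    return {
        "bool": {
            # Optional clauses: each one that matches raises the score.
            "should": [
                {"match": {"name": {"query": name, "boost": 10}}},
                {"match": {"name": {"query": name, "fuzziness": 1}}},
                {"match": {"namePhonetic": {"query": name}}},
            ],
            # Required clause: the document must match on the n-gram field.
            "must": [
                {"match": {"nameNGram": {"query": name, "boost": 2}}},
            ],
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_name_query("John Doe"), indent=2))
```

Because only the n-gram clause is in "must", the exact, fuzzy, and phonetic clauses affect ranking only; they never exclude a document on their own.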

I'm running into a similar issue trying to find the best way to search for human names, also using ElasticSearchBundle. Would you mind showing me your current query? Are you using a match query with fuzziness, or something along those lines? - mr1031011

I've added the solution I'm currently using to my question above. It isn't perfect and I would love some more feedback, but it matches most names well, with only a couple of outliers I haven't solved yet. - Pez

Thank you, I'm using something very similar to that. Ngram and phonetic do help :) - mr1031011

1 Answer

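No, a single field cannot have multiple analyzers. Your approach is the correct one: index the same value under different field names, each with its own analyzer. Elasticsearch's multi-field mapping (the "fields" option) can create those extra fields from a single source value, but each one is still a separate field in the index.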

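namePhonetic and nameNGram appear in _source because _source holds the original JSON document exactly as it was sent for indexing, and your document contains all three fields. The setting

"store" : true

is a separate matter: it tells Elasticsearch to keep an additional stored copy of each field's original value. You can use

"store" : false

to avoid that extra copy, but _source will still show the original plain text, because analysis never modifies it.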

If you want to see the terms an analyzer produces for a given piece of text, you can use Elasticsearch's _analyze API.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
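
As a rough offline illustration of the kind of terms the _analyze API would report for the nameNGram field, the following Python sketch approximates the ngram_analyzer from the question (lowercase tokenizer followed by an nGram filter with min_gram 2 and max_gram 4). The function names are hypothetical, and the real analyzer may emit terms in a different order or with position data, so this is an approximation and not Elasticsearch's implementation.

```python
import re


def lowercase_tokenize(text):
    """Approximate the `lowercase` tokenizer: split on non-letters, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]


def ngrams(token, min_gram=2, max_gram=4):
    """Return every substring of `token` with length from min_gram to max_gram."""
    return [
        token[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(token) - size + 1)
    ]


def ngram_analyze(text):
    """Approximate `ngram_analyzer`; real term order and positions may differ."""
    terms = []
    for token in lowercase_tokenize(text):
        terms.extend(ngrams(token))
    return terms


if __name__ == "__main__":
    print(ngram_analyze("John Doe"))
```

For "John Doe" this sketch yields jo, oh, hn, joh, ohn, john, do, oe, and doe. Terms like these are what get indexed for nameNGram, even though the document itself still shows "John Doe".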

Yes, the analyzed terms for these fields are stored only in the inverted index, not in the document itself.
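
To make the difference between the inverted index and the document source concrete, here is a minimal Python sketch. It uses a deliberately simple stand-in analyzer (lowercase, split on whitespace) in place of the real ones, and all names in it are hypothetical; it only illustrates the idea and says nothing about how Elasticsearch lays out its index internally.

```python
from collections import defaultdict


def simple_analyze(text):
    """Stand-in analyzer: lowercase the text and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs, field, analyze=simple_analyze):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        for term in analyze(source[field]):
            index[term].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {"1": {"name": "John Doe"}, "2": {"name": "Jane Doe"}}
    index = build_inverted_index(docs, "name")
    print(index["doe"])  # ids of the documents containing the term "doe"
    print(docs["1"])     # the source document is unchanged by analysis
```

Searches run against the term-to-document map, while a GET returns the untouched source, which is why the fetched document shows plain text for every field.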

I hope I have answered all your doubts. Please let me know if you need more help on this.

Thanks