EDIT: Added my current query to the end
I have a large database of human names and am using Elasticsearch (via Symfony2's FOSElasticaBundle and Elastica) to search the names more intelligently.
I have a full-name field, and I want to index each person's name with standard, n-gram, and phonetic analyzers.
I've set up the analyzers in Elasticsearch and can start loading data into the index. I'm wondering whether the approach below is the best one, or whether I can apply all the analyzers to a single field. I ask because when I do a GET /website/person/:id, I see all three fields in plain text. I expected to see the analyzed data there, although I suppose it exists only in the inverted index rather than in the document itself. The examples I've seen use multiple fields, but is it possible to add multiple analyzers to a single field?
My config.yml:
fos_elastica:
    clients:
        default: { host: "%elastica_host%", port: "%elastica_port%" }
    indexes:
        website:
            settings:
                index:
                    analysis:
                        analyzer:
                            phonetic_analyzer:
                                type: "custom"
                                tokenizer: "lowercase"
                                filter: ["name_metaphone", "lowercase", "standard"]
                            ngram_analyzer:
                                type: "custom"
                                tokenizer: "lowercase"
                                filter: ["name_ngram"]
                        filter:
                            name_metaphone:
                                encoder: "metaphone"
                                replace: false
                                type: "phonetic"
                            name_ngram:
                                type: "nGram"
                                min_gram: 2
                                max_gram: 4
            client: default
            finder: ~
            types:
                person:
                    mappings:
                        name: ~
                        nameNGram:
                            analyzer: ngram_analyzer
                        namePhonetic:
                            analyzer: phonetic_analyzer
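To make the two custom analyzers more concrete, here is a rough PHP approximation of the tokens they would emit for a single word. It assumes the lowercase tokenizer has already split "John Doe" into "john" and "doe". The nGrams() and phoneticTokens() helpers are hypothetical, and PHP's built-in metaphone() only stands in for the Metaphone encoder used by the phonetic token filter, so the codes may differ for some names. The _analyze API on the real index remains the authoritative check.

```php
<?php
// Rough illustration only: approximates what the custom analyzers in the
// config emit for one lowercased token. Nothing here calls Elasticsearch.

// Emit every substring of length $minGram..$maxGram, similar to an nGram
// token filter configured with min_gram: 2 and max_gram: 4.
function nGrams(string $token, int $minGram = 2, int $maxGram = 4): array
{
    $grams = [];
    $length = strlen($token);
    for ($start = 0; $start < $length; $start++) {
        for ($size = $minGram; $size <= $maxGram && $start + $size <= $length; $size++) {
            $grams[] = substr($token, $start, $size);
        }
    }
    return $grams;
}

// Hypothetical stand-in for the phonetic chain: PHP's metaphone() approximates
// the Metaphone encoder, strtolower() mirrors the "lowercase" filter, and
// keeping the original token mirrors replace: false.
function phoneticTokens(string $token): array
{
    return [$token, strtolower(metaphone($token))];
}

print_r(nGrams('john'));          // jo, joh, john, oh, ohn, hn
print_r(phoneticTokens('john'));  // john, jn
```

Because "Jon" also encodes to "JN" with metaphone(), a search for "Jon" would share the phonetic term with "John", which is the "sounds like" behaviour the namePhonetic field is meant to provide.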
When I check the mapping, it looks correct:
{
  "website" : {
    "mappings" : {
      "person" : {
        "_meta" : {
          "model" : "acme\\websiteBundle\\Entity\\Person"
        },
        "properties" : {
          "name" : {
            "type" : "string",
            "store" : true
          },
          "nameNGram" : {
            "type" : "string",
            "store" : true,
            "analyzer" : "ngram_analyzer"
          },
          "namePhonetic" : {
            "type" : "string",
            "store" : true,
            "analyzer" : "phonetic_analyzer"
          }
        }
      }
    }
  }
}
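On the single-field question: since Elasticsearch 1.0 a string field can carry extra sub-fields through the "fields" mapping parameter (multi-fields), each with its own analyzer. Below is a minimal sketch, assuming Elasticsearch 1.x and the analyzers defined in the config, of a mapping body for the put-mapping API (for example PUT /website/_mapping/person). The sub-field names ngram and phonetic are illustrative choices. With this mapping the document only needs a name value, and queries would target name, name.ngram and name.phonetic. How to express the same thing in FOSElasticaBundle's YAML depends on the bundle version, so check its mapping documentation.

```php
<?php
// Hypothetical multi-field mapping: one "name" field indexed three ways.
// "name" itself keeps the default analyzer; the sub-fields reuse the
// custom analyzers from the index settings.
$mapping = [
    'person' => [
        'properties' => [
            'name' => [
                'type'   => 'string',
                'fields' => [
                    // Queried as "name.ngram"
                    'ngram'    => ['type' => 'string', 'analyzer' => 'ngram_analyzer'],
                    // Queried as "name.phonetic"
                    'phonetic' => ['type' => 'string', 'analyzer' => 'phonetic_analyzer'],
                ],
            ],
        ],
    ],
];

// Print the JSON body to send with the put-mapping request
echo json_encode($mapping, JSON_PRETTY_PRINT), "\n";
```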
When I GET the document, I see all three fields stored in plain text. Do I need to set store: false for these extra fields, or are they not being analyzed properly?
{
  "_index" : "website",
  "_type" : "person",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{
    "name":"John Doe",
    "namePhonetic":"John Doe",
    "nameNGram":"John Doe"
  }
}
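The analyzed terms are not part of _source, which holds the original JSON exactly as it was sent, so setting store: false would not change what this GET returns. One way to inspect the terms is the _analyze API. Here is a small sketch, assuming Elasticsearch 1.x (which accepts analyzer, or alternatively field, plus text as query-string parameters) and a node at localhost:9200. The analyzeUrl() helper is hypothetical and only builds the URL, which you can then open in a browser or pass to curl.

```php
<?php
// Hypothetical helper: builds an _analyze URL for a given index and analyzer.
// http_build_query() URL-encodes the text (spaces become "+").
function analyzeUrl(string $baseUrl, string $index, string $analyzer, string $text): string
{
    $query = http_build_query(['analyzer' => $analyzer, 'text' => $text]);
    return rtrim($baseUrl, '/') . '/' . rawurlencode($index) . '/_analyze?' . $query;
}

echo analyzeUrl('http://localhost:9200', 'website', 'ngram_analyzer', 'John Doe'), "\n";
// http://localhost:9200/website/_analyze?analyzer=ngram_analyzer&text=John+Doe
```

Swapping in phonetic_analyzer shows the Metaphone codes alongside the original tokens, since the filter is configured with replace: false.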
EDIT: Here is the solution I'm currently using. It still needs some refinement, but it tests well for most names:
// Create the query object
// (newer Elastica releases name this class \Elastica\Query\BoolQuery)
$boolQuery = new \Elastica\Query\Bool();

// Boost direct matches on the standard-analyzed name field
$exactMatchQuery = new \Elastica\Query\Match();
$exactMatchQuery->setFieldParam('name', 'query', $name);
$exactMatchQuery->setFieldParam('name', 'boost', 10);
$boolQuery->addShould($exactMatchQuery);

// Fuzzy match: allow terms within a Levenshtein edit distance of 1
$levenshteinMatchQuery = new \Elastica\Query\Match();
$levenshteinMatchQuery->setFieldParam('name', 'query', $name);
$levenshteinMatchQuery->setFieldParam('name', 'fuzziness', 1);
$boolQuery->addShould($levenshteinMatchQuery);

// Phonetic query: does the name SOUND LIKE the name that was searched?
$phoneticMatchQuery = new \Elastica\Query\Match();
$phoneticMatchQuery->setFieldParam('namePhonetic', 'query', $name);
$boolQuery->addShould($phoneticMatchQuery);

// N-gram query, added as a MUST clause, so every hit has to match it
$nGramMatchQuery = new \Elastica\Query\Match();
$nGramMatchQuery->setFieldParam('nameNGram', 'query', $name);
$nGramMatchQuery->setFieldParam('nameNGram', 'boost', 2);
$boolQuery->addMust($nGramMatchQuery);

return $boolQuery;
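For comparison with the Elastica code, here is roughly the JSON query DSL that the bool query corresponds to, written as a plain PHP array so it runs without Elastica. The buildNameQuery() helper is hypothetical, and the exact JSON Elastica serializes may differ slightly in layout.

```php
<?php
// Hypothetical helper mirroring the Elastica bool query as a raw DSL array.
function buildNameQuery(string $name): array
{
    return [
        'query' => [
            'bool' => [
                'should' => [
                    // Match on the plain name field, heavily boosted
                    ['match' => ['name' => ['query' => $name, 'boost' => 10]]],
                    // Fuzzy match: terms within one edit of the query terms
                    ['match' => ['name' => ['query' => $name, 'fuzziness' => 1]]],
                    // Phonetic ("sounds like") match
                    ['match' => ['namePhonetic' => ['query' => $name]]],
                ],
                // Required clause: hits must share n-grams with the search term
                'must' => [
                    ['match' => ['nameNGram' => ['query' => $name, 'boost' => 2]]],
                ],
            ],
        ],
    ];
}

echo json_encode(buildNameQuery('Jon Doe'), JSON_PRETTY_PRINT), "\n";
```

Because the n-gram clause is in must, a document that matches only phonetically or fuzzily but shares no 2 to 4 character grams with the search term is excluded; the should clauses only raise the score of documents that already pass the must clause. If you want at least one should clause to be required as well, the bool query's minimum_should_match parameter controls that.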