
I have a field with the following mapping defined:

"my_field": {
    "properties": {
        "address": {
            "type": "string",
            "analyzer": "email",
            "search_analyzer": "whitespace"
        }
    }
}

My email analyzer looks like this:

{
    "analysis": {
        "filter": {
            "email_filter": {
                "type": "edge_ngram",
                "min_gram": "3",
                "max_gram": "255"
            }
        },
        "analyzer": {
            "email": {
                "type": "custom",
                "filter": [
                    "lowercase",
                    "email_filter",
                    "unique"
                ],
                "tokenizer": "uax_url_email"
            }
        }
    }
}
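For reference, the effect of the edge_ngram filter on a single token can be sketched in a few lines of Python (a simplified illustration of the min_gram/max_gram settings, not Elasticsearch's actual implementation):

```python
def edge_ngrams(token, min_gram=3, max_gram=255):
    # Emit all leading-edge n-grams of `token`, from min_gram
    # characters up to max_gram (capped at the token's length).
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

# The uax_url_email tokenizer keeps the whole email address as a
# single token, so the filter produces prefixes of the full address:
tokens = edge_ngrams("test.xyz@example.com".lower())
print(tokens[0], tokens[4], tokens[-1])
# tes test.xy test.xyz@example.com
```

This is why partial terms like tes and test.xy end up in the index: each prefix of the whole email becomes its own indexed token.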

When I search for an email address like test.xyz@example.com, partial terms such as tes or test.xy return no hits. But if I search for test.xyz or test.xyz@example.com, it works fine. I tried analyzing the tokens using my email analyzer and it works as expected.

For example, hitting http://localhost:9200/my_index/_analyze?analyzer=email&text=test.xyz@example.com

I get:

{
    "tokens": [{
        "token": "tes",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.x",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xy",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@e",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@ex",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@exa",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@exam",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@examp",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@exampl",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.c",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.co",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }, {
        "token": "test.xyz@example.com",
        "start_offset": 0,
        "end_offset": 20,
        "type": "word",
        "position": 0
    }]
}

So I know that the tokenization works. But at search time, partial strings return no hits.

For example, querying http://localhost:9200/my_index/my_field/_search?q=test returns no hits.

Details of my index:

{
    "my_index": {
        "aliases": {
            "alias_default": {}
        },
        "mappings": {
            "my_field": {
                "properties": {
                    "address": {
                        "type": "string",
                        "analyzer": "email",
                        "search_analyzer": "whitespace"
                    },
                    "boost": {
                        "type": "long"
                    },
                    "createdat": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_millis"
                    },
                    "instanceid": {
                        "type": "long"
                    },
                    "isdeleted": {
                        "type": "integer"
                    },
                    "object": {
                        "type": "string"
                    },
                    "objecthash": {
                        "type": "string"
                    },
                    "objectid": {
                        "type": "string"
                    },
                    "parent": {
                        "type": "short"
                    },
                    "parentid": {
                        "type": "integer"
                    },
                    "updatedat": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_millis"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1480342980403",
                "number_of_replicas": "1",
                "max_result_window": "100000",
                "uuid": "OUuiTma8CA2VNtw9Og",
                "analysis": {
                    "filter": {
                        "email_filter": {
                            "type": "edge_ngram",
                            "min_gram": "3",
                            "max_gram": "255"
                        },
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "3",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete": {
                            "type": "custom",
                            "filter": [
                                "lowercase",
                                "autocomplete_filter"
                            ],
                            "tokenizer": "standard"
                        },
                        "email": {
                            "type": "custom",
                            "filter": [
                                "lowercase",
                                "email_filter",
                                "unique"
                            ],
                            "tokenizer": "uax_url_email"
                        }
                    }
                },
                "number_of_shards": "5",
                "version": {
                    "created": "2010099"
                }
            }
        },
        "warmers": {}
    }
}
For search you have the "search_analyzer": "whitespace" analyzer. Remove that and redo the mapping. – Backtrack

@Backtrack I believe this is correct. Check stackoverflow.com/a/15932838/1465701. Unless I am missing something here, I think this should be the correct behaviour. – nerandell

There's a typo in your mapping, analyser should read analyzer – Val

@Val, Eagle eyes :) – Backtrack

It should work if that's fixed. The index needs to be deleted, re-created and the data re-indexed. – Val

1 Answer


OK, everything looks correct except your query.

You simply need to specify the address field in your query like this and it will work:

http://localhost:9200/my_index/my_field/_search?q=address:test

If you don't specify the address field, the query runs against the _all field, whose search analyzer is the standard one by default, which is why you're not finding anything.
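Equivalently, using the query DSL you can target the field explicitly with a match query (a sketch using the index and field names from the question):

```json
POST /my_index/my_field/_search
{
    "query": {
        "match": {
            "address": "tes"
        }
    }
}
```

Because address's search_analyzer is whitespace, the search term is matched as-is against the indexed edge n-grams, so prefixes of at least min_gram (3) characters will hit.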