This question is a continuation of my previous SO question. I have some text on which I want to search for both numbers and words.
My text:
8080.foobar.getFooLabelFrombar(test.java:91)
I want to search on getFooLabelFrombar, fooBar, 8080 and 91.
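To make the requirement concrete, here is a minimal Python sketch of the tokens I would expect. It is not Elasticsearch code: the function name `desired_tokens` and the regex are assumptions made purely for illustration, and they only cover ASCII letters and digits.

```python
import re

TEXT = "8080.foobar.getFooLabelFrombar(test.java:91)"


def desired_tokens(text):
    """Illustrative only: lowercase runs of ASCII letters or runs of digits."""
    return [m.lower() for m in re.findall(r"[A-Za-z]+|[0-9]+", text)]


tokens = desired_tokens(TEXT)
print(tokens)
# ['8080', 'foobar', 'getfoolabelfrombar', 'test', 'java', '91']

# Every search term from the question should match one token (case-insensitively).
for term in ["getFooLabelFrombar", "fooBar", "8080", "91"]:
    assert term.lower() in tokens
```

With tokens like these, all four search terms would match, which is the behaviour I am after.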
Earlier I was using the simple analyzer, which tokenized the above text into the tokens below.
{
"tokens": [
{
"token": "foobar",
"start_offset": 10,
"end_offset": 16,
"type": "word",
"position": 2
},
{
"token": "getfoolabelfrombar",
"start_offset": 17,
"end_offset": 35,
"type": "word",
"position": 3
},
{
"token": "test",
"start_offset": 36,
"end_offset": 40,
"type": "word",
"position": 4
},
{
"token": "java",
"start_offset": 41,
"end_offset": 45,
"type": "word",
"position": 5
}
]
}
Because of this, searching for foobar and getFooLabelFrombar returned results, but searching for 8080 and 91 did not, as the simple analyzer doesn't tokenize numbers.
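As a rough illustration of why the numbers disappear, the sketch below approximates the simple analyzer in plain Python: it keeps only runs of letters and lowercases them. The function name `simple_like_tokens` is hypothetical, and the ASCII-only regex is a simplification of the real letter-based tokenization.

```python
import re


def simple_like_tokens(text):
    # Approximation: split on anything that is not an ASCII letter,
    # then lowercase each remaining run of letters.
    return [m.lower() for m in re.findall(r"[A-Za-z]+", text)]


tokens = simple_like_tokens("8080.foobar.getFooLabelFrombar(test.java:91)")
print(tokens)
# ['foobar', 'getfoolabelfrombar', 'test', 'java']

# The digits never become tokens, so a search for them cannot match.
assert "8080" not in tokens
assert "91" not in tokens
```

The output lines up with the token list above: the digit runs are discarded at the tokenization step, before anything else could see them.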
Then, as suggested in the previous SO post, I changed the analyzer to standard, which makes the numbers searchable but not the other two search strings, because the standard analyzer creates the tokens below:
{
"tokens": [
{
"token": "8080",
"start_offset": 0,
"end_offset": 4,
"type": "<NUM>",
"position": 1
},
{
"token": "foobar.getfoolabelfrombar",
"start_offset": 5,
"end_offset": 35,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "test.java",
"start_offset": 36,
"end_offset": 45,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "91",
"start_offset": 46,
"end_offset": 48,
"type": "<NUM>",
"position": 4
}
]
}
I went through all the existing analyzers in ES, but none seems to fulfil my requirement. I tried creating the custom analyzer below, but it doesn't work either.
{
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "letter",
"filter" : ["lowercase", "extract_numbers"]
}
},
"filter" : {
"extract_numbers" : {
"type" : "keep_types",
"types" : [ "<NUM>","<ALPHANUM>","word"]
}
}
}
}
Please suggest how I can build a custom analyzer to suit my requirements.