I'm new to Elasticsearch. I'm trying to fix our search so that users can search on content within HTML tags. Currently we use a whitespace tokenizer because we need it to return results for hyphenated names. Consequently, aname123-suffix project is indexed as ["aname123-suffix", "project"], and a user search for "aname123-*" returns the correct results.
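To make the current behaviour concrete, here is a rough pure-Python approximation of what I understand the whitespace tokenizer and the wildcard search to be doing. It is only a sketch: `whitespace_tokens` and `wildcard_hits` are hypothetical helpers of mine, `fnmatchcase` merely stands in for Elasticsearch's wildcard matching, and the stop filter is left out.

```python
from fnmatch import fnmatchcase

def whitespace_tokens(text):
    """Approximate a whitespace tokenizer followed by a lowercase filter."""
    return [tok.lower() for tok in text.split()]

def wildcard_hits(tokens, pattern):
    """Return the tokens that match a shell-style wildcard pattern.

    This is an assumption standing in for the real wildcard query.
    """
    return [tok for tok in tokens if fnmatchcase(tok, pattern)]

tokens = whitespace_tokens("aname123-suffix project")
print(tokens)
print(wildcard_hits(tokens, "aname123-*"))
```

In this approximation the hyphenated name survives as a single token, which is why the prefix search finds it.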
My problem arises because we also want to be able to search on content within HTML tags. For example, for a project called <aname123>-suffix project, we'd like to be able to enter the search term <aname123>-* and get back the correct results.
The index has the correct tokens for a whitespace tokenizer, namely ["<aname123>-suffix", "project"], but when my search string is "\<aname123\>\-suffix" or "\\<aname123\\>\\-suffix", Elasticsearch returns no results.
I think the solution lies in either
- modifying the search string so that Elasticsearch returns <aname123>-suffix when I ask for it; or
- being able to index the content within the tag separately from the whitespace tokens, i.e. ["<aname123>-suffix", "project", "aname123", "suffix"]
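To show the token set I am after in the second option, here is a minimal Python sketch. It is not Elasticsearch code: `tokens_with_tag_content` is a hypothetical helper, and splitting the tagged token on non-alphanumeric characters is just my assumption about how the extra tokens could be derived.

```python
import re

def tokens_with_tag_content(text):
    """Keep the whitespace tokens, then add the alphanumeric pieces
    of every token that contains an angle-bracket tag."""
    whole = text.split()
    extra = []
    for tok in whole:
        if "<" in tok and ">" in tok:
            # Assumption: the inner pieces are the runs of letters and digits.
            extra.extend(re.findall(r"[A-Za-z0-9]+", tok))
    return whole + extra

print(tokens_with_tag_content("<aname123>-suffix project"))
```

A token without a tag, such as aname123-suffix, is left alone in this sketch, so the hyphenated search would keep working.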
So far I've been approaching it by changing the indexing, but I have not yet succeeded. A standard tokenizer will allow search results for content within tags, but it fails to return search results for aname123-*. Currently my analyzer settings look like this:
{
  "analysis": {
    "analyzer": {
      "my_whitespace_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["standard", "lowercase", "stop"]
      },
      "my_tag_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["standard", "lowercase", "stop"]
      }
    }
  }
}
I can create a custom char filter that strips out the < and the >, so that my index contains aname123, but for some reason Elasticsearch still does not return the correct results when searching on <aname123>*. However, when I use a standard analyzer instead, the index contains aname123 and it returns the expected results for <aname123>* ... What is so special about angle brackets in Elasticsearch?
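For reference, this is roughly what I expect the char filter to do, written as a Python approximation rather than real Elasticsearch configuration. `strip_angle_brackets` and `analyze` are hypothetical names, and I am assuming the filter is combined with the whitespace tokenizer; with another tokenizer the tokens would differ.

```python
import re

def strip_angle_brackets(text):
    """Approximate a char filter that removes '<' and '>' before tokenizing."""
    return re.sub(r"[<>]", "", text)

def analyze(text):
    """Strip the brackets, then split on whitespace and lowercase (assumed pipeline)."""
    return [tok.lower() for tok in strip_angle_brackets(text).split()]

print(analyze("<aname123>-suffix project"))
# What the search term would look like if the same stripping were applied to it:
print(strip_angle_brackets("<aname123>*"))
```

If the search term went through the same stripping, it would become aname123* and should match the indexed token. I am not sure whether Elasticsearch applies the analyzer to wildcard terms at all, which may be part of what I am missing.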