I want to be able to search documents in Elasticsearch using special characters as well as regular terms. For example, given the following document:
"HEY YOU! Thanks for reading this post!"
I want to be able to use a query string like:
{
"query": {
"query_string": {
"default_field": "content",
"query": "\"!\""
}
}
}
and get the previous document as a result. But I also want to be able to retrieve the document with a query like:
{
"query": {
"query_string": {
"default_field": "content",
"query": "hey AND you"
}
}
}
I'm currently using the standard tokenizer, but I can't query special characters: it returns no documents. Is there a tokenizer already defined for this kind of task? I considered not analyzing the field, but then I would lose the lowercasing.
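For reference, this is roughly how I checked what the standard tokenizer does with the example text (a diagnostic sketch using the `_analyze` API; no index is needed when the tokenizer is passed inline):

```json
GET /_analyze
{
  "tokenizer": "standard",
  "text": "HEY YOU! Thanks for reading this post!"
}
```

The punctuation (`!`) is dropped from the token stream, which would explain why the first query matches nothing.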
EDIT:
I created a custom analyzer:
{
"sw3": {
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "sw3",
"creation_date": "1493907201172",
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "whitespace"
}
}
},
"number_of_replicas": "1",
"uuid": "e0_9cIFrQWqn-zqYeg0q5g",
"version": {
"created": "5030299"
}
}
}
}
}
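For completeness, this is roughly the shape of the index-creation request that would define such an analyzer and apply it to the `content` field (a sketch; the mapping type name `doc` is hypothetical, and I am assuming the field has to reference the analyzer explicitly in the mapping):

```json
PUT /sw3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
```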
But when I try:
{
"query": {
"query_string": {
"default_field": "content",
"query": ";"
}
}
}
I don't get any results. So I tried:
{
"query": {
"match": {
"content": ";"
}
}
}
but I still don't get any results. I tried to see what exactly the tokenizer does:
GET /my_index/_analyze?analyzer=my_analyzer
{
"text": "Hey ; what's up"
}
And the result of the query is:
{
"tokens": [
{
"token": "hey",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": ";",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 1
},
{
"token": "what's",
"start_offset": 6,
"end_offset": 12,
"type": "word",
"position": 2
},
{
"token": "up",
"start_offset": 13,
"end_offset": 15,
"type": "word",
"position": 3
}
]
}
Why can't I retrieve any documents when the tokenizer seems to work?
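One thing I have not verified yet (I am only assuming this could be the cause): whether the `content` field's mapping actually references `my_analyzer`, since defining an analyzer in the index settings has no effect on a field unless the mapping points to it. The mapping can be inspected with:

```json
GET /sw3/_mapping
```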