
I would like to know how to search for all documents whose string field contains a given word.

I looked at a solution that puts a wildcard (*) before and after the word (https://www.elastic.co/guide/en/elasticsearch/guide/current/_wildcard_and_regexp_queries.html), but it isn't good enough, because it also retrieves documents that contain a longer word containing that string. For example, if I search for "news", the results can include "Wikinews", which is not what I want.

My index is defined like this:

PUT /index
{
  "mappings": {
    "text": {
      "properties": {
        "text": { "type": "string", "index": "not_analyzed" },
        "url": { "type": "string" }
      }
    }
  }
}

I would like to search for documents in which a given word appears in the 'text' field.

EDIT: example data:

curl -XPUT 'http://localhost:9200/index/text/1' -d '
{
   "url": "wikipedia.com",
   "text": "in the news"
}'

curl -XPUT 'http://localhost:9200/index/text/2' -d '
{
   "url": "wikipedia.com",
   "text": "Click here for Wikinews"
}'

curl -XPUT 'http://localhost:9200/index/text/3' -d '
{
   "url": "wikipedia.com",
   "text": "news for each page are those:"
}'

curl -XPUT 'http://localhost:9200/index/text/4' -d '
{
   "url": "wikipedia.com",
   "text": "What are the news means to you"
}'

curl -XPUT 'http://localhost:9200/index/text/5' -d '
{
   "url": "walla.com",
   "text": "today News are more ..."
}'

This should return documents 1, 3, 4 and 5. Document 5 is included because the search should be case-insensitive. Document 2 is excluded because it does not contain the word "news", only a longer word ("Wikinews") that happens to contain it.
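To make the difference concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code) using the sample data above. It assumes a wildcard query on a not_analyzed field behaves like a case-sensitive pattern match against the whole field value, and contrasts that with the desired case-insensitive whole-word match. The function names are hypothetical.

```python
import fnmatch
import re

# Sample documents from the question, keyed by document id.
DOCS = {
    1: "in the news",
    2: "Click here for Wikinews",
    3: "news for each page are those:",
    4: "What are the news means to you",
    5: "today News are more ...",
}


def wildcard_hits(pattern):
    """Approximates a wildcard query on a not_analyzed field: the whole
    value is one case-sensitive term, matched against the pattern."""
    return sorted(i for i, text in DOCS.items() if fnmatch.fnmatchcase(text, pattern))


def word_hits(word):
    """The desired behaviour: a case-insensitive match on whole words only."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sorted(i for i, text in DOCS.items() if pattern.search(text))


print(wildcard_hits("*news*"))  # [1, 2, 3, 4]
print(word_hits("news"))        # [1, 3, 4, 5]
```

The wildcard version wrongly includes document 2 and misses document 5 because of the capital "N"; the whole-word version returns exactly the documents listed above.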

Thanks for any help.

What does a sample of your data set look like? I assume the word "news" isn't the only thing in the text field. - Robert
Could you please provide more information about the kind of query you want to perform, the results you want, and the results you want to avoid? - ChintanShah25

1 Answer


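First, you need to remove "index": "not_analyzed", because you want a case-insensitive, word-level search. With "index": "not_analyzed", the whole field value is indexed as-is, as a single term, so a search for the word "news" won't give you document 5 (or any document in which "news" is only part of the text).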

PUT /index
{
  "mappings": {
    "text": {
      "properties": {
        "text": { "type": "string" },
        "url": { "type": "string" }
      }
    }
  }
}

Since no analyzer is specified, the field uses the default standard analyzer, which splits the text into words and lowercases them. You can learn more in the Elasticsearch documentation on analysis.

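After that, a simple match query is enough to get all the desired documents. (Note that the mapping of an existing field cannot be changed in place, so you will need to delete and recreate the index and then reindex your documents.)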

{
  "query": {
    "match": {
      "text": "news"
    }
  }
}
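As a rough illustration of why this works, the following Python sketch approximates the standard analyzer and a match query over the question's sample data. It is an assumption-laden simplification: the real standard analyzer uses Unicode text segmentation rules, while this sketch just takes runs of word characters and lowercases them, and the helper names are hypothetical.

```python
import re

# Sample documents from the question, keyed by document id.
DOCS = {
    1: "in the news",
    2: "Click here for Wikinews",
    3: "news for each page are those:",
    4: "What are the news means to you",
    5: "today News are more ...",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: take runs of word
    characters (letters, digits, underscore) and lowercase each one."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Build a tiny inverted index: term -> set of document ids.
INDEX = {}
for doc_id, text in DOCS.items():
    for term in analyze(text):
        INDEX.setdefault(term, set()).add(doc_id)


def match(query):
    """Approximates a match query with the default OR operator: analyze the
    query the same way and return documents containing any of its terms."""
    hits = set()
    for term in analyze(query):
        hits |= INDEX.get(term, set())
    return sorted(hits)


print(analyze("Click here for Wikinews"))  # ['click', 'here', 'for', 'wikinews']
print(match("news"))                       # [1, 3, 4, 5]
```

Document 2 is indexed under the single term "wikinews", so the term "news" never points to it, while "News" in document 5 is lowercased at index time and therefore matches.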

You can replace the match query with a match_phrase query if you want to search for a phrase, that is, several words appearing together in the given order.
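A phrase search relies on token positions. The Python sketch below approximates match_phrase with the default slop of 0: the analyzed query terms must appear consecutively and in the same order. It reuses the same simplified tokenizer assumption as a stand-in for the standard analyzer, and the function names are hypothetical.

```python
import re

# Sample documents from the question, keyed by document id.
DOCS = {
    1: "in the news",
    2: "Click here for Wikinews",
    3: "news for each page are those:",
    4: "What are the news means to you",
    5: "today News are more ...",
}


def analyze(text):
    """Simplified standard-analyzer stand-in: lowercase word-character runs."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def match_phrase(phrase):
    """Approximates match_phrase (slop 0): the query terms must occur in the
    document as a contiguous sequence, in the same order."""
    terms = analyze(phrase)
    n = len(terms)
    hits = []
    for doc_id, text in DOCS.items():
        tokens = analyze(text)
        # Slide a window of n tokens over the document and compare.
        if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return sorted(hits)


print(match_phrase("the news"))  # [1, 4]
print(match_phrase("news the"))  # []
```

A plain match query for "the news" would return any document containing either term, whereas the phrase version keeps only documents 1 and 4, where "the" is immediately followed by "news".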