
I am new to Elasticsearch. What I am trying to achieve is a search where anything that loosely matches the query string is returned in the results.

I index sample documents like this:

curl -XPUT 'http://localhost:9200/prj1/mod/java' -d '{
    "project_name": "Java_SE"
}'

curl -XPUT 'http://localhost:9200/prj1/mod/java2Ed' -d '{
    "project_name": "Java 2 Edition"
}'

curl -XPUT 'http://localhost:9200/prj1/mod/javaee' -d '{
    "project_name": "Java_EE"
}'

When I search with

curl -XGET 'http://localhost:9200/prj1/mod/_search' -d '{"query" : {"match" : {"project_name" : "Java"}}}'

it returns the following result:

{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.15342641,"hits":[{"_index":"prj1","_type":"mod","_id":"java2Ed","_score":0.15342641,"_source":{
"project_name": "Java 2 Edition"
}}]}}

It does not return all the projects, i.e. "Java_SE", "Java 2 Edition" and "Java_EE".

I need to get every document in which the search text is found. For example, for the search text "example", my data could also contain text like this:

This is an example_code
This example:11 is good
Example you are looking for is not available.

What am I doing wrong here?


3 Answers

2
votes

You need to use an edge ngram filter for this problem. Create your index with the following settings and mapping:

PUT prj1
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 8
        }
      },
      "analyzer": {
        "relative": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "mod": {
      "properties": {
        "project_name": {
          "type": "string",
          "analyzer": "relative",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

With this analyzer, java_se is indexed as the tokens ja, jav, java and so on, so your match query for "Java" will work.

Thanks to @sean: for words like complete_java_book or my_java_applet, where the match is not at the start of the word, you would need an ngram filter rather than an edge ngram filter. If you are using _ as a naming convention for project names, you might also want to consider breaking words on _.
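
To make the token list concrete, here is a minimal Python sketch of what an edge ngram filter with min_gram 2 and max_gram 8 emits for one token. It only mimics the filter for illustration: edge_ngrams is a hypothetical helper, not an Elasticsearch API, and it assumes the token was already lowercased and that the standard tokenizer kept java_se as a single token.

```python
def edge_ngrams(token, min_gram=2, max_gram=8):
    """Return the prefixes of `token` with length min_gram..max_gram.

    Hypothetical helper that imitates an edge ngram token filter;
    it is not part of any Elasticsearch client library.
    """
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


print(edge_ngrams("java_se"))
# ['ja', 'jav', 'java', 'java_', 'java_s', 'java_se']
```

Because "java" is one of the indexed prefixes, a search for "Java" (lowercased to "java" by the standard search analyzer) finds the document.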
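
The difference between the two filters can be sketched in Python as follows. Both helpers are hypothetical stand-ins for the ngram and edge ngram token filters, using the same min_gram 2 and max_gram 8 as the settings above; they are an illustration under those assumptions, not Elasticsearch code.

```python
def edge_ngrams(token, min_gram=2, max_gram=8):
    """Prefixes only, as an edge ngram filter would emit (hypothetical helper)."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


def ngrams(token, min_gram=2, max_gram=8):
    """Every substring of length min_gram..max_gram (hypothetical helper)."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        # range() is empty when size is longer than the token
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


name = "complete_java_book"
print("java" in edge_ngrams(name))  # False: "java" is not a prefix
print("java" in ngrams(name))       # True: inner substrings are indexed
print("java" in name.split("_"))    # True: splitting on "_" also exposes it
```

Plain ngrams produce many more tokens per word than edge ngrams, so splitting on _ may be the cheaper option if project names follow that convention.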

Hope this helps.

0
votes

It's because, by default, Elasticsearch uses the standard tokenizer, which does not split text on the underscore character "_". Therefore, when you perform your search, you are searching for the token "java", and only the document java2Ed ("Java 2 Edition") contains that token.
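
A rough Python approximation shows the effect on the three sample documents. The regex below is only a stand-in for the standard tokenizer (an assumption for illustration; the real tokenizer follows Unicode word-boundary rules), but it shares the relevant behaviour: the underscore does not break a word.

```python
import re


def approx_standard_tokens(text):
    """Approximate the standard analyzer: split into words, then lowercase.

    \\w matches letters, digits and the underscore, so "Java_SE" stays
    one token. Hypothetical helper, not Elasticsearch code.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


docs = {"java": "Java_SE", "java2Ed": "Java 2 Edition", "javaee": "Java_EE"}

for doc_id, name in docs.items():
    print(doc_id, approx_standard_tokens(name))

matches = [doc_id for doc_id, name in docs.items()
           if "java" in approx_standard_tokens(name)]
print(matches)  # ['java2Ed']
```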

0
votes

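You can use a query_string query with wildcards as well. Note that query_string takes the search text in "query" and the field in "default_field":

curl -XGET 'http://localhost:9200/prj1/mod/_search' -d '{"query" : {"query_string" : {"default_field" : "project_name", "query" : "*Java*"}}}'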
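
As a rough illustration of why a wildcard pattern finds all three documents, the sketch below matches the pattern against whole indexed terms with Python's fnmatch. The term lists are what a default standard analyzer is assumed to produce for the sample data, and the pattern is assumed to be lowercased before matching; neither is taken from a real cluster.

```python
from fnmatch import fnmatchcase

# Assumed indexed terms per document id (standard analyzer, lowercased).
indexed_terms = {
    "java": ["java_se"],
    "java2Ed": ["java", "2", "edition"],
    "javaee": ["java_ee"],
}

# Assumption: the wildcard pattern is lowercased before it is compared.
pattern = "*Java*".lower()

hits = [doc_id for doc_id, terms in indexed_terms.items()
        if any(fnmatchcase(term, pattern) for term in terms)]
print(hits)  # ['java', 'java2Ed', 'javaee']
```

Keep in mind that a leading wildcard forces a scan over many terms and can be slow on large indexes.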