
I set up a very simple test to track down my mistake, but I couldn't find it. I created two indexes and I'm trying to search for documents in the ppa index that are similar to a given document in the ods index (like the second example here https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html).

These are my settings, mappings and documents for the ppa index:

PUT /ppa
{
  "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
          "filter": {
              "brazilian_stop": {
                  "type": "stop",
                  "stopwords": "_brazilian_"
              },
              "brazilian_stemmer": {
                  "type": "stemmer",
                  "language": "brazilian"
              }
          },
          "analyzer": {
              "brazilian": {
                  "tokenizer": "standard",
                  "filter": [
                      "lowercase",
                      "brazilian_stop",
                      "brazilian_stemmer"
                  ]
              }
          }
      }
  }
}

PUT /ppa/_mapping/ppa
{"properties": {"descricao": {"type": "text", "analyzer": "brazilian"}}}

POST /_bulk
{"index":{"_index":"ppa","_type":"ppa"}}
{"descricao": "erradicar a pobreza"}
{"index":{"_index":"ppa","_type":"ppa"}}
{"descricao": "erradicar a pobreza"}

These are my settings, mappings and documents for the ods index:

PUT /ods
{
  "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
          "filter": {
              "brazilian_stop": {
                  "type": "stop",
                  "stopwords": "_brazilian_"
              },
              "brazilian_stemmer": {
                  "type": "stemmer",
                  "language": "brazilian"
              }
          },
          "analyzer": {
              "brazilian": {
                  "tokenizer": "standard",
                  "filter": [
                      "lowercase",
                      "brazilian_stop",
                      "brazilian_stemmer"
                  ]
              }
          }
      }
  }
}

PUT /ods/_mapping/ods
{
  "properties": {
    "metaodsdescricao": {"type": "text", "analyzer": "brazilian"},
    "metaodsid": {"type": "integer"}
  }
}

POST /_bulk
{"index":{"_index":"ods","_type":"ods", "_id" : "1" }}
{ "metaodsdescricao": "erradicar a pobreza","metaodsid": 1}
{"index":{"_index":"ods","_type":"ods", "_id" : "2" }}
{"metaodsdescricao": "crianças que vivem na pobreza", "metaodsid": 2}

Now, this search returns no hits:

GET /ppa/ppa/_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["descricao"],
            "like" : [
            {
                "_index" : "ods",
                "_type" : "ods",
                "_id" : "1"
            }
            ],
            "min_term_freq" : 1,
            "min_doc_freq" : 1,
            "max_query_terms" : 20
        }
    }
}

But this one does work:

GET /ppa/ppa/_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["descricao"],
            "like" : ["erradicar a pobreza"],
            "min_term_freq" : 1,
            "min_doc_freq" : 1,
            "max_query_terms" : 20
        }
    }
}
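Since the free-text form works, one possible workaround (an assumption, not something the thread confirms as the intended fix) is to read the ods document first and pass its metaodsdescricao text as the like string. The Python sketch below only assembles the request body; the helper name build_mlt_query and the ods_source dict are hypothetical, and sending the body to GET /ppa/ppa/_search is left to whatever client you use.

```python
import json


def build_mlt_query(like_text, field="descricao"):
    """Hypothetical helper: build a more_like_this body that uses free text
    as the `like` input instead of a reference to a document in another index."""
    return {
        "query": {
            "more_like_this": {
                "fields": [field],
                "like": [like_text],
                "min_term_freq": 1,
                "min_doc_freq": 1,
                "max_query_terms": 20,
            }
        }
    }


# Hypothetical _source of ods document 1, assumed to have been fetched
# beforehand (e.g. with GET /ods/ods/1).
ods_source = {"metaodsdescricao": "erradicar a pobreza", "metaodsid": 1}

body = build_mlt_query(ods_source["metaodsdescricao"])
print(json.dumps(body, indent=2))
```

The design choice is simply to move the cross-index step to the client: the ods text is copied into the query, so the field-name mismatch between the two mappings no longer matters.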

What is happening? Please help me get this query to return results.

Your documents have a field named "metaodsdescricao", but your query uses a field named "descricao". Should it be *descricao? - sramalingam24
I believe the names are all correct. The ppa mapping has a field called descricao, while the ods mapping has a field called metaodsdescricao. My query searches the descricao field of ppa, so I believe that part is correct. - Alex Pereira
My bad, I didn't notice you had two indices. I think the fields have to match. As a workaround you could try * or list both fields and see if that works, but I'm not sure. - sramalingam24
You are right. If I give both fields the same name, the query returns the expected results. Is this requirement intentional? I wonder how it is implemented. - Alex Pereira
That's a question for the ES guys :) - sramalingam24
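The thread leaves open why the names must match. As I understand it, when like references an indexed document, more_like_this reads terms from that document only for the fields listed in fields. The ods document has no descricao field, so no query terms are produced and nothing can match. The Python below is a simplified model of that behaviour, not Elasticsearch's implementation: the stopword set is a hypothetical stand-in for _brazilian_, there is no stemming, and scoring is replaced by a plain "shares any term" check.

```python
# Hypothetical, tiny stopword set standing in for _brazilian_; no stemming.
STOPWORDS = {"a", "que", "na"}


def analyze(text):
    """Rough stand-in for the 'brazilian' analyzer: lowercase, split, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def extract_terms(like_doc, fields):
    """Collect (field, term) pairs only from the fields listed in `fields`.

    Fields the like document does not contain contribute nothing.
    """
    terms = set()
    for field in fields:
        if field in like_doc:
            for term in analyze(like_doc[field]):
                terms.add((field, term))
    return terms


def search(index_docs, terms):
    """Return ids of documents sharing at least one (field, term) pair."""
    hits = []
    for doc_id, doc in index_docs.items():
        doc_terms = {(f, t) for f, v in doc.items() for t in analyze(v)}
        if doc_terms & terms:
            hits.append(doc_id)
    return hits


ppa = {"p1": {"descricao": "erradicar a pobreza"},
       "p2": {"descricao": "erradicar a pobreza"}}
ods_1 = {"metaodsdescricao": "erradicar a pobreza"}

# Field names differ: no terms are extracted, so nothing can match.
print(search(ppa, extract_terms(ods_1, ["descricao"])))

# Same text stored under the ppa field name: terms are found and both docs match.
renamed = {"descricao": ods_1["metaodsdescricao"]}
print(search(ppa, extract_terms(renamed, ["descricao"])))
```

This mirrors what the asker observed: renaming the field so both mappings share it makes the document reference produce terms again.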

1 Answer


The "more like this" query works well when you have indexed a lot of data. An empty result can be a symptom of having very few documents in the Elasticsearch index.
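To make the small-index point concrete: more_like_this discards candidate terms that occur fewer than min_term_freq times in the like input or that appear in fewer than min_doc_freq documents of the index. The documented defaults are 2 and 5, which is why queries on a tiny index usually lower both to 1, as the question does. The Python below is a simplified sketch of that filtering only. The function name select_terms is hypothetical, tokenization is a naive lowercase split, and unlike the real query it does not rank terms by score before applying max_query_terms.

```python
from collections import Counter


def select_terms(like_text, corpus, min_term_freq=2, min_doc_freq=5,
                 max_query_terms=25):
    """Keep like-text terms that pass MLT-style frequency thresholds.

    Defaults mirror Elasticsearch's documented defaults for these parameters.
    Terms are truncated in first-seen order; the real query ranks them first.
    """
    tf = Counter(like_text.lower().split())          # term frequency in the input
    df = Counter()
    for doc in corpus:                               # document frequency in the index
        df.update(set(doc.lower().split()))
    kept = [t for t, freq in tf.items()
            if freq >= min_term_freq and df[t] >= min_doc_freq]
    return kept[:max_query_terms]


# Two-document index, as in the ppa example.
corpus = ["erradicar a pobreza", "erradicar a pobreza"]

# With the defaults, every term is filtered out, so the query has nothing to match.
print(select_terms("erradicar a pobreza", corpus))

# Lowering both thresholds to 1 keeps the terms.
print(select_terms("erradicar a pobreza", corpus, min_term_freq=1, min_doc_freq=1))
```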