Case Introduction

My use case is storing words in an Elasticsearch index, where every word gets its own ID. The query data is a message. When the message contains punctuation marks, Elasticsearch returns the wrong result.

Example:

For instance, I stored the keywords "banana,apple,pen" in the index using the bulk_index API.

Query data1: "is this banana?"

The correct result would be a hit on the keyword "banana", but the query hits nothing.

Query data2: ">> it is a book"

The correct result would be no hits, but the query hits all the keywords in the index.

Without the punctuation, both queries return the correct results.

Code:

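My code for storeToIndex (Python, with pyelasticsearch as the client):

from pyelasticsearch import ElasticSearch

es = ElasticSearch('http://localhost:9200/')
# doc is the list of documents; each document's "id" field becomes its ID
rval = es.bulk_index(index_name, 'json', doc, id_field="id")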

My code for queryIndex():

query = {"query": {"query_string": {"query": "%s" % query_data}}}
es = ElasticSearch('http://localhost:9200/')
search_result = es.search(query=query, index=index_name, doc_type='json')

Question:

I could solve this with regular expressions, but is there a solution within Elasticsearch itself, such as a filter or an API option?

Environment configuration:

Ubuntu 12.04 desktop 64 bit

Elasticsearch server on Ubuntu, version 0.90.7, single node

Client: pyelasticsearch

Programming language: Python

API used: bulk_index API, search API

Welcome to Stack Overflow! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. Check the FAQ and How to Ask - Inbar Rose
I have an idea of how to solve the problem in Python, but in my case I have to solve it in Elasticsearch. Any idea which API, filter, or something else I can use? The APIs I use now are the bulk index API and the search API. - chengji18

1 Answer

What you're running into is query string parsing. banana? is interpreted as a term that starts with banana and ends with a single unspecified character. This would match banana1, for example, but not banana itself, which is why the first query hits nothing. And >> ... creates an open-ended range query, which is why it matches everything in your index.
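To make the single-character wildcard concrete, here is a small Python analogy. It uses the standard library's fnmatch module, whose ? also matches exactly one character. It only illustrates the matching rule and is not how Elasticsearch evaluates the query internally; the helper name matches_wildcard is made up for this sketch.

```python
from fnmatch import fnmatchcase


def matches_wildcard(term, pattern):
    """Return True if term matches a shell-style pattern where ? is exactly one character.

    Hypothetical helper for illustration only; Elasticsearch does not use fnmatch.
    """
    return fnmatchcase(term, pattern)


# "banana?" needs one more character after "banana"
print(matches_wildcard("banana1", "banana?"))
print(matches_wildcard("banana", "banana?"))
```

The indexed term banana has no trailing character, so a pattern of this shape cannot match it.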

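I suggest you look into using a different query type, for example the match query, which is designed for cases like this: it analyzes the text instead of parsing it as query syntax, so punctuation has no special meaning.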
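On the Python side, the change could look roughly like the sketch below. It only builds the request body; the function name build_match_query is made up for illustration, and the field name somefield is an assumption borrowed from the sample documents in this answer, so substitute the field your keywords are actually stored in.

```python
import json


def build_match_query(field, text):
    """Build a match query body for the given field and free-form text.

    Hypothetical helper: the text is passed through as-is, because the match
    query does not treat characters such as ? or > as query operators.
    """
    return {"query": {"match": {field: {"query": text}}}}


# "somefield" is an assumed field name; use your own mapping's field here.
query = build_match_query("somefield", "is this banana?")
print(json.dumps(query, indent=4))
```

The resulting dict should be usable in place of the query_string body from the question, for example es.search(query=query, index=index_name, doc_type='json') with the same pyelasticsearch client.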

Take a look at this play with four queries (see the search tabs in the bottom left panel), exported here as curl commands for convenience:

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {}
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"somefield":"banana"}
{"index":{"_index":"play","_type":"type"}}
{"somefield":"apple"}
{"index":{"_index":"play","_type":"type"}}
{"somefield":"pen"}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "query_string": {
            "query": "is this banana?"
        }
    }
}
'

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "match": {
            "somefield": {
                "query": "is this banana?"
            }
        }
    }
}
'

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "query_string": {
            "query": ">> it is a book"
        }
    }
}
'

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "match": {
            "somefield": {
                "query": ">> it is a book"
            }
        }
    }
}
'