I have a special search case in Elasticsearch. I have looked closely at the documentation (tokenizers such as n-gram and edge n-gram, queries, etc.) and searched Stack Overflow, but found no solution.

Background: I have a small index with some string fields (e.g. name, street, city, email).

And a query text like:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat My-Name quis My-Street. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu. In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, con

What I want is to search for the values stored in the index inside the query text.

So if the index contains an entry with name = "My-Name" or street = "My-Street", that entry should be returned.

The closest post I could find was "Search ElasticSearch field contained in a value", but there the tokenizer only splits the values from the index. I need something more like a substring search within the query text.

Thanks and best regards Simon

1 Answer

I found a possible (though not high-performance) solution:

1.) Set an n-gram filter for the search analyzer only. The index analyzer remains "standard":

  • setting:

    "analysis": {
        "filter": {
            "desc_ngram": {
                "type": "ngram",
                "min_gram": 3,
                "max_gram": 50
            }
        },
        "analyzer": {
            "search_ngram": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": [ "desc_ngram", "lowercase" ]
            }
        }
    }
    
  • mapping:

    "user": {
        "properties": {
            "street": {
                "type": "string",
                "analyzer": "standard",
                "search_analyzer": "search_ngram"
            }
        }...
    }
    

2.) Split the input text into small blocks (about 47 characters each):

String subtext = request.post.getText().substring(startIndex, offset);
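
The splitting step could be sketched roughly as follows. This is only a minimal illustration, not the exact code I use: the class name TextBlocks, the method splitIntoBlocks and the overlap parameter are hypothetical. The overlap is an assumption on my part, added so that a value which straddles a block boundary still appears whole in at least one block.

```java
import java.util.ArrayList;
import java.util.List;

class TextBlocks {

    /**
     * Splits text into blocks of at most blockSize characters.
     * Consecutive blocks share `overlap` characters (hypothetical parameter),
     * so a short value cut by one boundary is complete in the next block.
     */
    static List<String> splitIntoBlocks(String text, int blockSize, int overlap) {
        if (blockSize <= 0 || overlap < 0 || overlap >= blockSize) {
            throw new IllegalArgumentException("require 0 <= overlap < blockSize");
        }
        List<String> blocks = new ArrayList<>();
        int step = blockSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + blockSize, text.length());
            blocks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last block already reaches the end of the text
            }
        }
        return blocks;
    }
}
```

With a block size of 47, a call such as TextBlocks.splitIntoBlocks(text, 47, 10) would produce windows that advance by 37 characters. Note that the overlap only helps for values shorter than the overlap itself.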

3.) Fire off an ordinary query to Elasticsearch for each block (do this asynchronously):

    return CompletableFuture.supplyAsync(() -> {
        // textToAnalyze is one text block from step 2
        SearchRequestBuilder search = this.prepareSearch()
                .setQuery(QueryBuilders.queryStringQuery(textToAnalyze))
                .setSize(100);

        // Blocks only this worker thread until the response arrives
        SearchResponse response = search.get();
        UserHit result = transformToHitFrom(response, UserHit.class);
        return result;
    }).exceptionally(e -> {
        // On failure, log the error and fall back to an empty hit
        logger.error("Error occurred while searching for user", e);
        UserHit result = new UserHit();
        return result;
    });
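
To collect the results of all blocks, the individual futures can be joined and merged. The following is a sketch under assumptions, not my production code: BlockSearch and searchAllBlocks are hypothetical names, and the searchOneBlock function stands in for the Elasticsearch call, so the example runs without a cluster.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

class BlockSearch {

    /**
     * Runs searchOneBlock (a stand-in for the real Elasticsearch query)
     * for every block asynchronously and merges the hits without duplicates.
     */
    static <T> Set<T> searchAllBlocks(List<String> blocks,
                                      Function<String, List<T>> searchOneBlock) {
        // Start all searches first so they can run concurrently
        List<CompletableFuture<List<T>>> futures = blocks.stream()
                .map(block -> CompletableFuture
                        .supplyAsync(() -> searchOneBlock.apply(block))
                        // A failed block contributes no hits instead of failing everything
                        .exceptionally(e -> Collections.<T>emptyList()))
                .collect(Collectors.toList());

        // Wait for each search and merge, keeping first-seen order
        Set<T> merged = new LinkedHashSet<>();
        for (CompletableFuture<List<T>> future : futures) {
            merged.addAll(future.join());
        }
        return merged;
    }
}
```

Merging into a set matters when blocks overlap or when the same entry matches in several blocks, because the same hit may otherwise be returned more than once. This assumes the hit type implements equals and hashCode.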

I have not run a performance test yet, but I hope this solution performs better than the standard substring method. We will test it over the next few days.