I am trying to make the Elasticsearch query below work with Spring Data. The intent is to return the unique values of the field "serviceName", just as SELECT DISTINCT serviceName FROM table would in a SQL database.

{
  "aggregations": {
    "serviceNames": {
      "terms": {
        "field": "serviceName"
      }
    }
  },
  "size":0
}

I mapped the field as a keyword, and the query works perfectly against the index_name/_search API, as the response snippet below shows:

"aggregations": {
        "serviceNames": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "service1",
                    "doc_count": 20
                },
                {
                    "key": "service2",
                    "doc_count": 8
                },
                {
                    "key": "service3",
                    "doc_count": 8
                }
            ]
        }
    }
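
For reference, the same request can be prepared from plain Java with the JDK HTTP client, bypassing Spring Data entirely. This is only a sketch: the class name DistinctTermsRequest and the address http://localhost:9200 are assumptions for illustration and do not come from the setup described here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class DistinctTermsRequest {

    // Builds the same body as the query above: one terms aggregation and no hits ("size":0).
    static String body(String aggregationName, String field) {
        return "{\"aggregations\":{\"" + aggregationName + "\":{\"terms\":{\"field\":\""
                + field + "\"}}},\"size\":0}";
    }

    // Prepares (but does not send) a POST to <baseUrl>/<index>/_search.
    static HttpRequest build(String baseUrl, String index, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String json = body("serviceNames", "serviceName");
        // Assumed address of the cluster; replace with your own.
        HttpRequest request = build("http://localhost:9200", "index_name", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```

Sending it would be a matter of passing the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the sketch stops before that step so it can run without a cluster.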

My problem is that the same query doesn't work in Spring Data: when I run it as a StringQuery, I get the error below. I am guessing it uses a different API to run queries.

Cannot execute jest action , response code : 400 , error : {"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19}],"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19} , message : null
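
A likely reading of this error, consistent with its wording, is that the text of a StringQuery is used as the content of the request's "query" element rather than as the whole request body, so "aggregations" ends up where a query type such as "match_all" is expected. The sketch below only illustrates that nesting with string concatenation; the class and method names are hypothetical, and the real template may wrap the text differently.

```java
final class StringQueryNesting {

    // Assumption: the StringQuery text is placed under the top-level "query" key.
    static String asQueryClause(String stringQuerySource) {
        return "{\"query\":" + stringQuerySource + "}";
    }

    public static void main(String[] args) {
        String source =
                "{\"aggregations\":{\"serviceNames\":{\"terms\":{\"field\":\"serviceName\"}}},\"size\":0}";
        // "aggregations" now sits in the position of a query type, which the parser rejects.
        System.out.println(asQueryClause(source));
        // For comparison, a body that is valid in that position.
        System.out.println(asQueryClause("{\"match_all\":{}}"));
    }
}
```

Under that assumption, aggregations cannot be expressed through a StringQuery at all and have to be attached to the search request separately.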

I have also tried the SearchQuery type to achieve the same result (no duplicates and no object loading), but had no luck. The snippet below shows how I tried it.

final TermsAggregationBuilder aggregation = AggregationBuilders
        .terms("serviceName")
        .field("serviceName")
        .size(1);
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withIndices("index_name")
        .withQuery(matchAllQuery())
        .addAggregation(aggregation)
        .withSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .withSourceFilter(new FetchSourceFilter(new String[] {"serviceName"}, new String[] {""}))
        .withPageable(PageRequest.of(0, 10000))
        .build();

Does anyone know how to get a distinct aggregation on an object property, without loading the objects, in Spring Data?

I tried many ways to make Spring Data print its queries, without success, maybe because I am using the com.github.vanroy.springdata.jest.JestElasticsearchTemplate implementation. I got the query parts with the code below:

logger.info("query:" + searchQuery.getQuery());
logger.info("agregations:" + searchQuery.getAggregations());
logger.info("filter:" + searchQuery.getFilter());
logger.info("search type:" + searchQuery.getSearchType());

It prints:

query:{"match_all":{"boost":1.0}}
agregations:[{"serviceName":{"terms":{"field":"serviceName","size":1,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}]
filter:null
search type:DFS_QUERY_THEN_FETCH

Can you print out the query that is being generated by Spring Data? - Val
Thanks for reading, @Val. I added the query parts as best I could; if you have any tip for printing queries, it's welcome. I could not do it with logging.level.org.springframework.data.elasticsearch.core=DEBUG - lauksas
Does it also happen if you use the usual ElasticsearchTemplate class? - Val
I can't do that because I am using Elasticsearch 6.5, and Spring Data doesn't support it yet. All other queries work fine. I think the key thing here is the "size":0. I have also tried running the query as a StringQuery and it gave me that error; maybe I should edit the question to state that. - lauksas
Your Pageable makes it so that size will not be 0, I'm afraid. If you're only interested in aggregations, you should not return any hits. - Val

1 Answer

I figured it out; maybe this can help someone. The aggregation doesn't come with the query hits; it comes in a result of its own and is not mapped to any object. The objects that do come back are apparently just the hits of the query that Elasticsearch ran alongside the aggregation (I am not sure about this). I ended up writing a method that simulates what SQL's SELECT DISTINCT your_column FROM your_table would do. I think it will only work on keyword fields (dynamically mapped keyword sub-fields skip values longer than 256 characters, if I am not wrong). I explained some lines in comments. Thanks @Val, since I was only able to figure it out after debugging into the Jest code and checking the generated request and raw response.

public List<String> getDistinctField(String fieldName) {
    List<String> result = new ArrayList<>();

    try {
        final String distinctAggregationName = "distinct_field"; // name of the aggregation

        final TermsAggregationBuilder aggregation = AggregationBuilders
                .terms(distinctAggregationName)
                .field(fieldName)
                .size(10000); // maximum number of buckets returned; mine can be huge, adjust yours

        SearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withIndices("your_index") // can maybe be omitted
                .addAggregation(aggregation)
                // retrieve only the field we are interested in; this can probably be removed
                .withSourceFilter(new FetchSourceFilter(new String[] { fieldName }, new String[] { "" }))
                // the page size can't be zero and I don't want to load 10 results on every run, so this
                // always returns one object, since I found no "size":0 in the query builder
                .withPageable(PageRequest.of(0, 1))
                .build();

        // I had to use JestResultsExtractor because com.github.vanroy.springdata.jest.JestElasticsearchTemplate
        // has no implementation for ResultsExtractor; with the Spring defaults you can probably use that instead.
        final JestResultsExtractor<SearchResult> extractor = new JestResultsExtractor<SearchResult>() {
            @Override
            public SearchResult extract(SearchResult searchResult) {
                return searchResult;
            }
        };

        final SearchResult searchResult = ((JestElasticsearchTemplate) elasticsearchOperations)
                .query(searchQuery, extractor);
        final MetricAggregation aggregations = searchResult.getAggregations();
        // this is where the aggregation results are, in "buckets"
        final TermsAggregation termsAggregation = aggregations.getTermsAggregation(distinctAggregationName);
        result = termsAggregation.getBuckets().parallelStream()
                .map(TermsAggregation.Entry::getKey)
                .collect(Collectors.toList());

    } catch (Exception e) {
        // treat your error here
        e.printStackTrace();
    }
    return result;
}
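
The last step of the method, turning buckets into a list of distinct values, can be tried without a cluster. The sketch below uses a hypothetical Bucket class as a stand-in for the Jest TermsAggregation.Entry type and feeds it the three buckets from the response shown in the question; it is an illustration, not the Jest API.

```java
import java.util.List;
import java.util.stream.Collectors;

final class BucketKeys {

    // Hypothetical stand-in for one entry of the "buckets" array in the response.
    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    // A terms aggregation yields one bucket per distinct value, so the keys are the DISTINCT result.
    static List<String> distinctKeys(List<Bucket> buckets) {
        return buckets.stream().map(b -> b.key).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
                new Bucket("service1", 20),
                new Bucket("service2", 8),
                new Bucket("service3", 8));
        System.out.println(distinctKeys(buckets)); // prints [service1, service2, service3]
    }
}
```

The document counts are dropped here because only the keys matter for a DISTINCT-style result; keeping them would give the equivalent of a GROUP BY with COUNT(*).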