I have an index with 17364 documents in Elasticsearch.

$curl http://localhost:9200/performance/_count
{"count":17364,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

Spring Data repository:

public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
}

Fetch all documents page by page and print:

public void testReport() {

  int page = 0, pageSize = 1000;
  Pageable of = PageRequest.of(page, pageSize);

  Page<Transaction> all = testRepository.findAll(of);
  int numberOfPages = all.getTotalPages();

  log.info("All pages: {}, {}", numberOfPages, all.getTotalElements());
  while (true) {
     log.info("Current page: {}, {}", of.getPageNumber(), of.getPageSize());
     for (Transaction txn : all) {
        log.info(mapper.writeValueAsString(txn));
     }
     if (!all.hasNext()) {
        break; // last page reached
     }
     // Move to the next page and actually fetch it
     of = of.next();
     all = testRepository.findAll(of);
  }

}

This code returns only 10000 documents although the index has 17364. Could you please help me find out why this is happening?

  • Elasticsearch version: 7.9
  • spring-boot-starter-parent: 2.3.2.RELEASE

1 Answer

I see two options:

A. The cap comes from the index.max_result_window setting, which defaults to 10000 and limits from + size for a search. Since you have only 17364 documents, you could increase it to (e.g.) 20000 so that you can paginate to the end:

PUT performance/_settings
{
  "index.max_result_window": 20000
}
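
To see why exactly 10000 documents come back, here is a minimal sketch of the window arithmetic in plain Java. It only models the rule that Elasticsearch rejects a search when from + size exceeds index.max_result_window; the class and method names are made up for illustration and nothing here calls Elasticsearch.

```java
// Sketch only: models the rule that a search is rejected when
// from + size exceeds index.max_result_window (10000 by default).
// ResultWindowCheck and firstRejectedPage are hypothetical names.
final class ResultWindowCheck {

    // Returns the zero-based number of the first page the window rejects,
    // or -1 if every page needed to cover totalDocs fits inside the window.
    static int firstRejectedPage(int pageSize, int maxResultWindow, long totalDocs) {
        long totalPages = (totalDocs + pageSize - 1) / pageSize; // ceiling division
        for (int page = 0; page < totalPages; page++) {
            long from = (long) page * pageSize;
            if (from + pageSize > maxResultWindow) {
                return page;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Values from the question: 17364 documents, pages of 1000.
        System.out.println(firstRejectedPage(1000, 10_000, 17_364)); // default window
        System.out.println(firstRejectedPage(1000, 20_000, 17_364)); // raised window
    }
}
```

With the numbers from the question, the default window rejects page 10 (from = 10000, size = 1000), so only pages 0 to 9 are served, i.e. 10 × 1000 = 10000 documents. With the window raised to 20000, all 18 pages fit.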

B. If you have a bigger index, or increasing index.max_result_window is not an option for any reason, then you need to use the Scroll API. Spring Data ES supports two ways of doing that.

The first method uses ElasticsearchTemplate.searchForStream(), which internally uses the Scroll API:

SearchHitsIterator<Transaction> stream = elasticsearchTemplate.searchForStream(searchQuery, Transaction.class, IndexCoordinates.of("performance"));

The second method is a bit more low-level. You need to modify your repository definition with a method that returns a Stream:

public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
    Stream<Transaction> findScrollAll();
}

And then implement that method with ElasticsearchTemplate.searchScrollStart() and ElasticsearchTemplate.searchScrollContinue().
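
The start/continue loop behind such a method can be sketched in plain Java. This is only an illustration of the control flow: ScrollSource is a hypothetical stand-in for the two template calls (it is not a Spring Data type), InMemoryScroll is a fake that exists only to make the example runnable, and the real signatures of searchScrollStart() and searchScrollContinue() are not reproduced here, so check them for your release.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for the scroll calls; NOT a Spring Data interface.
interface ScrollSource<T> {
    List<T> start(int batchSize); // plays the role of searchScrollStart()
    List<T> next();               // plays the role of searchScrollContinue()
    void clear();                 // releases the server-side scroll context
}

final class ScrollStreams {

    // Fetches the first batch immediately, pulls further batches on demand,
    // and ends the stream on the first empty batch.
    static <T> Stream<T> stream(ScrollSource<T> source, int batchSize) {
        Iterator<T> it = new Iterator<T>() {
            private Iterator<T> batch = source.start(batchSize).iterator();
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                while (!exhausted && !batch.hasNext()) {
                    List<T> more = source.next();
                    if (more.isEmpty()) {
                        exhausted = true;
                    } else {
                        batch = more.iterator();
                    }
                }
                return batch.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return batch.next();
            }
        };
        // Closing the stream clears the scroll, so callers should close it.
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false)
                .onClose(source::clear);
    }
}

// In-memory fake used only to exercise the sketch above.
final class InMemoryScroll implements ScrollSource<Integer> {
    private final int total;
    private int batchSize;
    private int position;
    boolean cleared;

    InMemoryScroll(int total) {
        this.total = total;
    }

    @Override
    public List<Integer> start(int batchSize) {
        this.batchSize = batchSize;
        this.position = 0;
        return next();
    }

    @Override
    public List<Integer> next() {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < batchSize && position < total) {
            batch.add(position++);
        }
        return batch;
    }

    @Override
    public void clear() {
        cleared = true;
    }
}
```

Because a scroll has no from + size window, the loop walks past the 10000 mark: streaming an InMemoryScroll of 17364 items in batches of 1000 visits every item. In a real implementation the caller should consume the stream in a try-with-resources block so the scroll context is released.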

Addition:

3rd option:

Just define a method

Stream<SearchHit<Transaction>> searchBy();

in your TestRepository, or use just Stream<Transaction> as the return type.