0 votes

I need to read more than a million records from a Cassandra database using Spring Data Cassandra and write them to a file using Spring Batch. Right now I'm using the pagination feature of Spring Data Cassandra, but it seems to execute select * from table and then filter the records. This is not a good option, as the table will have more than a million records and loading them all into memory at once would exhaust the heap.

I also need to integrate this with Spring Batch so that I'm able to write every record to a file. I'm looking for a way to read data from Cassandra and save it to a file in chunks. What would be the best way to do so?

Here's the code I'm using to fetch the records from Cassandra using pagination:

public void startJob() {
    // Fetch the first page; Cassandra returns a Slice backed by its paging state
    Pageable pageable = PageRequest.of(0, pageSize);
    Slice<FLProductATPEntity> slice = repository.findAll(pageable);
    slice.getContent().forEach(entity -> log.info("Entity: {}", entity));
    // Keep requesting the next page until the result set is exhausted
    while (slice.hasNext()) {
        slice = repository.findAll(slice.nextPageable());
        slice.getContent().forEach(entity -> log.info("Entity: {}", entity));
    }
}
1 - I have no idea if Spring Data Cassandra has reactive support yet, but if so, this seems like a perfect use case. - chrylis -cautiouslyoptimistic-

1 Answer

0 votes

I'm looking for a way to read data from Cassandra and save it in file in chunks

Spring Batch provides the RepositoryItemReader, which you can use with your Cassandra PagingAndSortingRepository as the delegate. You can then create a chunk-oriented step with this reader and a FlatFileItemWriter to write the data to a file.
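
Here is a minimal sketch of that setup, assuming the Spring Batch 5 builder APIs and a repository extending PagingAndSortingRepository. The sort column ("id"), the entity field names ("id", "product", "quantity"), the ID type, the page/chunk sizes, and the output path are all hypothetical placeholders; adapt them to your entity and environment:

import java.util.Map;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.data.RepositoryItemReader;
import org.springframework.batch.item.data.builder.RepositoryItemReaderBuilder;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.data.domain.Sort;
import org.springframework.data.repository.PagingAndSortingRepository;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ExportJobConfig {

    // Reader: delegates paging to the repository's findAll(Pageable) method,
    // fetching one page of entities at a time instead of the whole table
    @Bean
    public RepositoryItemReader<FLProductATPEntity> reader(
            PagingAndSortingRepository<FLProductATPEntity, String> repository) {
        return new RepositoryItemReaderBuilder<FLProductATPEntity>()
                .name("cassandraReader")
                .repository(repository)
                .methodName("findAll")
                .pageSize(1000)                          // rows fetched per page
                .sorts(Map.of("id", Sort.Direction.ASC)) // hypothetical sort column
                .build();
    }

    // Writer: streams each chunk to a delimited flat file
    @Bean
    public FlatFileItemWriter<FLProductATPEntity> writer() {
        return new FlatFileItemWriterBuilder<FLProductATPEntity>()
                .name("fileWriter")
                .resource(new FileSystemResource("output.csv"))
                .delimited()
                .names("id", "product", "quantity")      // hypothetical entity fields
                .build();
    }

    // Chunk-oriented step: read a chunk's worth of records, then write them in one go
    @Bean
    public Step exportStep(JobRepository jobRepository,
                           PlatformTransactionManager transactionManager,
                           RepositoryItemReader<FLProductATPEntity> reader,
                           FlatFileItemWriter<FLProductATPEntity> writer) {
        return new StepBuilder("exportStep", jobRepository)
                .<FLProductATPEntity, FLProductATPEntity>chunk(1000, transactionManager)
                .reader(reader)
                .writer(writer)
                .build();
    }
}

With this arrangement only one page of records is held in memory at a time, and Spring Batch tracks the reader's position in the execution context, so a failed job can restart where it left off.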