1 vote

I am loading data from HDFS into Ignite through Spark. The raw data is around 5 GB in Snappy-compressed Parquet format (around 0.5 billion rows).

I am using the Ignite DataFrame API for Spark to load the data (https://apacheignite-fs.readme.io/docs/ignite-data-frame).
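
The write itself looks roughly like this (a minimal sketch; the HDFS path, table name, and key column are placeholders rather than my actual schema):

import org.apache.spark.sql.SparkSession
import org.apache.ignite.spark.IgniteDataFrameSettings._

val spark = SparkSession.builder().appName("parquet-to-ignite").getOrCreate()

// Read the raw Parquet data from HDFS.
val df = spark.read.parquet("hdfs:///data/raw.parquet")

// Write it into an Ignite SQL table via the Ignite DataFrame API.
df.write
  .format(FORMAT_IGNITE)
  .option(OPTION_CONFIG_FILE, "/path/to/ignite-config.xml")
  .option(OPTION_TABLE, "my_table")
  .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id")
  .save()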

The Ignite cluster has 3 nodes, all running in server mode, each with 8 GB of durable memory, native persistence enabled, and the WAL disabled.
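
For context, the storage part of each node's configuration is along these lines (a sketch in code form; the region name and exact values are illustrative):

import org.apache.ignite.configuration.{DataRegionConfiguration, DataStorageConfiguration, IgniteConfiguration, WALMode}

val storageCfg = new DataStorageConfiguration()
storageCfg.setWalMode(WALMode.NONE) // WAL disabled

val regionCfg = new DataRegionConfiguration()
regionCfg.setName("durable-region")
regionCfg.setMaxSize(8L * 1024 * 1024 * 1024) // 8 GB durable memory
regionCfg.setPersistenceEnabled(true)         // native persistence on

storageCfg.setDefaultDataRegionConfiguration(regionCfg)

val igniteCfg = new IgniteConfiguration()
igniteCfg.setDataStorageConfiguration(storageCfg)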

While loading the data, ingestion runs fast as long as there is space in durable memory. Once the data no longer fits in memory, loading becomes very slow and throughput keeps degrading.

I have tried some suggested configuration changes (GC tuning, on-heap storage), but none improved the loading speed significantly.

Since the Ignite memory store doesn't compress the data, it demands far more storage than the source (loading 0.2 billion rows took almost 45 GB of space). I believe increasing the durable memory size shouldn't be the only solution.

Any suggestions or resources on where to start tuning the Ignite cluster for better performance would be appreciated. Thank you for your time and help.


2 Answers

1 vote

If RAM is a scarce resource, then work on native persistence optimizations; persistence is most likely your bottleneck. Fine-tune it for your specific use case, starting from the Ignite persistence tuning documentation.
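
For example, checkpointing behaviour is usually the first thing to look at during heavy ingest. A sketch of the relevant knobs (standard DataStorageConfiguration setters; the values here are illustrative assumptions, not recommendations):

import org.apache.ignite.configuration.DataStorageConfiguration

val storageCfg = new DataStorageConfiguration()
// Checkpoint less often and with more threads, so ingest isn't
// repeatedly stalled behind checkpoint I/O.
storageCfg.setCheckpointFrequency(300000L) // 5 minutes, in milliseconds
storageCfg.setCheckpointThreads(8)
// Slow writers down gradually instead of freezing them outright when
// the checkpoint page buffer fills up.
storageCfg.setWriteThrottlingEnabled(true)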

0 votes

GC tuning isn’t going to help, since durable storage is off-heap. Similarly, on-heap storage is an addition on top of the off-heap storage, so, if anything, enabling it is going to make things worse.

Instead, you need to configure the eviction policy. In addition to specifying a maximum memory size for your data region, you would say:

<property name="pageEvictionMode" value="RANDOM_2_LRU"/>
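
The same setting in programmatic form, for reference (a sketch; the region name and size are assumptions):

import org.apache.ignite.configuration.{DataPageEvictionMode, DataRegionConfiguration}

val regionCfg = new DataRegionConfiguration()
regionCfg.setName("default")
regionCfg.setMaxSize(8L * 1024 * 1024 * 1024) // the region's maximum size
regionCfg.setPageEvictionMode(DataPageEvictionMode.RANDOM_2_LRU)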

Having said that, you are copying the data to disk, so it is going to be quite a bit slower than a purely in-memory load.