I have a Java ArrayList with a few Integer values, and I created a Dataset from it. System.out.println(DF.javaRDD().getNumPartitions()); printed 1, so all the data sits in a single partition. I want to split the data into 3 partitions, so I used repartition(). Now I want to find the number of items in each partition after the repartition.
In Scala this is straightforward:
DF.repartition(3).mapPartitions((it) => Iterator(it.length));
The same approach does not work in Java, because the java.util.Iterator interface has no length method.
How should I interpret the mapPartitions function?
mapPartitions(FlatMapFunction<java.util.Iterator<T>,U> f)
What parameters does the inner function take, and what is its return type?
Here is my code so far:
SparkSession sessn = SparkSession.builder().appName("RDD to DF").master("local").getOrCreate();
List<Integer> lst = Arrays.asList(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
Dataset<Integer> DF = sessn.createDataset(lst, Encoders.INT());
System.out.println(DF.javaRDD().getNumPartitions());
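For reference, the counting step itself can be sketched in plain Java without Spark. In Spark 2.x, FlatMapFunction<Iterator<T>, U> has a single call method that receives one java.util.Iterator<T> (the elements of one partition) and returns a java.util.Iterator<U>. The class name PartitionCount and the helper countItems below are hypothetical names chosen for illustration; the helper only mirrors that shape and replaces Scala's it.length with a manual loop.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class PartitionCount {

    // Mirrors the shape of FlatMapFunction<Iterator<T>, Integer>.call in Spark 2.x:
    // one Iterator in (a partition's elements), one Iterator out (the results).
    static <T> Iterator<Integer> countItems(Iterator<T> partition) {
        int count = 0;
        // java.util.Iterator has no length(), so walk the iterator and count.
        while (partition.hasNext()) {
            partition.next();
            count++;
        }
        // Emit exactly one value per partition: its element count.
        return Collections.singletonList(count).iterator();
    }

    public static void main(String[] args) {
        // A plain list standing in for the contents of one partition.
        List<Integer> fakePartition = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        Iterator<Integer> result = countItems(fakePartition.iterator());
        System.out.println(result.next()); // prints 7
    }
}
```

Assuming Spark 2.x is on the classpath, the helper would presumably plug in as DF.repartition(3).javaRDD().mapPartitions(it -> PartitionCount.countItems(it)).collect(), which should return one count per partition. Calling mapPartitions directly on the Dataset instead takes a MapPartitionsFunction plus an Encoder for the result type, such as Encoders.INT().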