If I read some data from an HBase (or MapR-DB) table with
```java
JavaPairRDD<ImmutableBytesWritable, Result> usersRDD =
    sc.newAPIHadoopRDD(hbaseConf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
```
the resulting RDD has a single partition, as I can see by calling `usersRDD.partitions().size()`. Using something like `usersRDD.repartition(10)` is not viable, because Spark complains that `ImmutableBytesWritable` is not serializable.
Is there a way to make Spark create a partitioned RDD from HBase data?
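For completeness, `hbaseConf` is just the standard HBase configuration with the input table set, something like this (the table name is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;

// Standard HBase configuration; "users" is a placeholder table name.
Configuration hbaseConf = HBaseConfiguration.create();
hbaseConf.set(TableInputFormat.INPUT_TABLE, "users");
```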