I have 160 GB of data, partitioned on the DATE column and stored in Parquet format, running on Spark 1.6.0. I need to write the output Parquet files so that each DATE partition contains equal-sized files of a fixed size, say 100 MB each.
I tried the following code:
val blockSize = 1024 * 1024 * 100
sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize)
sc.hadoopConfiguration.setInt("parquet.block.size", blockSize)
df1.write.partitionBy("DATE").parquet("output_file_path")
This configuration has no effect: the write still produces files according to the default number of partitions, not 100 MB files.
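The only workaround I can think of is to repartition explicitly before the write, salting each DATE into a fixed number of buckets so that each bucket comes out as roughly one file. This is only a rough sketch, not a working solution: filesPerDate is a guess (per-DATE data size divided by 100 MB), and the resulting file sizes still depend on the data skew per DATE.

import org.apache.spark.sql.functions.{floor, rand}

// Rough sketch of a salting workaround. filesPerDate is an estimate:
// roughly (data size per DATE) / 100 MB.
val filesPerDate = 10

// Add a random bucket column so rows of the same DATE are spread across
// filesPerDate groups.
val salted = df1.withColumn("bucket", floor(rand() * filesPerDate))

// Shuffle by (DATE, bucket), drop the helper column, then write. Each task
// writes one file per DATE it contains, so each DATE directory ends up with
// at most filesPerDate files (fewer if buckets hash to the same task).
salted
  .repartition(salted("DATE"), salted("bucket"))
  .drop("bucket")
  .write
  .partitionBy("DATE")
  .parquet("output_file_path")

Even with this, the files are only approximately equal in size, so I am looking for a cleaner way to get fixed-size files per partition.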