What are the standard industry approaches for choosing a file format to store data in HDFS, for better performance and better cluster utilization?
It seems that storing data in the Parquet file format gives better performance numbers than plain text files. Using Parquet with Snappy compression improves performance further and also makes better use of the cluster in terms of storage space.
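For concreteness, here is a minimal PySpark sketch of the two options I am comparing; the input path and output paths are hypothetical placeholders, and recent Spark versions already default the Parquet codec to Snappy, so the compression option is set explicitly only to make the comparison clear:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-snappy-comparison").getOrCreate()

    # Read some source data (placeholder path and schema inference for brevity).
    df = spark.read.option("header", "true").csv("/data/events.csv")

    # Option 1: Parquet without compression.
    df.write.mode("overwrite") \
        .option("compression", "none") \
        .parquet("/data/events_parquet_plain")

    # Option 2: Parquet with Snappy compression.
    df.write.mode("overwrite") \
        .option("compression", "snappy") \
        .parquet("/data/events_parquet_snappy")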
So my question is whether to use the Parquet file format alone, or Parquet plus Snappy compression, for storing data on HDFS. What are the industry-standard approaches, and why? Any help is highly appreciated.