0
votes

I saved my dataframe as parquet format

df.write.parquet('/my/path')

When checking on HDFS, I can see that there is 10 part-xxx.snappy.parquet files under the parquet directory /my/path

My question is: does one part-xxx.snappy.parquet file correspond to one partition of my dataframe?

I am not sure if this question is a duplicate; please let me know if there is already a similar question. - super1ha1

2 Answers

2
votes

Yes, the part-** files are created based on the number of partitions in the dataframe at the time it is written to HDFS.

To check the number of partitions in the dataframe:

df.rdd.getNumPartitions()

To control the number of files written to the filesystem, we can use .repartition() or .coalesce(), choosing between them dynamically based on our requirements.

1
votes

Yes, this creates one file per Spark partition.

Note that you can also partition the output files by some attribute:

df.write.partitionBy("key").parquet("/my/path")

In that case Spark will create up to one file per Spark partition inside each Parquet partition directory. A common way to reduce the number of files is to repartition the data by the partitioning key before writing; this effectively creates one file per Parquet partition.