When Spark writes a DataFrame to Parquet, it creates a directory containing several separate part files rather than a single file. Code for saving:
(term_freq_df.write
    .mode("overwrite")
    .option("header", "true")  # note: "header" is a CSV option; the Parquet writer ignores it
    .parquet("dir/to/save/to"))
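For context, this is roughly what the output directory looks like after the write (the part-file names below are only illustrative; they vary per run):
import os

print(sorted(os.listdir("dir/to/save/to")))
# ['_SUCCESS',
#  'part-00000-....snappy.parquet',
#  'part-00001-....snappy.parquet']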
I need to read data from this directory with pandas:
term_freq_df = pd.read_parquet("dir/to/save/to")
The error:
IsADirectoryError: [Errno 21] Is a directory:
Is there a simple way to resolve this so that both code samples can use the same path?
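For reference, this is the kind of workaround I would like to avoid: globbing the part files and concatenating them by hand (a sketch, assuming pyarrow or fastparquet is installed as the pandas Parquet engine):
import glob
import pandas as pd

# Read each Spark part file individually and stitch them back together.
part_files = sorted(glob.glob("dir/to/save/to/part-*.parquet"))
term_freq_df = pd.concat(
    (pd.read_parquet(f) for f in part_files),
    ignore_index=True,
)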