2 votes

I'm trying to create parquet files for several days locally. The first time I run the code, everything works fine. The second time, it fails to delete a file. The third time, it fails to delete another file. It's totally random which file cannot be deleted.

The reason I need this to work is that I want to create parquet files every day for the last seven days, so the parquet files that are already there should be overwritten with the updated data. A rough sketch of the overall job is below.
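For context, this is roughly the shape of the loop; loadDayData is a hypothetical stand-in for the code that reads one day's CSV into a DataFrame, and OutputFilePath is my output directory:

    import java.time.LocalDate
    import org.apache.spark.sql.SaveMode

    // Rebuild the partition for each of the last seven days,
    // overwriting whatever is already there.
    val today = LocalDate.now()
    (1 to 7).foreach { daysBack =>
      val dateOfData = today.minusDays(daysBack).toString // e.g. "2018-07-15"
      val newDF = loadDayData(dateOfData) // hypothetical: reads that day's CSV into a DataFrame
      newDF.repartition(1)
        .write
        .mode(SaveMode.Overwrite)
        .parquet(OutputFilePath + "/day=" + dateOfData)
    }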

I use Project SDK 1.8, Scala version 2.11.8, and Spark version 2.0.2.

When I run this line of code for the second time:

newDF.repartition(1).write.mode(SaveMode.Overwrite).parquet(
    OutputFilePath + "/day=" + DateOfData)

this error occurs:

WARN FileUtil: 
Failed to delete file or dir [C:\Users\...\day=2018-07-15\._SUCCESS.crc]: 
it still exists.
Exception in thread "main" java.io.IOException: 
Unable to clear output directory file:/C:/Users/.../day=2018-07-15 
prior to writing to it
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:91)

After the third time:

WARN FileUtil: Failed to delete file or dir 
[C:\Users\day=2018-07-20\part-r-00000-8d1a2bde-c39a-47b2-81bb-decdef8ea2f9.snappy.parquet]: it still exists.
Exception in thread "main" java.io.IOException: Unable to clear output directory file:/C:/Users/day=2018-07-20 prior to writing to it
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:91)

As you can see, it's a different file than on the second run, and so on. After deleting the files manually, all the parquet files can be created.

Does anybody know this issue and how to fix it?

Edit: It's always a .crc file that can't be deleted.

I think you are trying to write to the same file you are reading from. - Manoj Kumar Dhakad
Hey, thanks for your answer. I don't think this is possible. I have pathInputFilePath = "C:\\Users\\IdeaProjects\\raw_data\\" and OutputFilePath = "C:\\Users\\IdeaProjects\\prepared_ssp_data\\", and I read the CSV into a DataFrame. I don't understand how it would read while writing? - Lisa

3 Answers

1 vote

Thanks for your answers. :) The solution is not to write into the Users directory; there seems to be a permission problem there. So I created a new folder directly in C:\ and it works perfectly.
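Concretely, all I changed was the output path; the folder name here is just an example:

    // Write somewhere outside C:\Users (the exact folder is a hypothetical example)
    val OutputFilePath = "C:\\spark_output\\prepared_ssp_data"
    newDF.repartition(1)
      .write
      .mode(SaveMode.Overwrite)
      .parquet(OutputFilePath + "/day=" + DateOfData)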

0 votes

Perhaps another Windows process (an antivirus scanner or an open Explorer preview, for example) has a lock on the file, so it can't be deleted.
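Since the question notes it's always a .crc checksum file that gets stuck, one possible workaround (just a sketch, not something tested in this thread) is to tell Hadoop's local filesystem not to write checksum files at all:

    import java.net.URI
    import org.apache.hadoop.fs.FileSystem

    // Disable .crc checksum files for the local filesystem, so there is
    // nothing extra left behind that a later overwrite has to delete.
    // Assumes `spark` is the active SparkSession.
    val localFs = FileSystem.get(new URI("file:///"), spark.sparkContext.hadoopConfiguration)
    localFs.setWriteChecksum(false)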

0 votes

This problem occurs when you have the destination directory open in Windows Explorer. You just need to close the directory.