
I'm running a Spark notebook to save a DataFrame as a Parquet file in Bluemix Object Storage.

I want to overwrite the Parquet file when rerunning the notebook, but it is just appending the data instead.

Below is a sample of the IPython code:

df = sqlContext.sql("SELECT * FROM table")
df.write.parquet("swift://my-container.spark/simdata.parquet", mode="overwrite")

2 Answers


I'm not a Python guy, but SaveMode works for DataFrames like this (Scala API):

import org.apache.spark.sql.SaveMode
df.write.mode(SaveMode.Overwrite).parquet("swift://my-container.spark/simdata.parquet")

The PySpark equivalent is df.write.mode("overwrite").parquet("swift://my-container.spark/simdata.parquet").

I think the object storage replaces only the 'simdata.parquet' entry itself, while the 'PART-0000*' files remain, because they were written under 'simdata.parquet' together with the UUID of the application ID. When you then try to read, the DataFrame reads all files matching 'simdata.parquet*', so the stale part files show up as appended data.
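The behavior described above can be illustrated with a small stand-in (plain Python, not the real PySpark API; `write_parquet_like` is a hypothetical helper): Spark writes `simdata.parquet` as a directory of `part-0000*` files, so a correct "overwrite" must delete the whole directory first. If the store only replaces the directory entry and old part files survive, the next read picks them up and looks like an append.

```python
import os
import shutil
import tempfile

def write_parquet_like(rows, path, mode="error"):
    """Mimic Spark's directory-style output: one part-0000N file per 'partition'.

    This is a sketch of the save-mode semantics, not Spark itself.
    """
    if os.path.exists(path):
        if mode == "overwrite":
            shutil.rmtree(path)  # overwrite removes the whole old directory first
        elif mode != "append":
            raise FileExistsError(path)
    os.makedirs(path, exist_ok=True)
    # append continues numbering after the existing part files
    start = len(os.listdir(path))
    for i, row in enumerate(rows, start=start):
        with open(os.path.join(path, f"part-{i:05d}"), "w") as f:
            f.write(str(row))

out = os.path.join(tempfile.mkdtemp(), "simdata.parquet")
write_parquet_like(["a", "b"], out, mode="overwrite")
write_parquet_like(["c"], out, mode="overwrite")
# only the second run's single part file survives
print(sorted(os.listdir(out)))  # → ['part-00000']
```

With mode="append" (or a store that fails to clean the directory), the first run's `part-00000` and `part-00001` would still be present alongside the new file, which matches the stale-data symptom in the question.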