
Let's say we have a dataframe with columns col1, col2, col3, col4. While saving the df I want to partition by col2, and the final df that gets saved should not contain col2. So the final df should have col1, col3, col4. Any advice on how I can achieve this?

newdf.drop("Status").write.mode("overwrite").partitionBy("Status").csv("C:/Users/Documents/Test")
By default it will not save the partitionBy column along with the data. Can you post your code? - Srinivas
I have added the code - Nafis Aslam

1 Answer


drop will remove the Status column, so your code will fail at partitionBy with the error below, because the Status column no longer exists in the schema.

org.apache.spark.sql.AnalysisException: Partition column `status` not found in schema [...]

Check the code below; it will not include the Status values inside your data files.

newdf
  .write
  .mode("overwrite")
  .partitionBy("Status")
  .csv("C:/Users/Documents/Test")