
Let's say we have a dataframe with columns col1, col2, col3, col4. While saving the df I want to partition by col2, and the final df that gets saved should not contain col2. So the final df should have col1, col3, col4. Any advice on how I can achieve this?

newdf.drop("Status").write.mode("overwrite").partitionBy("Status").csv("C:/Users/Documents/Test")
By default it will not save the partitionBy column along with the data. Can you post your code? - Srinivas
I have added the code - Nafis Aslam

1 Answer


drop will remove the Status column, so your code will fail at partitionBy with the error below, because the Status column no longer exists in the schema.

org.apache.spark.sql.AnalysisException: Partition column `status` not found in schema [...]

Check the code below; it will not include the Status values inside your data files.

newdf
  .write
  .mode("overwrite")
  .partitionBy("Status")
  .csv("C:/Users/Documents/Test")