I have a fixed column order from the target table:
Target Spark DataFrame: (col1 string, col2 int, col3 string, col4 double)
Now, suppose the source data arrives with the columns in a jumbled order:
Source Spark DataFrame: (col3 string, col2 int, col4 double, col1 string)
How can I rearrange the source DataFrame to match the column order of the target DataFrame using PySpark?
The source DataFrame should be reordered as below to match the target DataFrame:
Output:
Updated Source Spark DataFrame: (col1 string, col2 int, col3 string, col4 double)
Scenario 2:
Source DataFrame = [a, c, d, e]
Target DataFrame = [a, b, c, d]
In this scenario, the source DataFrame should be rearranged to [a, b, c, d, e]:
- Keep the order of the target columns
- Cast the source columns to the target DataFrame's datatypes
- Append the source-only columns at the end
- If a target column is not present in the source columns, the column should still be added, filled with null values
In the above example, after the source DataFrame is rearranged, it would have a b column added and filled with null values.
This ensures that when we use saveAsTable, the source DataFrame can be written into the table without breaking the existing table.