Say I have two dataframes with 4 columns each. The first 3 columns are string types, and the 4th column is an array type. I would like to concatenate these two dataframes so that the resulting dataframe will fulfill the following:
In rows where the first 3 columns' values are identical between the two dataframes, the row in the result dataframe will contain the identical values, and the array column will contain a union of all the values in each of the original dataframes' 4th column arrays.
Rows that don't have an 'identical' (just the first 3 columns) partner in the second dataframe, will appear as they are originally in the result dataframe.
Example:
DF1 = [
Row(str1="StringA", str2="StringB", str3="StringC", arr=["array_member_a"]),
Row(str1="String1", str2="String2", str3="String3", arr=["array_member_1"])]
DF2 = [
Row(str1="StringA", str2="StringB", str3="StringC", arr=["array_member_d"]),
Row(str1="String1", str2="String8", str3="String9", arr=["array_member_x"])]
reulst_DF = [
Row(str1="StringA", str2="StringB", str3="StringC", arr=["array_member_a", "array_member_d"]),
Row(str1="String1", str2="String2", str3="String3", arr=["array_member_1"]),
Row(str1="String1", str2="String8", str3="String9", arr=["array_member_x"])]