
I have a dataframe df1 with a column col1 that has the structure:

StructField(recipientResource,ArrayType(StructType(List(StructField(resourceId,StringType,true),StructField(type,StringType,true))),true),true)

and another dataframe df2 with col1 that has structure:

StructField(recipientResource,StructType(List(StructField(resourceId,StringType,true),StructField(type,StringType,true))),true)

In order to run df1.union(df2), I tried to cast the column in df2 from StructType to ArrayType(StructType), but nothing I tried has worked.

Can anyone suggest how to go about this? I'm new to PySpark; any help is appreciated.

array<struct<...>> and struct<...> are two completely different types; you cannot cast one into the other. You could wrap the struct in an array if that's what you mean, e.g. select(array(struct_column)). – Alper t. Turker
A minimal reproducible example with a small sample of your dataframes and the desired output would be helpful. See more on how to create good reproducible Apache Spark dataframe examples. – pault

1 Answer


Here is a simple solution using the array() function:

Input:

df1 (with ArrayType(StructType()) column): [image]

df2 (with StructType() column): [image]

Code:

from pyspark.sql.functions import array, col

# Wrap the StructType() column in a one-element array so it becomes ArrayType(StructType())
df2 = df2.withColumn('recipientResource', array(col('recipientResource')))

Output:

Modified df2: [image]

df3 (output after union of df1 and df2): [image]