0
votes

I have the following schema in a PySpark DataFrame:

arr1:array
   element:struct
      string1:string
      arr2:array
         element:string
      string2:string

How can I remove the arr2 field from every element of arr1?

1
Use to_json + from_json; see a similar post: stackoverflow.com/questions/58243292 (jxc)

1 Answer

0
votes

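The drop function alone will not do this: DataFrame.drop only removes top-level columns, and a nested name such as arr1.arr2 is silently ignored. Nested fields are selected with a dot (.), like window.start and window.end.

For an array of structs, arr1.arr2 selects arr2 from every element (the element level in the printed schema is not part of the path). To remove the field, rewrite each element with transform and dropFields (Spark 3.1+):

from pyspark.sql import functions as F
df = df.withColumn("arr1", F.transform("arr1", lambda x: x.dropFields("arr2")))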