I have a complicated data structure that I managed to flatten and the output has the following structure:
'name'
------
['a','b','c']
[]
[null]
null
['f']
[null,'d']
The desired output after filtering the above data frame:
'name'
------
['a','b','c']
['f']
I know that rows that are null can be filtered out with df.where(col('name').isNotNull()). I tried using
filtered = udf(lambda row: int(not all(x is None for x in row)),IntegerType())
but that didn't produce the results I was hoping for. How do I filter out rows that are an empty list or contain at least one null?
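To make the rule I am after concrete, here is a minimal plain-Python sketch of the per-row condition, run against the sample values above. The name `keep_row` is just a placeholder of mine, and this only illustrates the logic on ordinary Python lists; it is not a tested Spark solution.

```python
def keep_row(row):
    """Return True only for a non-empty list that has no None elements."""
    # A null cell shows up in Python as None, so reject it first
    # (iterating over None would raise a TypeError).
    if row is None:
        return False
    # Reject empty lists and lists that contain at least one None.
    return len(row) > 0 and all(x is not None for x in row)


# Sample values from the 'name' column above, with null written as None.
rows = [['a', 'b', 'c'], [], [None], None, ['f'], [None, 'd']]
kept = [r for r in rows if keep_row(r)]
print(kept)  # [['a', 'b', 'c'], ['f']]
```

I assume something like this could be wrapped with udf(keep_row, BooleanType()) and passed to df.filter, but I have not confirmed that on my data, and a built-in approach that avoids a UDF would be preferable if one exists.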