
I have a complicated data structure that I managed to flatten, and the output has the following structure:

    'name'
    ------
    ['a','b','c']
    []
    [null]
    null
    ['f']
    [null,'d']

The desired output after filtering the above DataFrame is:

    'name'
    ------
    ['a','b','c']
    ['f']

I know that rows that are just null can be filtered out with df.where(col('name').isNotNull()). I tried using

    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    filtered = udf(lambda row: int(not all(x is None for x in row)), IntegerType())

but that didn't produce the results I was hoping for. How do I filter out rows that are null, an empty list, or a list containing at least one null?
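Running the attempted predicate on plain Python lists, with None standing in for a Spark null, shows where it goes wrong. This is a sketch outside Spark, and the samples list is a hypothetical copy of the rows above: the predicate rejects [] and [null] as intended, but it keeps [null,'d'] and raises on a null row.

```python
# The predicate from the attempt, applied to plain Python lists.
# None stands in for a Spark null; `samples` is a hypothetical copy of the rows above.
attempt = lambda row: int(not all(x is None for x in row))

samples = [["a", "b", "c"], [], [None], ["f"], [None, "d"]]
for s in samples:
    print(s, attempt(s))
# [None, 'd'] gives 1 (kept): all() is only True when *every* element is None,
# so a list with one null among real values slips through.

# A null row reaches a Python UDF as None, and iterating None raises TypeError.
try:
    attempt(None)
    null_error = None
except TypeError as exc:
    null_error = exc
print("None ->", null_error)
```

The fix therefore needs "any element is None" rather than "all elements are None", plus an explicit guard for null and empty input.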


1 Answer


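The filtered function below can be used as your UDF. It returns False for null, for an empty list, and for a list containing any null, and True otherwise: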

    filtered = lambda x: not bool([y for y in x if y is None]) if x else False

    >>> filtered(['a','b','c'])
    True
    >>> filtered([])
    False
    >>> filtered([None])
    False
    >>> filtered(None)
    False
    >>> filtered(['f'])
    True
    >>> filtered([None,'d'])
    False