I need a UDF that takes an array column of a DataFrame and checks whether the two string elements in it are equal. My DataFrame looks like this:
ID | date | options |
---|---|---|
1 | 2021-01-06 | ['red', 'green'] |
2 | 2021-01-07 | ['Blue', 'Blue'] |
3 | 2021-01-08 | ['Blue', 'Yellow'] |
4 | 2021-01-09 | nan |
I have tried this:
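
    from pyspark.sql import functions as f, types as t

    def equality_check(options: list):
        # Return 1 if the first two elements match, 0 if they differ, -1 on any failure.
        try:
            if options[0] == options[1]:
                return 1
            else:
                return 0
        except Exception:
            return -1

    equality_udf = f.udf(equality_check, t.IntegerType())
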
But it throws an index-out-of-range error. I am confident that the options column is an array of strings. The expected output is:
ID | date | options | equality_check |
---|---|---|---|
1 | 2021-01-06 | ['red', 'green'] | 0 |
2 | 2021-01-07 | ['Blue', 'Blue'] | 1 |
3 | 2021-01-08 | ['Blue', 'Yellow'] | 0 |
4 | 2021-01-09 | nan | -1 |
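
To make the expected values above unambiguous, here is a minimal sketch of the intended logic in plain Python, independent of Spark. The helper name `expected_flag` and the `rows` list are hypothetical and only mirror the sample table; the sketch assumes a missing value arrives as `None` or a float NaN, which may not match how Spark actually passes nulls to a UDF.

```python
def expected_flag(options):
    """Return 1 if the first two elements match, 0 if they differ, -1 otherwise."""
    # Anything that is not a sequence with at least two elements
    # (None, a float NaN, an empty or one-element list) cannot be compared.
    if not isinstance(options, (list, tuple)) or len(options) < 2:
        return -1
    return 1 if options[0] == options[1] else 0


# Sample rows copied from the table above: (ID, options).
rows = [
    (1, ["red", "green"]),
    (2, ["Blue", "Blue"]),
    (3, ["Blue", "Yellow"]),
    (4, float("nan")),
]

results = [expected_flag(opts) for _, opts in rows]
print(results)  # [0, 1, 0, -1]
```

The explicit type and length check replaces the `try`/`except`, so a short or missing array yields -1 without relying on an exception being raised.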