I need some help with the following issue.
I have a PySpark DataFrame with two columns:
| col_list              | group  |
| [1, 2, 3, 4, 5, 6, 7] | group1 |
| [6, 7, 8]             | group1 |
| [1, 2, 3, 4]          | group2 |
| [10, 11]              | group2 |
I want to group by the column named group
and collect only the unique values from column col_list into a single list per group.
I have tried flattening the result of collect_set on col_list after grouping by group,
and it returned this:
| group  | flatten(collect_set(col_list)) |
| group1 | [1, 2, 3, 4, 5, 6, 7, 6, 7, 8] |
| group2 | [10, 11, 1, 2, 3, 4]           |
The flattened list for group1 contains duplicates (6 and 7 each appear twice). I need help returning only the unique values, so that group1 gives [1, 2, 3, 4, 5, 6, 7, 8].
to remove duplicates from the flattened list – jxc