I have a large PySpark DataFrame (23M rows) in the following format:
names, sentiment
["Lily", "Kerry", "Mona"], 10
["Kerry", "Mona"], 2
["Mona"], 0
I would like to compute the average sentiment for each unique name in the names column, resulting in:
name, sentiment
"Lily", 10
"Kerry", 6
"Mona", 4