df1 = spark.createDataFrame([(1, [4, 2]), (4, [3, 2])], ["col2", "col4"])
+----+------+
|col2|  col4|
+----+------+
|   1|[4, 2]|
|   4|[3, 2]|
+----+------+
df = spark.createDataFrame([("a", 1, 10), ("a", 2, 20), ("a", 3, 30),
                            ("b", 4, 40), ("b", 5, 40), ("b", 1, 40)],
                           ["col1", "col2", "col3"])
+----+----+----+
|col1|col2|col3|
+----+----+----+
| a| 1| 10|
| a| 2| 20|
| a| 3| 30|
| b| 4| 40|
| b| 5| 40|
| b| 1| 40|
+----+----+----+
I want to join df and df1 on col2, and where they match, keep a df row only if at least one element of col4 appears among the col2 values of that row's col1 group. For the data above I expect the output below. Can someone tell me how to express this join in PySpark (check col4 isin col2, grouped by col1)?
expected output
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   a|   1|  10|
+----+----+----+