I have two Spark DataFrames.
DataFrame A:
Col_A1 Col_A2
1 ["x", "y", "z"]
2 ["a", "x", "y"]
3 ["a", "b", "c"]
DataFrame B:
Col_B1
"x"
"a"
"y"
I want to find which rows of DataFrame A contain, say, "x" from DataFrame B in Col_A2, and return those rows as a new DataFrame. I then want to do the same for each of the remaining entries of DataFrame B.
The output should look something like:
DataFrame A_x:
Col_A1 Col_A2
1 ["x", "y", "z"]
2 ["a", "x", "y"]
DataFrame A_a:
Col_A1 Col_A2
2 ["a", "x", "y"]
3 ["a", "b", "c"]
DataFrame A_y:
Col_A1 Col_A2
1 ["x", "y", "z"]
2 ["a", "x", "y"]
I tried using UDFs and the map function, but didn't get what I'm looking for. Thanks in advance.
Can you collect() DataFrame B, or is it so big that it would be prohibitive? - desertnaut