I have two tables with the following schemas, for example:
scala> df1.printSchema
root
|-- id: string (nullable = true)
and
scala> df2.printSchema
root
|-- col1: string (nullable = true)
|-- col2: array (nullable = true)
| |-- element: string (containsNull = true)
For each id in df1, I want to collect all col1 values from df2 whose col2 array contains an element equal to that id. The output should look like df3:
scala> df3.printSchema
root
|-- c1: array (nullable = true)
| |-- element: string (containsNull = true)
|-- c2: string (nullable = true)
where df3.c2 is df1.id and df3.c1 is the array of all df2.col1 values that satisfy this condition.
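A minimal sketch of one way to express this with the Spark DataFrame API, assuming Spark 2.x or later. The object name `ArrayMembershipJoin`, the helper `idsToCol1` and the sample rows are hypothetical. The idea is to explode `col2` so each array element gets its own row, join that element against `id` with a plain equi-join, and then regroup with `collect_list`:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, explode}

object ArrayMembershipJoin {
  // For each df1.id, gather every df2.col1 whose col2 array contains that id.
  // Assumes df1(id: string) and df2(col1: string, col2: array<string>).
  def idsToCol1(df1: DataFrame, df2: DataFrame): DataFrame = {
    // One row per (col1, element of col2); rows with empty or null arrays disappear here.
    val exploded = df2.select(col("col1"), explode(col("col2")).as("elem"))

    exploded
      .join(df1, exploded("elem") === df1("id")) // inner join: ids with no match are dropped
      .groupBy(col("id"))
      .agg(collect_list(col("col1")).as("c1"))   // collect_set would remove duplicates
      .select(col("c1"), col("id").as("c2"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("array-membership-join")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows matching the schemas above.
    val df1 = Seq("a", "b", "c").toDF("id")
    val df2 = Seq(
      ("x1", Seq("a", "b")),
      ("x2", Seq("b")),
      ("x3", Seq("d"))
    ).toDF("col1", "col2")

    val df3 = idsToCol1(df1, df2)
    df3.printSchema()
    df3.show(false)
    spark.stop()
  }
}
```

Note that `collect_list` gives no ordering guarantee, and an id that appears twice in the same `col2` array yields a duplicate `col1`. If ids without any match should still appear (with an empty or null array), a left join from `df1` would be needed instead of the inner join.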
Any SQL (Hive) or Scala solution would be very helpful.
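For the SQL route, here is a sketch of the same idea as a HiveQL-style query using `LATERAL VIEW explode(...)` and `collect_list`, run through `spark.sql` so it stays in Scala. It assumes the inputs are available as tables or temp views named `df1` and `df2`; the object name, the helper `idsToCol1` and the sample rows are hypothetical. In Hive itself the query string should be usable as-is against tables with those names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ArrayMembershipJoinSql {
  // LATERAL VIEW explode flattens col2 into one row per element,
  // then an equi-join on id and GROUP BY rebuild one array of col1 per id.
  val query: String =
    """SELECT collect_list(t2.col1) AS c1, t1.id AS c2
      |FROM df1 t1
      |JOIN (
      |  SELECT col1, elem
      |  FROM df2 LATERAL VIEW explode(col2) e AS elem
      |) t2
      |  ON t2.elem = t1.id
      |GROUP BY t1.id""".stripMargin

  def idsToCol1(spark: SparkSession, df1: DataFrame, df2: DataFrame): DataFrame = {
    // Register the inputs under the names the query refers to.
    df1.createOrReplaceTempView("df1")
    df2.createOrReplaceTempView("df2")
    spark.sql(query)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("array-membership-join-sql")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows matching the schemas in the question.
    val df1 = Seq("a", "b", "c").toDF("id")
    val df2 = Seq(
      ("x1", Seq("a", "b")),
      ("x2", Seq("b")),
      ("x3", Seq("d"))
    ).toDF("col1", "col2")

    idsToCol1(spark, df1, df2).show(false)
    spark.stop()
  }
}
```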