Consider the following situation: you have two RDDs of key-value pairs, and the keys of the two RDDs are of different types.
RDD1[(Vector[String], String)] looks like this:
(Vector("A", "B", "E"), "bla bla bla"),
(Vector("W"), "bla bla bla bla"),
(Vector("C", "M"), "bla bla bla bla bla"),
(Vector("A", "V"), "bla bla bla")
...
RDD2[(String, Int)] looks like this:
("A", 12),
("B", 434),
("C", 8023),
("D", 3454),
...
("N", 251)
Note: the keys in RDD2 range from A to N inclusive.
The desired output is the pairs of RDD1 whose Vector key is a subset of the set of RDD2's keys, i.e. every string in the Vector is a key of RDD2:
(Vector("A", "B", "E"), "bla bla bla"),
(Vector("C", "M"), "bla bla bla bla bla")
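One possible RDD approach, sketched under two assumptions: RDD2's distinct keys are few enough to collect to the driver (only A to N here), and RDD2's values are Ints, as the sample data suggests. The idea is to collect the keys into a Set, broadcast it, and keep the pairs whose Vector passes a `forall` membership test. The names `SubsetFilterRDD` and `filterBySubset` are hypothetical, and the RDD2 values are placeholders because only the keys matter.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SubsetFilterRDD {
  // Keeps the pairs of rdd1 whose Vector key is fully contained in the key set of rdd2.
  // Assumption: rdd2's distinct keys are few enough to collect to the driver.
  def filterBySubset(
      sc: SparkContext,
      rdd1: RDD[(Vector[String], String)],
      rdd2: RDD[(String, Int)]): RDD[(Vector[String], String)] = {
    // Collect rdd2's distinct keys once and ship them to each executor as a broadcast.
    val allowed = sc.broadcast(rdd2.keys.distinct().collect().toSet)
    // A pair survives only if every string in its Vector is one of rdd2's keys.
    rdd1.filter { case (keys, _) => keys.forall(k => allowed.value.contains(k)) }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("subset-filter-rdd").getOrCreate()
    val sc = spark.sparkContext

    val rdd1 = sc.parallelize(Seq(
      (Vector("A", "B", "E"), "bla bla bla"),
      (Vector("W"), "bla bla bla bla"),
      (Vector("C", "M"), "bla bla bla bla bla"),
      (Vector("A", "V"), "bla bla bla")))
    // Hypothetical placeholder values (0); only the keys A-N matter for the filter.
    val rdd2 = sc.parallelize(('A' to 'N').map(c => (c.toString, 0)))

    filterBySubset(sc, rdd1, rdd2).collect().foreach(println)
    spark.stop()
  }
}
```

With the sample data only the (A, B, E) and (C, M) pairs should survive, because W and V are not keys of RDD2. If RDD2's key set were too large to collect, a join-based approach (such as the DataFrame one for the question below) would avoid materializing it on the driver.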
Also, if this is not possible with RDDs, I'd like to know how other abstractions such as DataFrames and Datasets could achieve this result.
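With DataFrames the subset test can be written without collecting anything to the driver, which matters when RDD2's key set is large. Below is a sketch, assuming the hypothetical column names `keys`/`text` for RDD1 and `key`/`value` for RDD2 (the names `SubsetFilterDF` and `filterBySubset` are also hypothetical). The approach is to explode each array, left-join the elements against RDD2's distinct keys, and keep only the rows in which no element went unmatched.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, explode, first, lit, monotonically_increasing_id, when}

object SubsetFilterDF {
  // Keeps the rows of df1 whose array column "keys" is fully contained in df2's "key" column.
  // Assumed (hypothetical) schema: df1(keys: array<string>, text: string), df2(key: string, value: int).
  // Caveat: explode drops rows whose array is empty, so empty key arrays are not returned.
  def filterBySubset(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Tag each row, then emit one row per array element, carrying the original columns along.
    val exploded = df1
      .withColumn("row_id", monotonically_increasing_id())
      .select(col("row_id"), col("keys"), col("text"), explode(col("keys")).as("k"))
    // Distinct keys of df2, flagged so the left join exposes unmatched elements as nulls.
    val known = df2.select(col("key").as("k")).distinct().withColumn("found", lit(true))
    exploded
      .join(known, Seq("k"), "left")
      .groupBy("row_id")
      .agg(
        first("keys").as("keys"),
        first("text").as("text"),
        // count skips nulls, so this counts only the elements with no match in df2.
        count(when(col("found").isNull, 1)).as("missing"))
      .where(col("missing") === 0)
      .select("keys", "text")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("subset-filter-df").getOrCreate()
    import spark.implicits._

    val df1 = Seq(
      (Seq("A", "B", "E"), "bla bla bla"),
      (Seq("W"), "bla bla bla bla"),
      (Seq("C", "M"), "bla bla bla bla bla"),
      (Seq("A", "V"), "bla bla bla")).toDF("keys", "text")
    // Hypothetical placeholder values (0); only the keys A-N matter for the filter.
    val df2 = ('A' to 'N').map(c => (c.toString, 0)).toDF("key", "value")

    filterBySubset(df1, df2).show(false)
    spark.stop()
  }
}
```

A Dataset can reuse the same function by converting with `.toDF` and converting the result back with `.as[(Seq[String], String)]`. Because `explode` drops empty arrays, an empty key Vector (trivially a subset) would need separate handling if such rows can occur.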