
I'm working on a Spark application (using Scala) and I have a List that contains multiple values. I'd like to use this list to write a where clause for my DataFrame and select only a subset of the rows. For example, my List contains 'value1', 'value2', and 'value3', and I would like to write something like this:

mydf.where($"col1" === "value1" || $"col1" === "value2" || $"col1" === "value3")

How can I do that programmatically, since the list contains many values?

Just FYI, the example you are giving will return no rows, since $"col1" cannot be all three values simultaneously. - Psidom
Sorry, I made a mistake: I meant ||, not &&. - HHH

1 Answer


You can map the list of values to a list of filters (each of type Column), then reduce that list to a single filter by combining the filters pairwise with the || operator:

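import spark.implicits._  // enables the $"..." syntax; assumes a SparkSession named `spark`
val possibleValues = Seq("value1", "value2", "value3")
// One equality test per value, ORed together into a single Column predicate
val result = mydf.where(possibleValues.map($"col1" === _).reduce(_ || _))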
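Two refinements may be worth considering, sketched below under the assumption that spark-sql is on the classpath. First, `Seq.reduce` throws an exception on an empty Seq, so if the list can ever be empty, `reduceOption` with a `lit(false)` fallback returns an empty result instead. Second, Spark's `Column.isin` (a different technique from the map/reduce above) builds a single `IN (...)` predicate directly from a varargs list. The object name `ValueFilters`, the helper names `anyOfReduce`/`anyOfIsin`, and the sample data are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical helper object; not part of Spark's API.
object ValueFilters {
  // The map/reduce approach, guarded against an empty list:
  // reduceOption yields None for an empty Seq, and lit(false) then keeps no rows.
  def anyOfReduce(df: DataFrame, column: String, values: Seq[String]): DataFrame = {
    val predicate: Column = values
      .map(v => col(column) === v)
      .reduceOption(_ || _)
      .getOrElse(lit(false))
    df.where(predicate)
  }

  // Alternative: Column.isin takes varargs, so the Seq is expanded with `: _*`.
  def anyOfIsin(df: DataFrame, column: String, values: Seq[String]): DataFrame =
    df.where(col(column).isin(values: _*))

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder().master("local[*]").appName("value-filters").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for mydf.
    val mydf = Seq("value1", "value2", "value4", "value3").toDF("col1")
    val possibleValues = Seq("value1", "value2", "value3")

    anyOfReduce(mydf, "col1", possibleValues).show()
    anyOfIsin(mydf, "col1", possibleValues).show()

    spark.stop()
  }
}
```

For long lists, `isin` also keeps the query plan compact: it produces one `IN` expression rather than a deeply nested chain of `OR` nodes.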