I have an array of tuples and I want to generate a join condition (combined with OR) from it.

e.g.

input -->  [("leftId", "rightId"), ("leftAltId", "rightAltId")]
output -->  leftDF("leftId") === rightDF("rightId") || leftDF("leftAltId") === rightDF("rightAltId")

method signature:

  def inner(leftDF: DataFrame, rightDF: DataFrame, fieldsToJoin: Array[(String,String)]): Unit = {

  }

I tried a reduce operation on the array, but the output of my reduce function is a Column and not a String, so it can't be fed back in as input. I could write this recursively, but I'm hoping there's a simpler way to start from an empty Column variable and build up the condition. Thoughts?

1 Answer

You can do something like this:

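import org.apache.spark.sql.functions.col
// One equality test per (left, right) name pair, combined with OR
val cond = fieldsToJoin.map(x => col(x._1) === col(x._2)).reduce(_ || _)
leftDF.join(rightDF, cond)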

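First, map turns the array of name pairs into an array of conditions: col converts each string into a Column, and === builds the equality comparison. Then reduce puts the "or" (||) between them. Mapping to Column before reducing is what solves the type problem, because reduce then takes Columns in and gives a Column back. The result is a single Column you can pass to join as the condition. Note that col looks columns up by name only, so this assumes the names are unique across both DataFrames; if they are not, use leftDF(x._1) === rightDF(x._2) instead.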
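
For completeness, here is a minimal sketch of how the same idea might fit into the inner method from the question. It is a variant, not the exact code above: it resolves each name against its own DataFrame (leftDF(l) === rightDF(r)) rather than using col, and it returns the joined DataFrame instead of Unit. The object name JoinConditionSketch, the helper orCondition, and the sample rows in main are made up for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object JoinConditionSketch {

  // Builds l1 === r1 || l2 === r2 || ... from the name pairs.
  // Each name is resolved against its own DataFrame, so a column name
  // that exists on both sides does not become ambiguous.
  def orCondition(leftDF: DataFrame,
                  rightDF: DataFrame,
                  fieldsToJoin: Array[(String, String)]): Column = {
    // reduce throws on an empty collection, so fail early with a clear message
    require(fieldsToJoin.nonEmpty, "fieldsToJoin must not be empty")
    fieldsToJoin
      .map { case (l, r) => leftDF(l) === rightDF(r) }
      .reduce(_ || _)
  }

  // Same parameters as the method in the question, but returning the join result
  def inner(leftDF: DataFrame,
            rightDF: DataFrame,
            fieldsToJoin: Array[(String, String)]): DataFrame =
    leftDF.join(rightDF, orCondition(leftDF, rightDF, fieldsToJoin), "inner")

  def main(args: Array[String]): Unit = {
    // Local session and sample rows are assumptions, only for trying it out
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("or-join-sketch")
      .getOrCreate()
    import spark.implicits._

    val left  = Seq((1, 10), (2, 20), (3, 30)).toDF("leftId", "leftAltId")
    val right = Seq((1, 99), (7, 20)).toDF("rightId", "rightAltId")

    val pairs = Array(("leftId", "rightId"), ("leftAltId", "rightAltId"))
    inner(left, right, pairs).show()

    spark.stop()
  }
}
```

The require guard is a design choice: reduce has no sensible result for an empty array, so failing with a clear message seemed safer than letting it throw. If an empty array should be allowed, reduceOption is one alternative, with the caller deciding what a join without a condition should mean.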