val df1 = Seq(("Brian", 29, "0-A-1234")).toDF("name", "age", "client-ID")
val df2 = Seq(("1234", 555-5555, "1234 anystreet")).toDF("office-ID", "BusinessNumber", "Address")
I'm trying to run a function on each row of a dataframe (in streaming). This function will contain a combination of scala code, and Spark dataframe api code. for example, I want to take the 3 features from df, and use them to filter a second dataframe called df2. My understanding is that a UDF can't accomplish this. Now I have all the filtering code working just fine, without the ability to apply it to each row of df.
My goal is to be able to do something like
df.select("ID","preferences").map(row => ( //filter df2 using row(0), row(1) and row(3) ))
The dataframes can't be joined, there is not a joinable relationship between them.
Although I'm using Scala, an answer in Java or Python would probably be fine.
I'm also fine with alternative ways of accomplishing this. If I could extract the data from the rows into separate variables (keep in mind this is streaming), that's also fine.