I'm using Spark 2.2.1.
I have a small DataFrame (less than 1M) and a computation on a big DataFrame that needs this small one to compute a column in a UDF.
What is the best option regarding performance?
Is it better to broadcast this DF (I don't know whether Spark will compute the Cartesian product in memory):
.withColumn("newCol", udf($"colFromSmall", $"colFromBig")) // "newCol" is a placeholder column name
or to collect it and use the small value directly in the UDF?
val small = smallDF.collect()
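
To make the two options concrete, here is a minimal sketch of what I have in mind. The `key` join column, the sample rows, the `result` column name and the `combineValues` function are all placeholders made up for illustration; my real data may not have a clean join key, which is why I am worried about a cartesian product. Option 2 also wraps the collected data in a broadcast variable, which is one possible way of shipping it to the executors rather than capturing the raw array in the closure.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, udf}

object SmallLookupSketch {
  // Hypothetical computation standing in for the real UDF body.
  def combineValues(fromSmall: Double, fromBig: Double): Double = fromSmall * fromBig

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("small-lookup-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data; the shared "key" column is an assumption.
    val bigDF = Seq((1, 10.0), (2, 20.0), (1, 30.0)).toDF("key", "colFromBig")
    val smallDF = Seq((1, 2.0), (2, 3.0)).toDF("key", "colFromSmall")

    // Option 1: hint a broadcast join, then apply the UDF to the joined columns.
    val combine = udf(combineValues _)
    val viaJoin = bigDF
      .join(broadcast(smallDF), Seq("key"))
      .withColumn("result", combine($"colFromSmall", $"colFromBig"))

    // Option 2: collect the small DataFrame to the driver as a Map,
    // ship it as a broadcast variable, and look it up inside the UDF.
    val smallMap: Map[Int, Double] = smallDF.as[(Int, Double)].collect().toMap
    val smallBc = spark.sparkContext.broadcast(smallMap)
    val lookup = udf { (key: Int, fromBig: Double) =>
      // Keys missing from the small side yield NaN here,
      // whereas the inner join in option 1 drops those rows.
      combineValues(smallBc.value.getOrElse(key, Double.NaN), fromBig)
    }
    val viaUdf = bigDF.withColumn("result", lookup($"key", $"colFromBig"))

    viaJoin.show()
    viaUdf.show()
    spark.stop()
  }
}
```

Both variants avoid shuffling the big DataFrame in this sketch; what I would like to know is which one performs better in practice.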