I am developing a dynamic script that can join any given PySpark DataFrames. The problem is that the column names in the files will vary, and the number of join conditions may vary too. I can handle this in a loop, but when I execute the join with the condition held in variables, it fails.
(My intention is to dynamically populate a and b, or more columns, based on the file structure and join conditions.)
b="incrementalFile.Id1"
a="existingFile.Id"
unChangedRecords = existingFile.join(incrementalFile,(a==b),"left")
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 818, in join
    assert isinstance(on[0], Column), "on should be Column or list of Column"
AssertionError: on should be Column or list of Column
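The assertion seems consistent with what plain Python does with the two strings: `a == b` is an ordinary string comparison, so it produces a `bool` rather than a Column expression, and `join` never sees a Column. A minimal check of that reading, which needs no Spark at all:

```python
# The same two strings as in the failing snippet.
a = "existingFile.Id"
b = "incrementalFile.Id1"

# Comparing two str objects is plain Python string equality,
# so the result is a bool, not a pyspark Column expression.
condition = (a == b)

print(type(condition).__name__)  # bool
print(condition)                 # False
```

So the call is effectively `existingFile.join(incrementalFile, False, "left")`, which would explain why the `isinstance(on[0], Column)` check fails.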
But the same code works fine if I don't use any variables in the join condition, as below:
unChangedRecords = existingFile.join(
    incrementalFile,
    (existingFile.Id == incrementalFile.Id1),
    "left")
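One way the dynamic case might be handled is to keep only the column names as strings and look the columns up on the DataFrames, so that each comparison is between Column objects. The following is a minimal sketch, not a tested solution for this pipeline: `build_join_condition` and `column_pairs` are hypothetical names introduced here, and it assumes every listed name exists in the corresponding DataFrame.

```python
from functools import reduce
from operator import and_


def build_join_condition(left_df, right_df, column_pairs):
    """Combine (left_name, right_name) pairs into one join condition.

    Hypothetical helper: left_df[name] and right_df[name] look up
    columns by name, so each == compares Columns, not strings.
    """
    if not column_pairs:
        raise ValueError("at least one column pair is required")
    conditions = [left_df[left] == right_df[right]
                  for left, right in column_pairs]
    # AND all the pairwise equality conditions together.
    return reduce(and_, conditions)


# Usage with the DataFrames from the question (not executed here):
# cond = build_join_condition(existingFile, incrementalFile, [("Id", "Id1")])
# unChangedRecords = existingFile.join(incrementalFile, cond, "left")
```

The list of pairs can be filled in a loop from whatever describes the file structure, and more pairs simply add more `&`-combined conditions.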