I have the following Spark DataFrames:
df1 with columns (id, name, age)
df2 with columns (id, salary, city)
df3 with columns (name, dob)
I want to join all three DataFrames using PySpark. This is the SQL statement I need to replicate:
SQL:
select df1.*, df2.salary, df3.dob
from df1
left join df2 on df1.id=df2.id
left join df3 on df1.name=df3.name
I tried something like the following in PySpark, but I am receiving an error.
joined_df = df1.join(df2,df1.id=df2.id,'left')\
.join(df3,df1.name=df3.name)\
.select(df1.(*),df2(name),df3(dob)
My question: Can we join all the three DataFrames in one go and select the required columns?
df1.(*) is invalid syntax. - pault