I am a beginner with PySpark/Spark SQL and have the following requirement. I have a configuration table (DataFrame: Config):
| Dataframe | Col1 | Col2 | Col3 |
|:----|:------:|-----:|-----:|
| Emp | Name1 | Name2 | Address |
| Job | Doj | Role | DOB |

I have iterated over the above DataFrame and assigned its values to variables. I now need to pass those variable values as column names when selecting from another DataFrame, as shown below.

Example,

First_Name = Config.alias('a').select('a.col1').filter("Rownumber = '" + str(i) + "'").first()[0]
print("First_Name :" + First_Name)
Last_Name = Config.alias('a').select('a.col2').filter("Rownumber = '" + str(i) + "'").first()[0]
print("Last_Name :" + Last_Name)

The variables First_Name and Last_Name now hold column names of the Emp DataFrame.

I need the resulting DataFrame to be equivalent to:

DF = Emp.select(col('Name1'), col('Name2'), col('Address'))

1 Answer

I'm not sure I understood the question correctly, but as I understand it, you are reading First_Name and Last_Name from the Config DataFrame and want to use them with the col function when selecting from the Emp DataFrame.

If that's the case, you can use Python's str.format as below:

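from pyspark.sql.functions import col
# First_Name and Last_Name already hold strings, so col(First_Name) is equivalent.
DF = Emp.select(col("{}".format(First_Name)),
        col("{}".format(Last_Name)))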
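
If the goal is to select every column listed in a Config row (Name1, Name2 and Address for Emp) rather than just two, one option is to gather the names into a list first. The following is a minimal sketch in plain Python, assuming the Config rows have been brought to the driver as dict-like records; `config_rows` and `columns_for` are hypothetical names used for illustration and do not appear in the question.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the collected Config rows: one dict per row.
config_rows: List[Dict[str, Optional[str]]] = [
    {"Dataframe": "Emp", "Col1": "Name1", "Col2": "Name2", "Col3": "Address"},
    {"Dataframe": "Job", "Col1": "Doj", "Col2": "Role", "Col3": "DOB"},
]


def columns_for(rows: List[Dict[str, Optional[str]]], dataframe_name: str) -> List[str]:
    """Return the non-empty column names configured for one DataFrame."""
    for row in rows:
        if row["Dataframe"] == dataframe_name:
            # Keep Col1, Col2, ... in order; skip the key column and empty values.
            return [v for k, v in row.items() if k != "Dataframe" and v]
    raise KeyError(f"no Config row for {dataframe_name!r}")


emp_columns = columns_for(config_rows, "Emp")
print(emp_columns)  # ['Name1', 'Name2', 'Address']
```

In PySpark the resulting list can then be unpacked into `select`, for example `Emp.select(*emp_columns)`. `select` accepts plain column-name strings, so neither `col(...)` nor `str.format` is needed for simple names.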