I have a DataFrame df with a column named setp. To create a list of its distinct values, I wrote:

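# Collect the distinct values of setp to the driver
setp_list = df.select('setp').distinct().collect()
setp_array = [row.setp for row in setp_list]
# Drop the surrounding brackets so the values can go inside IN (...)
setp_array = str(setp_array)[1:-1]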

I wanted to use it in a spark.sql statement:

df1 = spark.sql(f"select * from table where setp in ({setp_array})")

I am not sure how to display the list to see how it was created, but mainly I want to include it in the spark.sql statement. The spark.sql statement throws an invalid syntax error.

1 Answer

Avoid collecting values from one table just to use them in a query against another table. Use a JOIN so the query stays relational and the filtering happens inside Spark.

df.createOrReplaceTempView('df')

df1 = spark.sql("select * from table semi join df using(setp)")
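
If you still want the list approach (say, for a handful of values), here is a minimal sketch of how the IN list could be built and printed before it goes into the query. The helper name to_sql_in_list and the sample values are hypothetical, not part of any Spark API; the sketch assumes the values are plain strings or numbers and deliberately rejects strings containing quotes or backslashes instead of trying to escape them.

```python
def to_sql_in_list(values):
    """Render Python values as a comma-separated list of SQL literals.

    Hypothetical helper for illustration only.
    """
    parts = []
    for v in values:
        if v is None:
            # NULL never matches inside IN (...), so skip it
            continue
        if isinstance(v, str):
            if "'" in v or "\\" in v:
                # Escaping is out of scope for this sketch
                raise ValueError(f"unsupported character in value: {v!r}")
            parts.append("'" + v + "'")
        else:
            parts.append(str(v))
    return ", ".join(parts)


# Sample values standing in for [row.setp for row in setp_list]
setp_array = to_sql_in_list(["a", "b", 3])
print(setp_array)  # 'a', 'b', 3

query = f"select * from table where setp in ({setp_array})"
print(query)  # select * from table where setp in ('a', 'b', 3)
```

Printing the string before passing it to spark.sql shows exactly what the parser will receive. Note that an empty list would produce in (), which is not valid SQL, so that case needs a separate check.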