I am using Databricks Runtime 6.3 with PySpark. I have a DataFrame df_1 in which SalesVolume is an integer but AveragePrice is a string.

When I execute the code below, it runs and I get the correct output.

display(df_1.filter('SalesVolume>10000 and AveragePrice>70000'))

But the code below fails with the error "py4j.Py4JException: Method and([class java.lang.Integer]) does not exist":

display(df_1.filter(df_1['SalesVolume']>10000 & df_1['AveragePrice']>7000))

Why does the first one work but not the second one?

I believe you need to put each condition in parentheses if you're using multiple conditions. - pissall

1 Answer

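You have to wrap each condition in parentheses. In Python, & binds more tightly than comparison operators such as >, so without parentheses the expression is read as df_1['SalesVolume'] > (10000 & df_1['AveragePrice']) > 7000. PySpark then tries to apply & between the integer 10000 and a Column, which is what produces the "Method and([class java.lang.Integer]) does not exist" error.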

display(df_1.filter((df_1['SalesVolume']>10000) & (df_1['AveragePrice']>7000)))

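filter accepts either a SQL expression string or a Column expression. The first call works because the whole string is valid SQL and is parsed by Spark, where and has lower precedence than >. The second is parsed by Python, whose operator precedence differs, so the Column form needs the explicit parentheses.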
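
If you want to see the grouping for yourself without a Spark session, here is a minimal sketch that uses Python's standard ast module. The names a and b are just placeholders standing in for the two Column objects, and the helper top_level_node is made up for this illustration. It only shows how Python parses the two expressions, it does not run anything in Spark.

```python
import ast


def top_level_node(expr):
    """Parse a single expression and return its outermost AST node."""
    return ast.parse(expr, mode="eval").body


# a and b are placeholders for df_1['SalesVolume'] and df_1['AveragePrice'].
no_parens = top_level_node("a > 10000 & b > 7000")
with_parens = top_level_node("(a > 10000) & (b > 7000)")

# Without parentheses the outermost node is a chained comparison,
# and its middle operand is the bitwise-and (10000 & b).
middle = no_parens.comparators[0]
print(type(no_parens).__name__)   # Compare
print(type(middle).__name__)      # BinOp
print(type(middle.op).__name__)   # BitAnd
print(middle.right.id)            # b

# With parentheses the outermost node is the & itself,
# combining two separate comparisons.
print(type(with_parens).__name__)        # BinOp
print(type(with_parens.op).__name__)     # BitAnd
print(type(with_parens.left).__name__)   # Compare
print(type(with_parens.right).__name__)  # Compare
```

So in the unparenthesized form, 10000 & b is computed first, which matches the error message complaining about and being called with an Integer.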