Suppose I have this DataFrame in PySpark:
+--------+----------------+---------+---------+
|DeviceID|       TimeStamp|    range|  zipcode|
+--------+----------------+---------+---------+
|   00236|11-03-2014 07:33| [4.5, 2]|    90041|
|   00234|11-06-2014 05:55| [6.2, 8]|    90037|
|   00234|11-06-2014 05:55| [5.6, 4]|    90037|
|   00235|11-09-2014 05:33| [7.5, 6]|    90047|
+--------+----------------+---------+---------+
How can I write a script that keeps only the rows where the first value in the range array is greater than 6? The output should look like this:
+--------+----------------+---------+---------+
|DeviceID|       TimeStamp|    range|  zipcode|
+--------+----------------+---------+---------+
|   00234|11-06-2014 05:55| [6.2, 8]|    90037|
|   00235|11-09-2014 05:33| [7.5, 6]|    90047|
+--------+----------------+---------+---------+
I wrote this script:
import pyspark.sql.functions as f
df.filter(f.col("range")[0] > 6)
but I got this error:
AnalysisException: u"Can't extract value from range#12989: need struct type but got vector;"