
I'm storing timestamps as YYYY-mm-dd HH:MM:SSZ in Cassandra. I can filter the data for a given time range in cqlsh, but when I apply the same filter to a PySpark DataFrame, the filtered DataFrame comes back empty.

Can anyone help me find the right datetime format to use in PySpark for this?

Thank you.


1 Answer


That timestamp format works just fine. The problem is most likely with Spark SQL types: you need to explicitly cast the timestamp string so that Spark compares it against the column as a timestamp rather than as a string.

For example, this Scala code works correctly (you may need to adjust it to Python):

import org.apache.spark.sql.cassandra._

// cassandraFormat(table, keyspace): reads the "sdtest" table from the "test" keyspace
val data = spark.read.cassandraFormat("sdtest", "test").load()

// Cast the boundary strings so the comparison is done on timestamps, not strings
val filtered = data.filter("ts >= cast('2019-07-17 14:41:34.373Z' as timestamp) AND ts <= cast('2019-07-19 19:01:56Z' as timestamp)")
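
In case it helps, here is a rough PySpark sketch of the same idea. It assumes the same keyspace/table names ("test"/"sdtest") and timestamp column ("ts") used in the example above, and that the Spark Cassandra connector package is on the classpath; adjust these to your own schema.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the table through the Cassandra data source
data = (spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(table="sdtest", keyspace="test")
        .load())

# Cast the boundary strings so the comparison happens on timestamps, not strings
filtered = data.filter(
    "ts >= cast('2019-07-17 14:41:34.373Z' as timestamp) "
    "AND ts <= cast('2019-07-19 19:01:56Z' as timestamp)"
)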