I have an RDD in Spark whose elements are essentially (timestamp, id), where the timestamp is a Joda DateTime parsed with the pattern yyyy/MM/dd HH:mm. The elements are instances of this case class:

case class myRDD(timestamp: org.joda.time.DateTime, id: String)

I am using Spark and Scala.

I want to filter the data down to a single day, e.g. 2000/01/01, and get back records of the form (timestamp, id), but I am unsure how to use filter() with the Joda timestamp. I have created the start and end of the interval I want to filter by as follows:

val start = myFormat.parseDateTime("2000/01/01 00:00")
val end = myFormat.parseDateTime("2000/01/02 00:00")

but I do not know how to apply this to an RDD, or even if this is the best way to approach this. Any tips would be greatly appreciated.

is timestamp a string, or a joda DateTime? - soote
@soote the timestamp is a joda DateTime, the class I created is of the form; case class rdd(timestamp: org.joda.time.DateTime, id: String) - user7810705

1 Answer

For just 1 day:

// each element is a case class instance, so read the field from the record
rdd.filter(r =>
    r.timestamp.withTimeAtStartOfDay.equals(dayYouWant.withTimeAtStartOfDay))

For a range of days:

// a Joda Interval is half-open: start is included, end is excluded
rdd.filter(r =>
    new Interval(start, end).contains(r.timestamp))
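
To make the interval approach concrete, here is a minimal end-to-end sketch. It assumes the RDD holds case class instances as described in the question. The names Event, FilterByDay and within are made up for illustration, the sample data is invented, and the local master is only there so the snippet can run on its own. The Interval is built once outside the closure rather than once per record.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.joda.time.{DateTime, Interval}
import org.joda.time.format.DateTimeFormat

// Hypothetical record type mirroring the case class in the question.
case class Event(timestamp: DateTime, id: String)

object FilterByDay {
  // Keep only records whose timestamp lies in [start, end).
  // Joda's Interval includes its start instant and excludes its end instant.
  def within(rdd: RDD[Event], start: DateTime, end: DateTime): RDD[Event] = {
    val interval = new Interval(start, end) // built once, outside the closure
    rdd.filter(e => interval.contains(e.timestamp))
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs standalone.
    val conf = new SparkConf().setAppName("filter-by-day").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Same pattern as in the question.
      val myFormat = DateTimeFormat.forPattern("yyyy/MM/dd HH:mm")
      val start = myFormat.parseDateTime("2000/01/01 00:00")
      val end = myFormat.parseDateTime("2000/01/02 00:00")

      // Invented sample data, including both boundary instants.
      val events = sc.parallelize(Seq(
        Event(myFormat.parseDateTime("1999/12/31 23:59"), "a"),
        Event(myFormat.parseDateTime("2000/01/01 00:00"), "b"),
        Event(myFormat.parseDateTime("2000/01/01 17:30"), "c"),
        Event(myFormat.parseDateTime("2000/01/02 00:00"), "d")
      ))

      val ids = within(events, start, end).map(_.id).collect().sorted
      println(ids.mkString(", ")) // b, c
    } finally sc.stop()
  }
}
```

Because the interval is half-open, using the next day's midnight as the end gives exactly one calendar day without having to write 23:59 as the upper bound.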