I am trying to compute some metrics every minute using Spark Streaming (reading from Kafka). I am able to aggregate the data for each individual minute. How do I also maintain a bucket for the current day and sum up the aggregate values of all the minutes in that day?
I have a DataFrame and I am doing something similar to this:
sampleDF = spark.sql("select userId,sum(likes) as total from likes_dataset group by userId order by userId")
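For context, this is roughly how the streaming read and the per-minute aggregation are set up on my side (a minimal sketch only; the topic name, bootstrap servers, message schema, and the eventTime column are placeholders, not my real values):

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, sum as sum_
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

spark = SparkSession.builder.appName("likes-aggregation").getOrCreate()

# Placeholder schema for the Kafka message value
schema = StructType([
    StructField("userId", StringType()),
    StructField("likes", IntegerType()),
    StructField("eventTime", TimestampType()),
])

# Read the likes events from Kafka (topic and servers are placeholders)
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "likes")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Per-minute aggregation: total likes per user within each 1-minute window
perMinute = (events
             .withWatermark("eventTime", "2 minutes")
             .groupBy(window(col("eventTime"), "1 minute"), col("userId"))
             .agg(sum_("likes").alias("total")))

query = (perMinute.writeStream
         .outputMode("update")
         .format("console")
         .start())

What I cannot figure out is how to roll these per-minute totals up into a single running total per user for the current day.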