1 vote

I know Spark Streaming produces batches of RDDs, but I'd like to accumulate one big DataFrame that is updated on each batch, by appending the new batch's DataFrame to the end.

Is there a way to access all historical Stream data like this?

I've seen mapWithState(), but I haven't seen an example of it accumulating DataFrames specifically.


1 Answer

1 vote

While a streaming DataFrame is implemented as batches of RDDs under the hood, it is presented to the application as a single, non-discrete, unbounded stream of rows. There are no "batches of DataFrames" the way there are batches of RDDs in a DStream.

It's not clear exactly which historical data you need access to.