1 vote

I know Spark Streaming produces batches of RDDs, but I'd like to accumulate one big DataFrame that is updated on each batch, by appending the new batch's DataFrame to the end.

Is there a way to access all historical Stream data like this?

I've seen mapWithState(), but I haven't seen an example of it accumulating DataFrames specifically.


1 Answer

1 vote

While a streaming DataFrame is implemented as batches of RDDs under the hood, it is presented to the application as a single, non-discrete, unbounded stream of rows. There are no "batches of DataFrames" the way there are batches of RDDs in a DStream.

It's not clear exactly which historical data you need access to.