We have discussed the questions below:

But Spark Structured Streaming, which became generally available in Spark 2.2, brings a lot of changes to streaming, and it looks outstanding.

Can we say that Spark Structured Streaming is true stream processing, or is it still batch processing?

What is now the big difference between Apache Flink and Apache Spark Structured Streaming?

1 Answer

Currently:

Spark Structured Streaming still uses micro-batches under the hood. However, it supports event-time processing and offers quite low latency (though not as low as Flink's). SQL and type-safe queries on streams are available in one API, with no distinction: every Dataset can be queried either with SQL or with type-safe operators. It has end-to-end exactly-once semantics (at least that's what they say ;) ). Its throughput is reportedly better than Flink's (benchmarks have shown differing results, but see the Databricks post on the results).
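To make "event-time processing" concrete, here is a minimal plain-Python sketch. It is not Spark or Flink code; the function name `tumbling_window_counts`, the sample events, and the 10-unit window size are all made up for illustration. It only shows the idea that records are assigned to windows by the timestamp they carry, not by the order in which they arrive.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per tumbling window, keyed by event time.

    `events` is an iterable of (event_time, payload) pairs given in
    arrival order. Hypothetical helper, not part of any Spark/Flink API.
    """
    counts = defaultdict(int)
    for event_time, _payload in events:
        # The window is derived from the record's own timestamp,
        # so late or out-of-order arrivals still land in the right window.
        window_start = (event_time // window_size) * window_size
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Events listed in arrival order; their event times are out of order.
arrivals = [(12, "a"), (3, "b"), (17, "c"), (8, "d"), (21, "e")]
result = tumbling_window_counts(arrivals, window_size=10)
print(result)
```

A real engine additionally has to decide how long to wait for late records before closing a window (watermarking), which this sketch leaves out.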

In near future:

Spark's Continuous Processing mode is in progress and will give Spark ~1 ms latency, comparable to Flink's. However, as I said, it's still in progress. The API is already suited to non-batch jobs, so this is easier to do than it was in the previous Spark Streaming.

The main difference:

Spark currently relies on micro-batching, while Flink has pre-scheduled operators. That means Flink's latency is lower, but the Spark community is working on Continuous Processing mode, which will work similarly (as far as I understand) to receivers.
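A rough way to see why micro-batching costs latency is the toy model below. It is a plain-Python sketch under simplifying assumptions, not a benchmark of either engine: the function names, the 1000 ms batch interval, and the 1 ms per-record cost are invented for illustration, and processing time inside a batch and queueing are ignored.

```python
def micro_batch_latencies(arrival_times_ms, batch_interval_ms):
    """Latency when a record is only processed at the end of its batch interval.

    Assumption: the batch itself is processed instantly once it closes.
    """
    latencies = []
    for t in arrival_times_ms:
        batch_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_end - t)
    return latencies


def per_record_latencies(arrival_times_ms, processing_cost_ms):
    """Latency when each record is handled as soon as it arrives.

    Assumption: long-running operators are always ready, no queueing.
    """
    return [processing_cost_ms for _ in arrival_times_ms]


arrivals_ms = [0, 250, 500, 999]
micro = micro_batch_latencies(arrivals_ms, batch_interval_ms=1000)
streaming = per_record_latencies(arrivals_ms, processing_cost_ms=1)

print("micro-batch:", micro)
print("per-record: ", streaming)
```

In this model a record waits on average about half a batch interval before anything happens to it, which is why shrinking the interval, or removing batches altogether as Continuous Processing mode aims to do, is the lever for lower latency.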