
DSMS stands for Data Stream Management System. These systems let users submit queries that are executed continuously until the user removes them.
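To make the continuous-query model concrete, here is a minimal in-memory sketch. It is a toy illustration, not how any particular DSMS is implemented; `ContinuousQueryEngine`, `register`, `remove`, and `push` are hypothetical names. A query is registered once and then evaluated against every arriving tuple until it is removed.

```python
from typing import Callable, Dict, List, Tuple

# A tuple is modeled as a dict with fixed field names (a simple schema).
Row = Dict[str, object]


class ContinuousQueryEngine:
    """Hypothetical toy engine: registered queries run on every incoming tuple."""

    def __init__(self) -> None:
        # query id -> (filter predicate, list collecting that query's results)
        self._queries: Dict[str, Tuple[Callable[[Row], bool], List[Row]]] = {}

    def register(self, query_id: str, predicate: Callable[[Row], bool]) -> List[Row]:
        results: List[Row] = []
        self._queries[query_id] = (predicate, results)
        return results

    def remove(self, query_id: str) -> None:
        # After removal the query no longer sees new tuples.
        del self._queries[query_id]

    def push(self, row: Row) -> None:
        # Each arriving tuple is evaluated against all currently registered queries.
        for predicate, results in self._queries.values():
            if predicate(row):
                results.append(row)


engine = ContinuousQueryEngine()
# Roughly: SELECT * FROM readings WHERE temp > 30, kept running until removed.
hot = engine.register("hot", lambda r: r["temp"] > 30)
engine.push({"sensor": "a", "temp": 25})
engine.push({"sensor": "b", "temp": 35})
engine.remove("hot")
engine.push({"sensor": "c", "temp": 40})  # arrives after removal, so not matched
print(hot)  # [{'sensor': 'b', 'temp': 35}]
```

The query keeps producing results for as long as it is registered, which is the key difference from a one-shot query against a stored table.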

Can systems such as Storm and Flink be seen as DSMS, or are they something more generic?



The two types of systems are largely orthogonal to each other, as they target different use cases. Thus, neither subsumes or generalizes the other.

DSMS are usually:

  • end-to-end solutions that provide storage and computation as a single, unified system
  • required to import external data into the system first
  • often SQL-oriented, which makes them easy to use but often less expressive
  • able to handle only structured data (schema-based tuple format)
  • often limited in how far they scale

Stream Processing Frameworks (Flink, Storm, Spark):

  • provide only a computation layer and consume data from other storage systems
  • mostly offer a language-embedded DSL (some also offer SQL to some extent)
  • can handle any type of data (flat tuples, JSON, XML, flat files, text)
  • built to scale to large clusters (many hundreds of nodes)
  • well suited for data crunching and machine learning
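The difference between a SQL interface and a language-embedded DSL can be sketched with a tiny fluent pipeline in plain Python. `Stream`, `map`, `filter`, and `count_by_key` are hypothetical names chosen for illustration, not the API of Flink, Storm, or Spark, and the input list stands in for data consumed from an external storage system. Because the operators take ordinary functions, arbitrary code (here, JSON parsing) can run inside the pipeline, which is one reason such frameworks cope with semi-structured data more easily than schema-bound SQL.

```python
import json
from collections import Counter
from typing import Any, Callable, Iterable, Iterator, Tuple


class Stream:
    """Hypothetical fluent wrapper; each operator lazily transforms an iterator."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._it: Iterator[Any] = iter(source)

    def map(self, fn: Callable[[Any], Any]) -> "Stream":
        return Stream(fn(x) for x in self._it)

    def filter(self, pred: Callable[[Any], bool]) -> "Stream":
        return Stream(x for x in self._it if pred(x))

    def count_by_key(self, key: Callable[[Any], Any]) -> Iterator[Tuple[Any, int]]:
        # Emits an updated (key, running count) pair for every element,
        # mimicking how a streaming aggregation produces incremental results.
        counts: Counter = Counter()
        for x in self._it:
            k = key(x)
            counts[k] += 1
            yield k, counts[k]


# Raw JSON text, standing in for records read from an external source.
raw = [
    '{"user": "alice", "action": "click"}',
    '{"user": "bob", "action": "view"}',
    '{"user": "alice", "action": "click"}',
]

updates = list(
    Stream(raw)
    .map(json.loads)  # parse semi-structured input inside the pipeline
    .filter(lambda e: e["action"] == "click")
    .count_by_key(lambda e: e["user"])
)
print(updates)  # [('alice', 1), ('alice', 2)]
```

In a real framework the same chain would be distributed across a cluster and the source would be unbounded; the sketch only shows the programming style.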

Streaming Platform (Kafka):

  • provides both a storage layer and computation
  • can handle any type of data as long as it is imported into the system (flat tuples, JSON, XML, flat files, text)
  • scalable and elastic
  • no SQL, only a Java DSL (Kafka Streams); the Confluent Platform, which is based on Kafka, offers KSQL as a developer preview
  • very well suited for building microservices