I'm new to Spark Streaming and have the following situation:

  • Multiple (health) devices send their data to my service; every event contains at least the following fields: (userId, timestamp, pulse, bloodPressure).
  • In the DB I have a per-user threshold for pulse and bloodPressure.

Use Case:

  • I would like to create a sliding window with Spark Streaming that calculates the average pulse and bloodPressure per user over, say, 10 minutes.
  • After 10 minutes I would like to check against the DB whether the values exceed the user's threshold and, if so, execute an action, e.g. call a REST service to send an alarm.

Could somebody tell me if this is generally possible with Spark, and if yes, point me in the right direction?

1 Answer

This is definitely possible, though Spark is not necessarily the best tool for it. It depends on the volume of input you expect. If you have hundreds of thousands of devices each sending one event every second, Spark could be justified. It's not up to me to validate your architectural choices, but keep in mind that resorting to Spark for this kind of use case makes sense only if the volume of data cannot be handled by a single machine.

Also, if the latency of the alert is important and a second or two makes a difference, Spark is not the best tool. A processor on a single machine can achieve lower latencies. If a single machine is not enough, use something more streaming-oriented, like Apache Flink.
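To give an idea of how small the single-machine version can be, here is a minimal sketch in plain Python (no Spark involved). The class name, the method names and the treatment of bloodPressure as a single number are assumptions made for illustration; it also assumes that events for one user arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # 10-minute window, as in the question


class SlidingAverages:
    """Hypothetical helper: keeps each user's recent events and averages them."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        # userId -> deque of (timestamp, pulse, bloodPressure)
        self.events = defaultdict(deque)

    def add(self, user_id, timestamp, pulse, blood_pressure):
        # Assumption: events for one user arrive in timestamp order.
        self.events[user_id].append((timestamp, pulse, blood_pressure))

    def averages(self, user_id, now):
        window = self.events[user_id]
        # Drop events that have fallen out of the window.
        while window and window[0][0] <= now - self.window_seconds:
            window.popleft()
        if not window:
            return None  # no events for this user inside the window
        n = len(window)
        avg_pulse = sum(e[1] for e in window) / n
        avg_bp = sum(e[2] for e in window) / n
        return avg_pulse, avg_bp
```

You would call add() for every incoming event and averages() whenever you want to evaluate a user, e.g. on a timer. Timestamps here are plain seconds; any monotonically increasing numeric clock would work the same way.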

As general advice, if you want to do it in Spark, you just need to create a source (I don't know where your data comes from), load the thresholds into a broadcast variable (assuming they are constant over time) and write the logic. To make the REST call, use foreachRDD as the output operation and implement the call logic there.
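The logic itself could look roughly like the sketch below. It is plain Python, not Spark API code: the thresholds dict stands in for the broadcast variable, send_alarm stands in for the REST call, and all function names are made up for illustration. bloodPressure is treated as a single number, which is an assumption.

```python
def average_per_user(events):
    """events: iterable of (userId, pulse, bloodPressure) tuples from one window."""
    totals = {}  # userId -> [pulse_sum, bp_sum, count]
    for user_id, pulse, bp in events:
        acc = totals.setdefault(user_id, [0.0, 0.0, 0])
        acc[0] += pulse
        acc[1] += bp
        acc[2] += 1
    return {u: (p / n, b / n) for u, (p, b, n) in totals.items()}


def check_window(events, thresholds, send_alarm):
    """Calls send_alarm(userId, avg_pulse, avg_bp) for each user over a threshold.

    thresholds: dict of userId -> (max_pulse, max_bp); a stand-in for the
    broadcast variable. send_alarm: hypothetical callback that would make
    the REST call. Returns the list of user ids that were alerted.
    """
    alerted = []
    for user_id, (avg_pulse, avg_bp) in average_per_user(events).items():
        limit = thresholds.get(user_id)
        if limit is None:
            continue  # no threshold stored for this user
        max_pulse, max_bp = limit
        if avg_pulse > max_pulse or avg_bp > max_bp:
            send_alarm(user_id, avg_pulse, avg_bp)
            alerted.append(user_id)
    return alerted
```

Accumulating (sum, count) per user and dividing at the end is the same shape you would use for a keyed reduction in Spark, so the averaging step should translate fairly directly; the threshold check and the alarm call would then run in the output step once per window.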