
I'm new to Spark Streaming. I'm working on a project that uses Spark Streaming; the input is a key-value pair string like "productid,price".

The requirement is to process each line as a separate transaction and to trigger an RDD every second (a 1-second batch interval).

In each interval I have to calculate the total price for each individual product, like:

select productid, sum(price) from T group by productid

My current thought is that I have to do the following steps:

1) Split the whole input on "\n":

val lineMap = lines.map{x => x.split("\n")}

2) Split each line on ",":

val recordMap = lineMap.map{x => x.map{y => y.split(",")}}

Now I'm confused about how to make the first column the key and the second column the value, and then use the reduceByKey function to get the total sum.

Please advise.

Thanks


1 Answer


Once you have split each row, you can do something like this:

rowItems.map { case Array(product, price) => product -> price.toDouble }

Note that split returns an Array, so the pattern must be Array(...), not Seq(...); converting the price with toDouble lets you sum it later. This way you obtain a DStream[(String, Double)] on which you can apply pair transformations like reduceByKey (on older Spark versions, don't forget to import the required implicits with import org.apache.spark.streaming.StreamingContext._; since Spark 1.3 they are in scope automatically).
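
Putting it all together, here is a minimal end-to-end sketch. It assumes the input arrives over a socket (socketTextStream on localhost:9999 is just a placeholder for your actual source) and that every record is a well-formed "productid,price" pair:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ProductTotals {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ProductTotals").setMaster("local[2]")
    // Batch interval of 1 second, as required
    val ssc = new StreamingContext(conf, Seconds(1))

    // Placeholder source: each record received is already one line
    val lines = ssc.socketTextStream("localhost", 9999)

    val totals = lines
      .map(_.split(","))  // "productid,price" -> Array(productid, price)
      .map { case Array(product, price) => product -> price.toDouble }
      .reduceByKey(_ + _) // sum of price per product within each batch

    totals.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Because each record from the source is already a single line, the extra split on "\n" from the question is unnecessary. Also, a malformed record would throw a MatchError in the pattern match, so in practice you may want to filter first and keep only lines whose split yields exactly two fields.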