0 votes

I would like to know how I can, in the same program, generate random data using Apache Kafka and receive it using Spark Streaming.

Here is a use case:

I want to generate random data like this -> (A, B, [email protected]) for X seconds. Then I want to receive that data and process it in real time (while I'm receiving it), and if the second parameter is B, send an email to '[email protected]' with the following message: "The first parameter is A".

I know that I have to start a ZooKeeper server, then start a Kafka broker, then create a topic, and then a producer to produce and send this data. To create the connection between Kafka and Spark Streaming I need to use the "createStream" function. But I don't know how to use a producer to send this data and then receive it with Spark Streaming for processing. All of this in the same program and using Java.
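For the connection part, this is roughly what I have in mind (just a sketch, I don't know if it's right; the ZooKeeper address "localhost:2181", the group id, and the topic name "test" are placeholders I made up):

```java
import java.util.Collections;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaReceiverSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaReceiver").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // Connect to Kafka via ZooKeeper; "test" -> 1 means one receiver thread for that topic
        JavaPairReceiverInputDStream<String, String> messages =
                KafkaUtils.createStream(jssc, "localhost:2181", "my-group",
                        Collections.singletonMap("test", 1));

        messages.print(); // for now, just print whatever arrives

        jssc.start();
        jssc.awaitTermination();
    }
}
```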

Any help? Thank you.

1
Google for "kafka producer java example". Then let us know if you have some specific problems. - maasg
I will write the same thing I said to Matthias J. Sax. I now have a producer program that generates that data (link), and inside it I added the message (A, B, [email protected]). I have the Spark program here (link), and in it I want to read the data and process it, sending the email if the second parameter is a B. I'm not very familiar with this, but I'm trying. Now, to test this, I have to start Kafka (including ZK), and I need one more file (a main class) that starts the producer program to write into Kafka, right? For Spark I only have to submit the program, right? Thank you! - Mohamed Said Benmousa

1 Answer

1 vote

There will not be a single program, but a Kafka producer program and a Spark program. For both, there are a couple of examples available online.
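For the producer side, a minimal sketch (the broker address localhost:9092, the topic name "test", and the 60-second loop are assumptions, not from the question):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RandomDataProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        try {
            // Emit one (A, B, email) record per second for 60 seconds ("X seconds").
            for (int i = 0; i < 60; i++) {
                producer.send(new ProducerRecord<>("test", "A,B,[email protected]"));
                Thread.sleep(1000);
            }
        } finally {
            producer.close();
        }
    }
}
```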

To run this, you start Kafka (including ZK) and your Spark cluster. Afterwards, you start your producer program that writes into Kafka and your Spark job that reads from Kafka (I guess the order in which you start the producer and the Spark job should not matter).
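For the Spark side, a sketch built around the "createStream" function the question mentions (the ZooKeeper address, group id, and topic name are placeholders; actually sending the email, e.g. with JavaMail, is only stubbed out with a print):

```java
import java.util.Collections;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaStreamProcessor {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaStreamProcessor").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, "localhost:2181", "spark-group",
                        Collections.singletonMap("test", 1));

        stream.foreachRDD(rdd -> rdd.foreach(record -> {
            // record._2() is the message value, e.g. "A,B,[email protected]"
            String[] parts = record._2().split(",");
            if (parts.length == 3 && "B".equals(parts[1].trim())) {
                // Stub: plug in your mail-sending code (e.g. JavaMail) here.
                System.out.println("Send to " + parts[2].trim()
                        + ": The first parameter is " + parts[0].trim());
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```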