I run a social platform where users can create new posts. Right now, each new post is simply written to the database. I want to implement a streaming pipeline to process these new posts using Spark Streaming.
Query 1: How do I get these new posts from the database into the Spark Streaming architecture? Should I use Kafka as the middleman (it seems like that would scale better in the future), or should I just stream the data from the database to Spark Streaming over a socket (and if so, please tell me how)? I have sketched below what I think the Spark side would look like for the Kafka option.
The databases in use are Firebase and MongoDB (it would be great if the procedure were explained for both).
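For context, here is roughly what I imagine the Spark side would look like if I went the Kafka route, using Structured Streaming's Kafka source. The broker address, the `new-posts` topic name, and the post schema are all placeholders/assumptions rather than my actual setup; my real question is about the part that gets the posts from Firebase/MongoDB into Kafka (or a socket) in the first place.

```python
# Rough sketch only: assumes posts have already been published to a Kafka
# topic as JSON. Topic name, broker address, and schema are placeholders.
# Running this also requires the spark-sql-kafka connector package on the
# Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("new-posts-stream")
         .getOrCreate())

# Assumed shape of a post document; my real posts have more fields.
post_schema = StructType([
    StructField("postId", StringType()),
    StructField("userId", StringType()),
    StructField("text", StringType()),
    StructField("createdAt", TimestampType()),
])

# Subscribe to the stream of new posts from a Kafka topic (placeholder names).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "new-posts")
       .load())

# Kafka delivers the payload as bytes in the `value` column; parse the JSON.
posts = (raw.selectExpr("CAST(value AS STRING) AS json")
         .select(from_json(col("json"), post_schema).alias("post"))
         .select("post.*"))

# Placeholder sink just to inspect the parsed posts; the real processing
# logic would replace this.
query = (posts.writeStream
         .format("console")
         .outputMode("append")
         .start())

query.awaitTermination()
```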
Query 2: I have started learning about Kafka, and I have read that it can also process streams of posts. So why not use Kafka itself to process the stream instead of Spark Streaming? Why do most people use Kafka only as a message broker and not for stream processing?
Thanks in advance.