I am creating a application in which getting streaming data which goes into kafka and then on spark. consume the data, apply some login and then save processed data into the hive. velocity of data is very fast. I am getting 50K records in 1min. There is window of 1 min in spark streaming in which it process the data and save the data in the hive.
my question is for production prospective architecture is fine? If yes how can I save the streaming data into hive. What I am doing is, creating dataframe of 1 min window data and will save it in hive by using
results.write.mode(org.apache.spark.sql.SaveMode.Append).insertInto("stocks")
I have not created the pipeline. Is it fine or I have to modified the architecture?
Thanks