Using Apache Spark's MLlib, I have a logistic regression model that I store in HDFS. The model is trained on historical data coming in from some sensors.
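For context, the model was produced along these lines (the paths, feature parsing, and algorithm parameters here are placeholders, not my exact code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val sc = new SparkContext(new SparkConf().setAppName("TrainSensorModel"))

// Historical sensor readings: "label,feature1,feature2,..." per line.
val training = sc.textFile("hdfs:///sensor-data/historical").map { line =>
  val parts = line.split(',').map(_.toDouble)
  LabeledPoint(parts.head, Vectors.dense(parts.tail))
}

val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)
model.save(sc, "hdfs:///models/sensor-lr")  // stored in HDFS for later reuse
```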
I have another Spark program that consumes streaming data from these sensors, and I want to use the pre-existing trained model to make predictions on the incoming stream. Note: I don't want the model to be updated by this data.
To load the trained model, I'd use the following line in my code:
```scala
val logisticModel = LogisticRegressionModel.load(sc, "<location of model>")
```

where `sc` is the SparkContext.
However, this application is a streaming application and therefore already has a StreamingContext set up. From what I've read, it is bad practice to have two contexts in the same program (even though it may be possible).
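To make the situation concrete, here is a stripped-down sketch of the streaming side (the source, host/port, and paths are placeholders; the commented-out lines mark where the model load would have to happen):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("SensorStreamPredictions")
val ssc = new StreamingContext(conf, Seconds(10))

// This is the part in question: LogisticRegressionModel.load needs a
// SparkContext, but this application already owns a StreamingContext.
// val sc = new SparkContext(conf)  // a second context -- reportedly bad practice
// val logisticModel = LogisticRegressionModel.load(sc, "<location of model>")

// Placeholder source: one "feature1,feature2,..." reading per line.
val readings = ssc.socketTextStream("sensor-host", 9999)
val features = readings.map(line => Vectors.dense(line.split(',').map(_.toDouble)))
// Intended use, once the model is available:
// features.foreachRDD(rdd => rdd.map(logisticModel.predict).foreach(println))

ssc.start()
ssc.awaitTermination()
```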
Does this mean that my approach is wrong and that I can't do what I'm trying to do?
Also, would it make more sense to keep writing the stream data to files and periodically run logistic regression over those, rather than predicting directly in the streaming application?
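In other words, something like the following (again just a sketch; the paths and the split into a separate batch job are assumptions about how I'd wire it up, reusing `readings` from the snippet above):

```scala
// In the streaming application: persist each batch of readings to HDFS.
readings.saveAsTextFiles("hdfs:///sensor-data/stream/batch")

// In a separate, periodically run batch application with its own SparkContext:
// val sc = new SparkContext(new SparkConf().setAppName("SensorBatchPredict"))
// val logisticModel = LogisticRegressionModel.load(sc, "<location of model>")
// val predictions = sc.textFile("hdfs:///sensor-data/stream/batch-*")
//   .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
//   .map(logisticModel.predict)
```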