Is it possible to train an XGboost model in python and use the saved model to predict in spark environment ? That is, I want to be able to train the XGboost model using sklearn, save the model. Load the saved model in spark and predict in spark. Is this possible ?
edit: Thanks all for the answer , but my question is really this. I see the below issues when I train and predict different bindings of XGBoost.
During training I would be using XGBoost in python, and when predicting I would be using XGBoost in mllib.
I have to load the saved model from XGBoost python (Eg: XGBoost.model file) to be predicted in spark, would this model be compatible to be used with the predict function in the mllib
The data input formats of both XGBoost in python and XGBoost in spark mllib are different. Spark takes vector assembled format but with python, we can feed the dataframe as such. So, how do I feed the data when I am trying to predict in spark with a model trained in python. Can I feed the data without vector assembler ? Would XGboost predict function in spark mllib take non-vector assembled data as input ?
spark
as an orchestration system to both train and predictsklearn
models viaspark-sklearn
module. it will push iterations of each model to differentspark
executors. – thePurplePython