I am writing an application that uses linear regression, in my case sklearn.linear_model.Ridge. I have trouble bringing the datapoint I would like to predict into the correct shape for Ridge. I will briefly describe my two applications and how the problem turns up:
1ST APPLICATION:
My datapoints have just 1 feature each, and its values are all strings, so I am using one-hot encoding to be able to use them with Ridge. After that, the datapoints (X_hotEncoded) have 9 features each:
import pandas as pd
X_hotEncoded = pd.get_dummies(X)
After fitting Ridge to X_hotEncoded and the labels y, I save the trained model with:
import joblib  # older scikit-learn versions: from sklearn.externals import joblib
joblib.dump(ridge, "ridge.pkl")
2ND APPLICATION:
Now that I have a trained model saved on disk, I would like to load it in my 2nd application and predict y (the label) for just one datapoint. That is where I encounter the problem mentioned above:
# X = one datapoint I would like to predict y for
ridge = joblib.load("ridge.pkl")
X_hotEncoded = pd.get_dummies(X)
ridge.predict(X_hotEncoded) # this should give me the prediction
This gives me the following error in the last line of code:
ValueError: shapes (1,1) and (9,) not aligned: 1 (dim 1) != 9 (dim 0)
Ridge was trained with 9 features because I applied one-hot encoding to all the training datapoints. Now, when I want to predict just one datapoint (with just 1 feature), I have trouble bringing this datapoint into the shape Ridge expects. One-hot encoding a single datapoint with a single feature yields only 1 column, not the 9 columns seen during training.
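To illustrate the mismatch, here is a minimal sketch with made-up data. The column name `color` and its values are hypothetical, and I am assuming X is a DataFrame with one string column; my real data differs, but the shapes behave the same way:

```python
import pandas as pd

# Hypothetical training data: one string feature with 9 distinct values.
X_train = pd.DataFrame({"color": list("abcdefghi")})
X_train_encoded = pd.get_dummies(X_train)

# A single datapoint only ever contains one category,
# so get_dummies can only create one dummy column for it.
X_new = pd.DataFrame({"color": ["c"]})
X_new_encoded = pd.get_dummies(X_new)

train_shape = X_train_encoded.shape  # 9 rows, 9 dummy columns
new_shape = X_new_encoded.shape      # 1 row, 1 dummy column
print(train_shape, new_shape)
```

The single datapoint ends up with one column instead of nine, which matches the dimensions in the error message.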
Does anybody know a neat solution to this problem?
A possible solution might be to write the column names to disk in the 1st application, retrieve them in the 2nd, and then rebuild the datapoint there. The column names of one-hot-encoded arrays could be retrieved as described here: Reversing 'one-hot' encoding in Pandas
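A rough sketch of that idea, under the assumption that the encoded training data is a pandas DataFrame. The data and the `color` column are made up, and the JSON string stands in for a file on disk. It uses `DataFrame.reindex` to add the missing dummy columns, filled with 0:

```python
import json
import pandas as pd

# --- 1st application: encode the training data and remember the columns ---
X_train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})  # made-up data
X_train_encoded = pd.get_dummies(X_train)
# This string could be written to a file next to ridge.pkl.
columns_json = json.dumps(list(X_train_encoded.columns))

# --- 2nd application: rebuild the same columns for one datapoint ---
train_columns = json.loads(columns_json)
X_new = pd.DataFrame({"color": ["green"]})
X_new_encoded = (
    pd.get_dummies(X_new)
    .reindex(columns=train_columns, fill_value=0)  # add missing dummy columns as 0
    .astype(int)                                   # uniform numeric dtype
)
print(X_new_encoded)
```

With this, `X_new_encoded` has the same columns in the same order as the training data, so `ridge.predict(X_new_encoded)` should receive the shape it was fitted on. One caveat of this sketch: a category that never appeared during training is silently dropped by `reindex`, leaving a row of all zeros.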