I have a dataframe with 3 features and 3 classes that I split into X_train, Y_train, X_test, and Y_test. I then run scikit-learn's Pipeline with PCA, StandardScaler, and finally LogisticRegression. I want to calculate the class probabilities directly from the LR weights and the raw data, without using predict_proba, but I don't know how, because I'm not sure exactly how the pipeline passes X_test through PCA and StandardScaler into the logistic regression. Is this realistic without being able to use PCA's and StandardScaler's fit methods? Any help would be greatly appreciated!
So far, I have:
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pca = PCA(whiten=True)
scaler = StandardScaler()
logistic = LogisticRegression(fit_intercept=True, class_weight='balanced', solver='sag', n_jobs=-1, C=1.0, max_iter=200)
pipe = Pipeline(steps=[('pca', pca), ('scaler', scaler), ('logistic', logistic)])
pipe.fit(X_train, Y_train)
predict_probs = pipe.predict_proba(X_test)
coefficients = pipe.steps[2][1].coef_  # 3 by 30
intercepts = pipe.steps[2][1].intercept_  # 1 by 3
[...] pipe.predict_proba(X_test)? – Vivek Kumar
[...] pipe, if you send X_test, pca and scaler will be fit again, then don't worry. Only transform will be called on them, and predict_proba on logistic. – Vivek Kumar
[...] fit on training data and only call predict or transform on test data. When you call predict_proba on a pipeline, all estimators excluding the last one will only call transform and then pass the data further. The last one will call predict_proba. – Vivek Kumar
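To make the comments above concrete, here is a minimal NumPy sketch of what the chain of transform calls followed by predict_proba amounts to, using only quantities learned during fit. The function name manual_predict_proba and its argument names are hypothetical. It assumes the logistic regression was fit as a multinomial (softmax) model, which recent scikit-learn versions do by default for the sag solver. With a one-vs-rest fit, the last step would instead be a per-class sigmoid followed by row normalization.

```python
import numpy as np


def manual_predict_proba(X, pca_mean, pca_components, pca_explained_variance,
                         scaler_mean, scaler_scale, coef, intercept):
    """Hypothetical helper: replay PCA(whiten=True) -> StandardScaler -> softmax LR."""
    X = np.asarray(X, dtype=float)

    # PCA.transform: centre with the training mean, project onto the
    # components, then divide by sqrt(explained variance) because whiten=True.
    Z = (X - pca_mean) @ pca_components.T
    Z = Z / np.sqrt(pca_explained_variance)

    # StandardScaler.transform: reuse the mean and scale learned in fit.
    Z = (Z - scaler_mean) / scaler_scale

    # Linear scores, one column per class: coef is (n_classes, n_components).
    scores = Z @ coef.T + intercept

    # Softmax over classes (assumes a multinomial fit); subtracting the row
    # maximum first keeps the exponentials numerically stable.
    scores = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)
```

With the fitted pipe from the question, the arguments would come from pipe.named_steps['pca'] (mean_, components_, explained_variance_), pipe.named_steps['scaler'] (mean_, scale_), and pipe.named_steps['logistic'] (coef_, intercept_). Comparing the result against pipe.predict_proba(X_test) with np.allclose is a quick way to check whether the softmax assumption holds for your scikit-learn version.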