I am using StandardScaler to normalize my dataset; that is, I turn each feature into a z-score by subtracting the mean and dividing by the standard deviation.
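
To make concrete what I mean by a z-score, here is a minimal sketch on a tiny made-up array (the data and the variable names `X`, `manual`, and `scaled` are placeholders, not my real dataset). As far as I can tell, StandardScaler divides by the population standard deviation (`ddof=0`), which is also NumPy's default for `std`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: 3 samples, 2 features (illustrative only)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Manual z-score per feature (column): subtract the mean, divide by the std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation via StandardScaler
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))
```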
I would like to use StandardScaler within sklearn's Pipeline, and I am wondering how exactly the transformation is applied to X_test. That is, in the code below, when I run pipeline.predict(X_test), it is my understanding that StandardScaler and SVC() are run on X_test, but what exactly does StandardScaler use as the mean and the standard deviation? The ones from X_train, or does it compute them from X_test alone? What if, for instance, X_test consists of only 2 observations? The normalization would then look a lot different than if I had normalized X_train and X_test together, right?
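from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale the features, then fit a support vector classifier
steps = [('scaler', StandardScaler()),
         ('model', SVC())]
pipeline = Pipeline(steps)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)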
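
One way to check this empirically is the minimal sketch below. It assumes synthetic data (the random arrays and the labeling rule are made up for illustration) and relies on the fitted scaler exposing its statistics as `mean_` and `scale_`, with the step reachable through `pipeline.named_steps`. If the pipeline reuses the training statistics at predict time, then scaling X_test by hand with those stored values and calling the fitted SVC directly should reproduce `pipeline.predict(X_test)`:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, purely illustrative: 100 training rows, only 2 test rows
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y_train = (X_train[:, 0] > 5.0).astype(int)
X_test = rng.normal(loc=5.0, scale=2.0, size=(2, 3))

pipeline = Pipeline([("scaler", StandardScaler()), ("model", SVC())])
pipeline.fit(X_train, y_train)

# Statistics stored on the scaler during fit
scaler = pipeline.named_steps["scaler"]
uses_train_mean = np.allclose(scaler.mean_, X_train.mean(axis=0))
uses_train_std = np.allclose(scaler.scale_, X_train.std(axis=0))

# Reproduce predict() by hand: scale X_test with the stored training
# statistics, then call the fitted SVC on the result
X_test_scaled = (X_test - scaler.mean_) / scaler.scale_
manual_pred = pipeline.named_steps["model"].predict(X_test_scaled)
pipeline_pred = pipeline.predict(X_test)
same_predictions = np.array_equal(manual_pred, pipeline_pred)

print(uses_train_mean, uses_train_std, same_predictions)
```

If all three flags come out true, the scaler inside the pipeline is not refit on X_test: `fit` learns the mean and standard deviation from X_train, and `predict` only applies them, so a 2-row X_test is scaled with the training statistics.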