Can I use scikit-learn pipeline to transform a specific variable only?

Question

Reading the scikit-learn docs on Pipeline, I see that all the examples apply the transformers to the entire dataset (e.g. StandardScaler, PCA).

Is it possible to, say, scale only one specific variable in the dataset? If so, I could put my entire feature engineering process into a Pipeline and apply it to both my train and test sets.

See this example : scikit-learn.org/stable/auto_examples/hetero_feature_union.html — Vivek Kumar

Mark Whitfield · Accepted Answer · 2017-10-13T00:50:29

You can use a combination of FeatureUnion and custom transformers that take only the variable you're interested in.
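As a rough illustration of that idea, here is a minimal sketch. The `ColumnSelector` class is a hypothetical helper written for this example (it is not part of scikit-learn), and the `age` and `income` columns are made-up data. One branch of the `FeatureUnion` selects `age` and scales it; the other passes `income` through untouched.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: return a fixed list of DataFrame columns as an array."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; the column list is fixed at construction time.
        return self

    def transform(self, X):
        return X[self.columns].to_numpy()


features = FeatureUnion([
    # Branch 1: select only "age", then standardize it.
    ("scaled_age", Pipeline([
        ("select", ColumnSelector(["age"])),
        ("scale", StandardScaler()),
    ])),
    # Branch 2: pass "income" through unchanged.
    ("raw_income", ColumnSelector(["income"])),
])

# Made-up example data (an assumption for illustration only).
train = pd.DataFrame({"age": [20.0, 30.0, 40.0], "income": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"age": [30.0, 50.0], "income": [4.0, 5.0]})

train_out = features.fit_transform(train)  # scaler statistics come from train
test_out = features.transform(test)        # the same statistics are reused on test
```

Because the scaler is fitted inside the union, calling `transform` on the test set reuses the mean and variance learned from the training set, which is what you want when the whole feature engineering step lives in a Pipeline.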

However, you're right that sklearn does not handle heterogeneous feature sets particularly well. The sklearn-pandas library makes this a lot easier by letting you define separate pipelines for specific columns of a pandas DataFrame.
