python - Getting model attributes from pipeline

Question

I typically get PCA loadings like this:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_t = pca.fit(X).transform(X)
loadings = pca.components_

If I run PCA using a scikit-learn pipeline:

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

is it possible to get the loadings?

Simply trying loadings = pipeline.components_ fails:

AttributeError: 'Pipeline' object has no attribute 'components_'

(Also interested in extracting attributes like coef_ from pipelines.)

Andreas Mueller · Accepted Answer · 2015-03-03T17:07:08

Have you looked at the documentation (http://scikit-learn.org/dev/modules/pipeline.html)? I feel it is pretty clear.

Update: since scikit-learn 0.21 you can index the pipeline directly with square brackets, using the step name:

pipeline['pca']

or the step's integer index:

pipeline[1]
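A self-contained sketch of the square-bracket access, assuming scikit-learn 0.21 or later. The random X below is made-up illustration data, not from the question:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 20 samples, 4 features.
X = np.random.RandomState(0).normal(size=(20, 4))

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

# Index by step name or by position; both return the same fitted PCA object.
pca_by_name = pipeline['pca']
pca_by_index = pipeline[1]
assert pca_by_name is pca_by_index

# components_ has shape (n_components, n_features).
loadings = pca_by_name.components_
print(loadings.shape)  # (2, 4)
```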

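There are also two ways that work in older versions, using either the string names you gave the steps or their indices (steps is a list of (name, estimator) tuples, so [1][1] is the estimator of the second step):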

pipeline.named_steps['pca']
pipeline.steps[1][1]

Either line gives you the fitted PCA object, from which you can read components_. With named_steps you can also use attribute access with a dot, which allows tab autocompletion:

pipeline.named_steps.pca.<tab here gives autocomplete>
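A runnable sketch of the named_steps and steps lookups, including attribute access, that also covers the coef_ case from the question. It assumes a recent scikit-learn; the LogisticRegression step named 'clf' and the make_classification data are hypothetical additions for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: binary classification, 5 features.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('clf', LogisticRegression()),  # hypothetical final step, added to show coef_
]).fit(X, y)

# named_steps: look a step up by the name you gave it.
pca = pipeline.named_steps['pca']
# steps: a list of (name, estimator) tuples, so [1][1] is the second estimator.
assert pipeline.steps[1][1] is pca
# Attribute access on named_steps (tab-completable in IPython/Jupyter).
assert pipeline.named_steps.pca is pca

loadings = pca.components_             # shape (2, 5): 2 components x 5 input features
coef = pipeline.named_steps.clf.coef_  # shape (1, 2): the classifier sees 2 PCA components
print(loadings.shape, coef.shape)  # (2, 5) (1, 2)
```

Note that coef_ here is expressed in the PCA space, because the classifier is fitted on the transformed data, not on the original features.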


Using Neuraxle

Nested pipelines: