
I want to perform bagging with scikit-learn in Python and combine it with RFE, the recursive feature elimination algorithm. The steps are as follows.

  1. Draw 30 subsets of the data, sampling with replacement (bagging)
  2. Perform RFE on each subset
  3. Get the output of each classification
  4. Find the top 5 features from each output

I tried the BaggingClassifier approach shown below, but it takes a long time and does not seem to work. RFE on its own, via rfe.fit(), works without problems.

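from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

cf1 = LinearSVC()
rfe = RFE(estimator=cf1)
bagging = BaggingClassifier(rfe, n_estimators=30)
bagging.fit(trainx, trainy)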

Step 4 may also be difficult, because BaggingClassifier does not offer an attribute like RFE's ranking_ for finding the top features. Is there a better way to achieve these 4 steps?


1 Answer


Without bagging, one would access the ranking given by RFE with the following line:

rfe.ranking_

This ranking can be used to sort the feature names and then take the first five. See the scikit-learn documentation for RFE for an example of this attribute. With bagging, you need access to each of your 30 estimators. According to the scikit-learn documentation for BaggingClassifier, they are available through:

bagging.estimators_

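So: for each estimator in bagging.estimators_, get its ranking_, sort the features by that ranking, and take the first five elements. Note that RFE assigns rank 1 to every feature it selects, so set n_features_to_select=5 if you want the first five to be unambiguous. Hope this helps.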
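
For reference, here is a rough sketch of the four steps from the question. It is not the BaggingClassifier route described above: it swaps in a manual bootstrap loop using sklearn.utils.resample, so that every fitted RFE and its column order stay in plain view. The synthetic data from make_classification, the feature_names array, the dual=False setting and the final vote count are assumptions made for illustration only; substitute your own trainx and trainy.

```python
from collections import Counter

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC
from sklearn.utils import resample

# Synthetic stand-in for trainx / trainy (assumption; use the real data instead).
trainx, trainy = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
# Hypothetical feature names, one per column.
feature_names = np.array([f"f{i}" for i in range(trainx.shape[1])])

# With n_features_to_select=5, exactly five features end up with rank 1.
rfe = RFE(estimator=LinearSVC(dual=False), n_features_to_select=5)

top5_per_subset = []
for seed in range(30):
    # Step 1: draw a bootstrap sample (rows sampled with replacement).
    x_boot, y_boot = resample(trainx, trainy, replace=True, random_state=seed)
    # Steps 2-3: fit an independent copy of RFE on this sample.
    fitted = clone(rfe).fit(x_boot, y_boot)
    # Step 4: keep the names of the features that received rank 1.
    top5_per_subset.append(feature_names[fitted.ranking_ == 1].tolist())

# Optional: count how often each feature made a top five across the 30 subsets.
counts = Counter(name for top5 in top5_per_subset for name in top5)
print(counts.most_common(5))
```

If the run time is a concern, RFE's step parameter removes several features per iteration instead of one, at the cost of a coarser ranking.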