
I'm reconstructing a paper. The authors trained Gradient Boosting Regression Trees on the input X and soft targets y_s to obtain the final output y with minimum mean squared error. According to the paper, they implemented all decision-tree-based methods using the scikit-learn package without any modification. This is what I want to reproduce.

If you already know a solution I would be happy to hear it; otherwise, here are my thoughts. For simplicity, assume a binary problem with X = [[x1 x2 x3] [x1 x2 x3] ...] and y_s = [[0.4 0.6] [0.8 0.2] ...].

Regarding the GradientBoostingClassifier (see the links below), I can only feed in a 1-dim class array:

(y : array-like, shape = [n_samples]) Target values (integers in classification, real numbers in regression) For classification, labels must correspond to classes.

So even if I overrode the cost function (e.g. with cross-entropy, which can handle soft labels), I still could not feed in the 2-dim soft labels.

Another idea was to reduce the problem to 1-dim by taking only one soft label (this only works for a binary problem where both soft labels add up to 1) and using GradientBoostingRegressor instead. But again only one class is possible, and I also cannot train independent models like

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Fits one independent GradientBoostingRegressor per output column
X = [[1, 2, 3], [1, 2, 3], [4, 5, 6]]
y = [[3.141, 2.718], [3.141, 2.718], [2.718, 3.141]]
rgr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
rgr.fit(X, y)
X_test = [[1.5, 2.5, 3.5], [3.5, 4.5, 5.5]]
rgr.predict(X_test)

because the outputs are correlated.
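For the binary case, the reduction described above can be sketched as follows. The toy data is mine, and recovering the second soft label as 1 - p is my assumption (it only holds when the two soft labels sum to 1):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy binary data: each row of soft labels sums to 1 (assumed setup)
X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6]])
y_s = np.array([[0.4, 0.6], [0.8, 0.2], [0.3, 0.7], [0.9, 0.1]])

# Keep only the soft label of class 1; class 0 is implied as 1 - p
y = y_s[:, 1]

rgr = GradientBoostingRegressor(random_state=0)
rgr.fit(X, y)

p1 = rgr.predict([[1.5, 2.5, 3.5]])
p0 = 1.0 - p1  # recover the other soft label
```

Note that a plain regressor does not constrain its predictions to [0, 1], so clipping may be needed in practice.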

Big picture:

1. Extraction of combined features
2. a) Training: extracted features (Xb), original labels (y) -> logistic regression
   b) Prediction: soft labels (yb)
3. a) Training: original features (X), soft labels (yb) -> GradientBoostingTree
   b) Evaluation: predicting normal labels (y_) -> importance of the original features
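The big picture above can be sketched end-to-end for the binary case. Everything here is a stand-in: the random data replaces the real feature extraction of step 1, and using only the class-1 column of predict_proba is my assumed way to get a 1-dim soft label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 3)   # original features (placeholder data)
Xb = rng.rand(100, 5)  # combined features, stand-in for step 1
y = (X[:, 0] + Xb[:, 0] > 1).astype(int)  # original hard labels

# Step 2: logistic regression on the combined features yields soft labels
lr = LogisticRegression().fit(Xb, y)
yb = lr.predict_proba(Xb)[:, 1]  # 1-dim soft label for class 1

# Step 3: regress the original features onto the soft labels
gbr = GradientBoostingRegressor(random_state=0).fit(X, yb)
y_ = (gbr.predict(X) >= 0.5).astype(int)  # hard labels for evaluation

print(gbr.feature_importances_)  # importance of the original features
```

This sidesteps the 2-dim target problem entirely, at the cost of only working for the binary case.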

The entire procedure is worthless without the soft labels. It has to be possible somehow, but I cannot figure out how...

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html


1 Answer


scikit-learn's docs on multi-output decision trees should point you in the right direction
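To illustrate the pointer above: unlike gradient boosting, a single decision tree in scikit-learn accepts a 2-dim target natively, so the soft-label columns are fit jointly rather than as independent models. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy binary soft labels (assumed data, two columns summing to 1 per row)
X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6]])
y = np.array([[0.4, 0.6], [0.4, 0.6], [0.8, 0.2]])

# DecisionTreeRegressor supports multi-output targets out of the box;
# one tree predicts both soft-label columns jointly
tree = DecisionTreeRegressor(random_state=0).fit(X, y)
pred = tree.predict([[1, 2, 3]])  # one row, two columns
```

This does not give you boosting, but RandomForestRegressor and ExtraTreesRegressor support multi-output targets the same way, which may be an acceptable substitute.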