1
votes

I have a question about the approach to deal with a multilabel classification problem.

Based on literature review, I found one most commonly-used approach is Problem Transformation Approach. It transformed the multilabel problem to a number of single label problems, and the classification result is just the simple union of each single label classifier, using the binary relevant approach.

Since a single label problem can be catergorized as either binary classification (if there are two labels) or multiclass classification problem (if there are multiple labels i.e., labels>2), the current transformation approach seems all transform the multilabel problem to a number of binary problems. But this would be cause the data imbalance issue, because of the negative class may have much more documents than the positive class.

So my question, why not transform to a number of multiclass problems, and then apply the direct multiclass classification algorithms to avoid the data imbalance problem. In this case, for one testing document, each trained single label multiclass classifier would predict whether to assign the label, and the union of all such single label multiclass classifier prediction results would be the final set of labels for that testing documents.

In summary, compared to transform a multilabel classification problem to a number of binary classification problems, transform a multilabel classification problem to a number of multiclass classification problems could avoid the data imbalance problem. Other than this, everything stays the same for the above two methods: you need to construct |L|(|L| means the total number of different labels in the classification problem) single label (either binary or multiclass) classifier, you need to prepare |L| sets of training data and testing data, you need to test each single label classifier on the testing document and the union of prediction results of each single label classifier is the final label set for the testing document.

Hope anyone could help clarify my confusion, thanks very much!

1

1 Answers

3
votes

what you describe is a known transformation strategy to multi-class problems called Label Power Set Transformation Strategy.

Drawbacks of this method:

  • The LP transformation may lead to up to 2^|L| transformed labels.
  • Class imbalance problem.

Refer to: Cherman, Everton Alvares, Maria Carolina Monard, and Jean Metz. "Multi-label problem transformation methods: a case study." CLEI Electronic Journal 14.1 (2011): 4-4.