
I am looking to implement multilabel classification using ML.NET. I have read a few posts saying it is not supported directly, only through problem transformation: converting it into multiple binary classification problems. So essentially I would need to create n classifiers if my dataset has n tags. I tried to do this by splitting my dataset label-wise, but the Fit method throws the exception below. For each label's subset, I set the label column to 1 for every entry.

System.ArgumentOutOfRangeException: 'Must be at least 2. Parameter name: numClasses'

This can be fixed by using the whole dataset for each label, with entries carrying that label set to 1 and all other entries set to 0. But since each label has comparatively few entries, I think that would dilute the learning and may result in lower accuracy.

Can someone suggest any other way to implement multilabel classification with ML.NET?


1 Answer


Create N boolean label columns, one per tag. Example naming pattern: Label01, Label02, ..., LabelNN.
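These boolean columns are the standard "binary relevance" transformation. Every row stays in every binary problem, as a positive example where it carries the tag and a negative example where it does not, so no rows are discarded and each classifier sees both classes (an all-positive label column is likely what triggers the "Must be at least 2" error in the question). Below is a minimal plain C# sketch of deriving those columns from per-row tag sets; `TaggedRow`, `BinaryRelevance` and the sample tags are hypothetical names for illustration, not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input row: some text plus the set of tags assigned to it.
public record TaggedRow(string Text, string[] Tags);

public static class BinaryRelevance
{
    // Distinct tags in a fixed (ordinal) order; tag i becomes label column i.
    public static string[] CollectTags(IEnumerable<TaggedRow> rows) =>
        rows.SelectMany(r => r.Tags)
            .Distinct()
            .OrderBy(t => t, StringComparer.Ordinal)
            .ToArray();

    // One bool per tag: true where the row carries the tag, false otherwise.
    // Every row therefore contributes to every binary problem.
    public static bool[] ToBooleanLabels(TaggedRow row, string[] tags) =>
        tags.Select(t => row.Tags.Contains(t)).ToArray();
}

public static class Program
{
    public static void Main()
    {
        var rows = new List<TaggedRow>
        {
            new TaggedRow("cheap flights to paris", new[] { "travel", "deals" }),
            new TaggedRow("python list comprehension", new[] { "programming" }),
            new TaggedRow("discount laptop for coding", new[] { "deals", "programming" }),
        };

        string[] tags = BinaryRelevance.CollectTags(rows);
        Console.WriteLine(string.Join(", ", tags)); // deals, programming, travel

        foreach (TaggedRow row in rows)
        {
            bool[] labels = BinaryRelevance.ToBooleanLabels(row, tags);
            Console.WriteLine($"{row.Text}: {string.Join(", ", labels)}");
        }
        // cheap flights to paris: True, False, True
        // python list comprehension: False, True, False
        // discount laptop for coding: True, True, False
    }
}
```

Writing `labels[i]` into a column named "Label" followed by i + 1 as two digits gives the Label01..LabelNN layout. If the tag set is fixed, the same thing can simply be one `bool` property per tag on the class you load into ML.NET.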

In the training pipeline, append N sets of the following (one for each boolean label):

.Append(mlContext.BinaryClassification.Trainers.LightGbm(labelColumnName: "Label01", featureColumnName: "Features"))
.Append(mlContext.Transforms.CopyColumns("Score01", "Score")) // Copy to a unique name so the following models won't shadow (replace) the column. PredictedLabel column can also be saved.              

.Append(mlContext.BinaryClassification.Trainers.LightGbm(labelColumnName: "Label02", featureColumnName: "Features"))
.Append(mlContext.Transforms.CopyColumns("Score02", "Score"))

...

.Append(mlContext.BinaryClassification.Trainers.LightGbm(labelColumnName: "LabelNN", featureColumnName: "Features"))
.Append(mlContext.Transforms.CopyColumns("ScoreNN", "Score"))  

Then call .Fit() as normal; all of the models in the pipeline will be fit. You can then read each of the ScoreXX columns to get the score for each class.
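Rather than writing the N `.Append` pairs by hand, the chain can be built in a loop. This is a minimal sketch under several assumptions: a single text column `Text` turned into `Features` with `FeaturizeText`, three hypothetical labels, and `SdcaLogisticRegression` swapped in for `LightGbm` so that only the core Microsoft.ML package is needed. Besides `Score`, it also copies `Probability` and `PredictedLabel`, since ML.NET's binary evaluators take those column names as parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input schema: one text feature plus one bool column per tag.
// Replace Text and Label01..Label03 with your own columns.
public class TaggedText
{
    public string Text { get; set; } = "";
    public bool Label01 { get; set; } // e.g. "travel"
    public bool Label02 { get; set; } // e.g. "programming"
    public bool Label03 { get; set; } // e.g. "deals"
}

public static class Program
{
    const int LabelCount = 3;

    // Illustrative toy data; every row is a positive or negative example for every label.
    public static List<TaggedText> SampleRows() => new List<TaggedText>
    {
        new TaggedText { Text = "cheap flights to paris", Label01 = true, Label03 = true },
        new TaggedText { Text = "hotel booking in rome", Label01 = true },
        new TaggedText { Text = "python list comprehension", Label02 = true },
        new TaggedText { Text = "rust borrow checker error", Label02 = true },
        new TaggedText { Text = "discount laptop for coding", Label02 = true, Label03 = true },
        new TaggedText { Text = "weekend train tickets sale", Label01 = true, Label03 = true },
    };

    // Featurization followed by one binary trainer (plus column copies) per label.
    public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext, int labelCount)
    {
        IEstimator<ITransformer> pipeline =
            mlContext.Transforms.Text.FeaturizeText("Features", nameof(TaggedText.Text));

        for (int i = 1; i <= labelCount; i++)
        {
            string suffix = i.ToString("D2"); // "01", "02", ...
            pipeline = pipeline
                // BinaryClassification.Trainers.LightGbm(...) (Microsoft.ML.LightGbm package)
                // can be used here instead; the column handling is the same.
                .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                    labelColumnName: "Label" + suffix, featureColumnName: "Features"))
                // Each trainer writes Score, Probability and PredictedLabel; copy them to
                // unique names before the next trainer shadows them.
                .Append(mlContext.Transforms.CopyColumns("Score" + suffix, "Score"))
                .Append(mlContext.Transforms.CopyColumns("Probability" + suffix, "Probability"))
                .Append(mlContext.Transforms.CopyColumns("PredictedLabel" + suffix, "PredictedLabel"));
        }
        return pipeline;
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(SampleRows());

        // A single Fit call trains all of the binary models in order.
        ITransformer model = BuildPipeline(mlContext, LabelCount).Fit(data);
        IDataView scored = model.Transform(data);

        // Read each per-label probability column back out.
        for (int i = 1; i <= LabelCount; i++)
        {
            string column = "Probability" + i.ToString("D2");
            float[] values = scored.GetColumn<float>(column).ToArray();
            Console.WriteLine($"{column}: {string.Join(", ", values.Select(v => v.ToString("F3")))}");
        }
    }
}
```

Building the chain in a loop keeps the column names consistent and makes N a parameter. Since the models are independent, the order of the trainers in the chain does not affect the result.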

To evaluate the quality of each model, compute metrics from each ScoreXX column against its corresponding input LabelXX column.
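One way to do this is to call `mlContext.BinaryClassification.Evaluate` once per label, pointing its column-name parameters at the copied columns. The sketch assumes the scored data contains `LabelXX`, `ScoreXX`, `ProbabilityXX` and `PredictedLabelXX` for XX = 01..N, i.e. the pipeline copied `Probability` and `PredictedLabel` as well as `Score`; `PerLabelEvaluation` is a hypothetical helper name. For a trainer without a probability output, `EvaluateNonCalibrated` takes the same arguments minus the probability column.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class PerLabelEvaluation
{
    // Assumes `scored` holds LabelXX plus the copied ScoreXX, ProbabilityXX and
    // PredictedLabelXX columns for XX = 01..labelCount (hypothetical naming scheme).
    public static Dictionary<string, CalibratedBinaryClassificationMetrics> Evaluate(
        MLContext mlContext, IDataView scored, int labelCount)
    {
        var results = new Dictionary<string, CalibratedBinaryClassificationMetrics>();
        for (int i = 1; i <= labelCount; i++)
        {
            string suffix = i.ToString("D2");
            // Each label is scored as its own binary problem.
            results["Label" + suffix] = mlContext.BinaryClassification.Evaluate(
                scored,
                labelColumnName: "Label" + suffix,
                scoreColumnName: "Score" + suffix,
                probabilityColumnName: "Probability" + suffix,
                predictedLabelColumnName: "PredictedLabel" + suffix);
        }
        return results;
    }

    // Prints a one-line summary per label.
    public static void Print(Dictionary<string, CalibratedBinaryClassificationMetrics> results)
    {
        foreach (var (label, m) in results)
            Console.WriteLine($"{label}: AUC={m.AreaUnderRocCurve:F3} F1={m.F1Score:F3} LogLoss={m.LogLoss:F3}");
    }
}
```

Run it on rows the models were not trained on, for example the `TestSet` returned by `mlContext.Data.TrainTestSplit`, after passing them through the fitted model's `Transform`. AUC is undefined for a label that has no positive (or no negative) rows in the evaluation data, so rare tags may need a larger test split.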