
I am using R and the caret package for a classification task. For feature elimination I am using rfe, which has several options, among them the metric to maximize or minimize.

The problem is that rfe accepts metrics such as RMSE and Kappa, but I want to maximize a different one: in my case, ScoreQuadraticWeightedKappa from the Metrics package. I don't know how to do that.

I have the following code:

control <- rfeControl(functions = rfFuncs, method="cv", number=2)
results <- rfe(dataset[, -59], dataset[, 59], 
               sizes = c(1:58), rfeControl = control)

How do I edit it so that rfe maximizes ScoreQuadraticWeightedKappa?

1 Answer

You need to modify the postResample function, or create your own function that's similar, and then insert it into rfFuncs$summary. The default postResample function is shown below:

> postResample
function (pred, obs) 
{
    isNA <- is.na(pred)
    pred <- pred[!isNA]
    obs <- obs[!isNA]
    if (!is.factor(obs) & is.numeric(obs)) {
        if (length(obs) + length(pred) == 0) {
            out <- rep(NA, 2)
        }
        else {
            if (length(unique(pred)) < 2 || length(unique(obs)) < 
                2) {
                resamplCor <- NA
            }
            else {
                resamplCor <- try(cor(pred, obs, use = "pairwise.complete.obs"), 
                  silent = TRUE)
                if (class(resamplCor) == "try-error") 
                  resamplCor <- NA
            }
            mse <- mean((pred - obs)^2)
            n <- length(obs)
            out <- c(sqrt(mse), resamplCor^2)
        }
        names(out) <- c("RMSE", "Rsquared")
    }
    else {
        if (length(obs) + length(pred) == 0) {
            out <- rep(NA, 2)
        }
        else {
            pred <- factor(pred, levels = levels(obs))
            requireNamespaceQuietStop("e1071")
            out <- unlist(e1071::classAgreement(table(obs, pred)))[c("diag", 
                "kappa")]
        }
        names(out) <- c("Accuracy", "Kappa")
    }
    if (any(is.nan(out))) 
        out[is.nan(out)] <- NA
    out
}

More specifically, since you are doing classification, you will need to modify the portion of postResample that says:

    else {
        if (length(obs) + length(pred) == 0) {
            out <- rep(NA, 2)
        }
        else {
            pred <- factor(pred, levels = levels(obs))
            requireNamespaceQuietStop("e1071")
            out <- unlist(e1071::classAgreement(table(obs, pred)))[c("diag", 
                                    "kappa")]
        }
        names(out) <- c("Accuracy", "Kappa")
    }
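As a rough sketch of what a replacement for that classification branch could look like, the function below returns the quadratic weighted kappa instead of Accuracy and Kappa. The name qwkResample and the label "QWK" are made up for this example. It also assumes that the factor levels of obs are stored in their natural order, so that the integer level codes 1..k can be used as the ratings that ScoreQuadraticWeightedKappa expects.

```r
library(Metrics)  # provides ScoreQuadraticWeightedKappa()

# Hypothetical stand-in for the classification branch of postResample().
# Assumption: the levels of `obs` are ordered, so level codes act as ratings.
qwkResample <- function(pred, obs) {
  # Drop cases without a prediction, as postResample() does.
  keep <- !is.na(pred)
  pred <- pred[keep]
  obs  <- obs[keep]

  if (length(obs) == 0) {
    return(c(QWK = NA_real_))
  }

  # Put predictions on the same level set as the observations.
  if (!is.factor(obs)) obs <- factor(obs)
  pred <- factor(pred, levels = levels(obs))

  # Convert both factors to integer ratings 1..k and score them.
  qwk <- ScoreQuadraticWeightedKappa(as.integer(pred), as.integer(obs),
                                     min.rating = 1,
                                     max.rating = nlevels(obs))

  # The score is NaN when only one rating occurs; report NA instead.
  if (is.nan(qwk)) qwk <- NA_real_
  c(QWK = qwk)
}
```

If your class labels are already integers, you could pass them through directly rather than going via the level codes.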

After you've edited postResample, or created your own equivalent function, you can run:

rfFuncs$summary <- function (data, lev = NULL, model = NULL) {
    if (is.character(data$obs)) 
        data$obs <- factor(data$obs, levels = lev)
    postResample(data[, "pred"], data[, "obs"])
}

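Just make sure postResample has been edited, or replace it with the name of your equivalent function. Reassign rfFuncs$summary before you call rfeControl, so that the control object picks up the new summary function, and pass the name your function gives the metric to rfe through its metric argument.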
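Putting the pieces together, something along these lines should work. Treat it as a sketch rather than tested code for your data: qwkFuncs, runQwkRfe and the label "QWK" are names invented here, and the summary function again assumes the class levels are ordered so that their integer codes can serve as ratings. Note that rfe also has to be told which column of the summary output to optimize, through its metric and maximize arguments.

```r
library(caret)    # rfe(), rfeControl(), rfFuncs (rfFuncs needs randomForest)
library(Metrics)  # ScoreQuadraticWeightedKappa()

# Work on a copy so the rfFuncs list shipped with caret stays untouched.
qwkFuncs <- rfFuncs

# Summary function with the (data, lev, model) signature that rfe() calls.
# "QWK" is an arbitrary label; it must match `metric` in the rfe() call.
qwkFuncs$summary <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(as.factor(data$obs))
  obs  <- factor(data$obs, levels = lev)
  pred <- factor(data$pred, levels = lev)
  # Assumption: levels are ordered, so their integer codes act as ratings.
  c(QWK = ScoreQuadraticWeightedKappa(as.integer(pred), as.integer(obs),
                                      min.rating = 1,
                                      max.rating = length(lev)))
}

# Hypothetical helper wrapping the rfe() call from the question.
runQwkRfe <- function(x, y, sizes = seq_len(ncol(x)), folds = 2) {
  # Build the control object after the summary function has been replaced.
  control <- rfeControl(functions = qwkFuncs, method = "cv", number = folds)
  rfe(x, y, sizes = sizes, metric = "QWK", maximize = TRUE,
      rfeControl = control)
}

# With the objects from the question:
# results <- runQwkRfe(dataset[, -59], dataset[, 59], sizes = 1:58)
```

Copying rfFuncs instead of overwriting it keeps the default functions available for comparison runs.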