2
votes

I trained a model using rpart and I want to generate a plot displaying the Variable Importance for the variables it used for the decision tree, but I cannot figure out how.

I was able to extract the Variable Importance. I've tried ggplot but none of the information shows up. I tried using the plot() function on it, but it only gives me a flat graph. I also tried plot.default, which is a little better but still not what I want.

Here's rpart model training:

argIDCART = rpart(Argument ~ ., 
                  data = trainSparse, 
                  method = "class")

Got the variable importance into a data frame.

argPlot <- as.data.frame(argIDCART$variable.importance)

Here is a section of what that prints:

       argIDCART$variable.importance
noth                             23.339346
humanitarian                     16.584430
council                          13.140252
law                              11.347241
presid                           11.231916
treati                            9.945111
support                           8.670958

I'd like to plot a graph that shows the variable/feature name and its numerical importance. I just can't get it to do that. It appears to only have one column. I tried separating them using the separate function, but can't do that either.

ggplot(argPlot, aes(x = "variable importance", y = "feature"))

Just prints blank.

The other plots look really bad.

plot.default(argPlot)

Looks like it plots the points, but doesn't put the variable name.


2 Answers

4
votes

Since the question has no reproducible example, I built my answer on the kyphosis dataset that ships with rpart, using ggplot2 for the plots and other tidyverse packages for data manipulation.

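library(rpart)
library(tidyverse)

# kyphosis ships with rpart; Kyphosis is a factor, so this fits a classification tree
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# The importance vector is named; move the names into their own column
df <- data.frame(imp = fit$variable.importance)
df2 <- df %>% 
  tibble::rownames_to_column("variable") %>% 
  dplyr::arrange(imp) %>%
  # fix the factor order so the bars follow importance, not the alphabet
  dplyr::mutate(variable = forcats::fct_inorder(variable))

# Horizontal bar chart, most important variable on top
ggplot2::ggplot(df2) +
  geom_col(aes(x = variable, y = imp),
           col = "black", show.legend = FALSE) +
  coord_flip() +
  theme_bw()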


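# Lollipop chart: a segment from 0 to the importance value, capped with a point
ggplot2::ggplot(df2) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_segment(aes(x = variable, y = 0, xend = variable, yend = imp), 
               linewidth = 1.5, alpha = 0.7) +
  geom_point(aes(x = variable, y = imp, col = variable), 
             size = 4, show.legend = FALSE) +
  coord_flip() +
  theme_bw()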

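The same reshaping should carry over to the named vector from the question. Here is a minimal sketch, assuming the importance values printed in the question: the `imp` vector is typed in by hand as a stand-in for `argIDCART$variable.importance`, and `n_top` is a hypothetical cutoff that keeps the plot readable when a sparse text model has many terms.

```r
library(ggplot2)

# Stand-in for argIDCART$variable.importance (values copied from the question)
imp <- c(noth = 23.339346, humanitarian = 16.584430, council = 13.140252,
         law = 11.347241, presid = 11.231916, treati = 9.945111,
         support = 8.670958)

n_top <- 5  # hypothetical cutoff, adjust as needed

# Turn the named vector into two explicit columns
imp_df <- data.frame(variable = names(imp), importance = unname(imp))

# Keep only the n_top most important variables
imp_df <- head(imp_df[order(-imp_df$importance), ], n_top)

# reorder() sorts the bars by importance instead of alphabetically
p <- ggplot(imp_df, aes(x = reorder(variable, importance), y = importance)) +
  geom_col() +
  coord_flip() +
  labs(x = "variable", y = "importance") +
  theme_bw()
print(p)
```

The key point is that the feature names are only the names of the vector (or the row names of the data frame), so they have to become a real column before `aes()` can refer to them.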

0
votes

If you want to see the variable names, it may be best to use them as the labels on the x-axis.

plot(argIDCART$variable.importance, xlab="variable", 
    ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART))

Variable Importance

(You may need to resize the window to see the labels properly.)

If you have a lot of variables, you may want to rotate the variable names so that they do not overlap.

par(mar=c(7,4,3,2))
plot(argIDCART$variable.importance, xlab="variable", 
    ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART), las=2)

Rotated axis labels

Data

argIDCART = read.table(text="variable.importance
noth                             23.339346
humanitarian                     16.584430
council                          13.140252
law                              11.347241
presid                           11.231916
treati                            9.945111
support                           8.670958", 
header=TRUE)
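Note that the calls above rely on `argIDCART` being the small data frame listed under Data. With an actual rpart fit, `row.names()` on the model object would not give the feature names; they are the names of the `variable.importance` vector itself. Here is a minimal sketch under that assumption, using the `kyphosis` data that ships with rpart as a stand-in for the question's `trainSparse`, and `seq_along()` instead of a hard-coded `1:7`:

```r
library(rpart)

# Stand-in model; kyphosis ships with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

imp <- fit$variable.importance   # named numeric vector

par(mar = c(7, 4, 3, 2))         # extra bottom margin for rotated labels
plot(imp, xlab = "", ylab = "Importance", xaxt = "n", pch = 20)
# one tick per variable, labelled with the names stored on the vector
axis(1, at = seq_along(imp), labels = names(imp), las = 2)

# Alternative: dotchart() labels each point with the vector's names by default
dotchart(rev(imp), xlab = "Importance", pch = 20)
```

With the model from the question, replacing `fit` with `argIDCART` should be the only change needed.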