
I'm trying to plot a regression tree generated with rpart using partykit. Suppose the formula is y ~ x1 + x2 + x3 + ... + xn. I would like a tree with boxplots in the terminal nodes and, above each boxplot, a label summarizing the distribution of the y values for the observations assigned to that node (for instance the 10th, 50th, and 90th percentiles). For example, the label above a terminal node might read "10th percentile = $200, mean = $247, 90th percentile = $292."

The code below generates the desired tree:

library("rpart")
fit <- rpart(Price ~ Mileage + Type + Country, cu.summary)
library("partykit")
tree.2 <- as.party(fit)

The following code generates the terminal plots but without the desired labels on the terminal nodes:

plot(tree.2, type = "simple", terminal_panel = node_boxplot(tree.2,
  col = "black", fill = "lightgray", width = 0.5, yscale = NULL,
  ylines = 3, cex = 0.5, id = TRUE))

If I can display a mean y-value for a node, then it should be easy enough to augment the label with percentiles, so my first step is to display, above each terminal node, just its mean y-value.

I know I can retrieve the mean y-value within a node (here node #12) with code such as this:

colMeans(tree.2[12]$fitted[2])

So I tried to write a function and pass it as the mainlab parameter of the boxplot panel-generating function, to generate a label containing this mean:

labf <- function(node) colMeans(node$fitted[2])
plot(tree.2, type = "simple", terminal_panel = node_boxplot(tree.2,
  col = "black", fill = "lightgray", width = 0.5, yscale = NULL,
  ylines = 3, cex = 0.5, id = TRUE, mainlab = labf))

Unfortunately, this generates the error message:

Error in mainlab(names(obj)[nid], sum(wn)) : unused argument (sum(wn)).

But it seems this is on the right track, since if I use:

plot(tree.2, type = "simple", terminal_panel = node_boxplot(tree.2,
  col = "black", fill = "lightgray", width = 0.5, yscale = NULL,
  ylines = 3, cex = 0.5, id = TRUE, mainlab = colMeans(tree.2$fitted[2])))

then the correct mean y-value for the root node is displayed. I would appreciate help with fixing the error described above so that I can show the mean y-value for each separate terminal node. From there, it should be easy to add the percentiles and format things nicely.

Could you try to make a reproducible version of the problem? Then I'll try to have a look at it. - Achim Zeileis

Sure. Thanks @AchimZeileis! The code below uses the cu.summary dataset (Consumer Reports car data) that comes with rpart:

fit <- rpart(Price ~ Mileage + Type + Country, cu.summary)
par(xpd = TRUE)
plot(fit, compress = TRUE)
text(fit, use.n = TRUE)
tree.2 <- as.party(fit)
plot(tree.2)

This will generate a tree plot with boxplots at the terminal nodes. What I'm trying to do is put the mean (and later some percentiles) in a label above each terminal node. So instead of "Node 4 (n=21)", the leftmost terminal node would have a label saying something like "mean = 7629.048". - djr99

1 Answer


In principle, you are on the right track. But if mainlab is a function, it must be a function of id and nobs, not of the node; see ?node_boxplot. Also, you can compute the table of means (or some quantiles) for all terminal nodes more easily from the fitted data for the whole tree:

tab <- tapply(tree.2$fitted[["(response)"]],
  factor(tree.2$fitted[["(fitted)"]], levels = 1:length(tree.2)),
  FUN = mean)

Then you can prepare this for plotting by rounding/formatting:

tab <- format(round(tab, digits = 3))
tab
##           1           2           3           4           5           6 
## "       NA" "       NA" "       NA" " 7629.048" "       NA" "12241.552" 
##           7           8           9          10          11          12 
## "14846.895" "22317.727" "       NA" "       NA" "17607.444" "21499.714" 
##          13 
## "27646.000" 
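
The NA entries in the table above belong to inner nodes, which have no observations assigned to them in the fitted data. As a possible variant (a sketch, not part of the approach above), grouping directly by the fitted node ID without fixing the factor levels should give a table with terminal nodes only. The name tab_term is made up for this illustration, and the block refits the tree from the question so that it runs on its own:

```r
library("rpart")
library("partykit")

# Refit the tree from the question so this block is self-contained.
fit <- rpart(Price ~ Mileage + Type + Country, cu.summary)
tree.2 <- as.party(fit)

# IDs of the terminal nodes only.
term_ids <- nodeids(tree.2, terminal = TRUE)

# Group the response by fitted node ID. Without explicit factor levels,
# only node IDs that actually occur (the terminal nodes) get an entry.
tab_term <- tapply(tree.2$fitted[["(response)"]],
  tree.2$fitted[["(fitted)"]], FUN = mean)

print(term_ids)
print(round(tab_term, digits = 3))
```

Because the table entries are named by node ID, a lookup such as tab_term["4"] still works when the ID is passed as a character string.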

And for adding this into the display, write your own helper function for the mainlab:

mlab <- function(id, nobs) paste("Mean =", tab[id])
plot(tree.2, tp_args = list(mainlab = mlab))

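
The same idea extends to the percentile label asked for in the question. The following is a minimal sketch under a few assumptions: the names qlab and mlab_q are invented for this example, the label format is only one possibility, and the id passed to mainlab is taken to be the node name as a character string (which is what the call mainlab(names(obj)[nid], sum(wn)) in the error message suggests). The tree is refit so the block runs on its own:

```r
library("rpart")
library("partykit")

# Refit the tree from the question so this block is self-contained.
fit <- rpart(Price ~ Mileage + Type + Country, cu.summary)
tree.2 <- as.party(fit)

# Response values and fitted node IDs for all observations.
y <- tree.2$fitted[["(response)"]]
node <- factor(tree.2$fitted[["(fitted)"]], levels = 1:length(tree.2))

# Build one label per node ID. Inner nodes have no observations,
# so their entries stay NA and are never looked up by the panel.
qlab <- tapply(y, node, function(v) {
  q <- quantile(v, probs = c(0.1, 0.9))
  sprintf("10%% = $%.0f, mean = $%.0f, 90%% = $%.0f", q[1], mean(v), q[2])
})

# mainlab must accept (id, nobs); nobs is unused here.
# as.vector() drops the array attributes so a plain string is returned.
mlab_q <- function(id, nobs) as.vector(qlab[id])

plot(tree.2, tp_args = list(mainlab = mlab_q))
```

With many terminal nodes the labels may overlap, in which case a shorter format or a smaller font size is probably needed.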