0
votes

The regression tree created using the code below has boxplots for all of the terminal nodes. The boxplots show the median, IQR, and outliers, which is great.

plot(as.party(tree), terminal_panel = node_boxplot)

But how do I identify the outliers in my boxplots? I figured that, since the boxplots were already created, it should be relatively easy to pull out the outliers, but the information doesn't seem to be stored in tree. I know I can follow the paths of my tree to identify the outliers in my dataset, but I was wondering if there's a quicker way to do this.

1
Please modify your question to include a smaller sample taken from your data (check ?dput()). Posting images of your data, or no data at all, makes it difficult or impossible for us to help you! Besides, what does your output plot look like? - massisenergy

1 Answer

2
votes

Since you do not provide data, I will do this with the built-in cars data.
You are right that this information does not seem to be stored in the tree. Also, the plot call does not return anything meaningful. One way to get at the outliers is to redo the boxplots with boxplot, grouping the response by terminal node, and read them from its return value.

library(rpart)
library(partykit)

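# Fit a regression tree and convert it to a party object
CarTree = rpart(dist ~ ., data=cars)
PCT = as.party(CarTree)
# plot() draws the tree but its return value holds no boxplot statistics
P = plot(PCT, terminal_panel = node_boxplot)

# Redo the boxplots, grouping the response by fitted terminal node id
BP = boxplot(cars$dist ~ PCT[1]$fitted[[1]])
BP$out   # outlier values across all terminal nodes
[1] 80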
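
BP$out only gives the outlier values, not the rows they came from. If you also want the rows, here is a minimal sketch, assuming the panel boxplots use the default 1.5 * IQR whisker rule (the same default as boxplot.stats). The names node and out_rows are just mine for illustration.

```r
library(rpart)
library(partykit)

CarTree <- rpart(dist ~ ., data = cars)
PCT <- as.party(CarTree)

# Terminal node id for every row of the training data
node <- predict(PCT, type = "node")

# Within each terminal node, keep the row indices whose response
# falls outside the boxplot whiskers (default coef = 1.5)
out_rows <- unlist(lapply(split(seq_len(nrow(cars)), node), function(idx) {
  y <- cars$dist[idx]
  idx[y %in% boxplot.stats(y)$out]
}), use.names = FALSE)

# Outlying rows, together with the terminal node they fall in
data.frame(cars[out_rows, ], node = node[out_rows])
```

The response values of these rows should match what BP$out reports, since both use the same whisker rule per terminal node.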