I am using partykit::ctree to explore my dataset, which is a set of about 15,000 beach surveys, each recording the number of pieces of debris found in each of 50 different categories. There are lots of zeros in the data, and a large spread of total debris amounts. I also have a series of independent variables, including some factors, some count data, and some continuous data.
Here is a very small sample dataset:
library(partykit)
set.seed(1)  # make the random counts reproducible
# 20 surveys x 5 debris categories of Poisson counts
Counts <- as.data.frame(matrix(rpois(100, 1), ncol = 5))
colnames(Counts) <- c("Glass", "HardPlastic", "SoftPlastic", "PlasticBag", "Fragments")
State <- rep(c("CA", "OR", "WA"), each = 6)
Counts$State <- c(State, "CA", "OR")
County <- rep(1:9, each = 2)
Counts$County <- c(County, 1, 4)
Counts$Distance <- c(10, 15, 13, 19, 18, 23, 38, 40, 49, 44, 47, 45, 52, 53, 55, 59, 51, 53, 14, 33)
Year <- rep(c("2010", "2011", "2012"), times = 7)
Counts$Year <- Year[1:20]
I have used the following code to partition my data:
M.2 <- ctree(Glass + HardPlastic + SoftPlastic + PlasticBag + Fragments ~
               as.factor(State) + as.factor(County) + Distance + as.factor(Year),
             data = Counts)
plot(M.2, terminal_panel = node_barplot, cex = 0.5)
This produces a lovely graph, but how do I extract the membership of each of the terminal nodes? I can read it off the graph if there are only a few categories, but once the number of categories increases to 50, it becomes much harder to inspect graphically. I would like to see the information contained within the nodes, particularly the relative probabilities of each individual category being contained in each terminal node.
I know that if this were a BinaryTree object, I could use the nodes argument, but class(M.2) reports that it is a constparty object, and I haven't been able to find how to get node information from this class.
I have also run into a secondary problem, which is that when I run the ctree on my sample data set, it crashes R every time! It works fine with my actual data set, but I can't figure out what is wrong with the sample set.
EDIT: The desired output would be something along the lines of:
Node15:
Hard Plastic 30
Glass 5
Soft Plastic 23
Plastic Bag 6
Fragments 12
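To make the desired output concrete, here is a minimal base R sketch of that per-node summary. It assumes a vector of terminal-node ids (one per survey) is already available. The ids below are hypothetical placeholders, as are the names `counts`, `node_id`, `node_totals` and `node_props`. With a fitted tree the ids might come from `predict(M.2, type = "node")`, but that is an assumption and has not been checked against this data.

```r
# Toy counts in the same shape as the sample data (values are made up).
set.seed(1)
counts <- as.data.frame(matrix(rpois(100, 1), ncol = 5))
colnames(counts) <- c("Glass", "HardPlastic", "SoftPlastic", "PlasticBag", "Fragments")

# Hypothetical terminal-node ids, one per survey (assumption: with a real
# tree these would be taken from the fitted model, not hard-coded).
node_id <- rep(c(3, 4, 7, 8), each = 5)

# Total pieces of each category per terminal node (one row per node).
node_totals <- rowsum(counts, group = node_id)

# Relative share of each category within a node (each row sums to 1).
node_props <- prop.table(as.matrix(node_totals), margin = 1)

print(node_totals)
print(round(node_props, 3))
```

The row names of `node_totals` are the node ids, so one row corresponds to one "Node15:"-style block above.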