I am aiming to better predict the buying habits of a company's customer base from several customer attributes (demographics, past purchase categories, etc.). I have a data set of about 100,000 returning customers that includes the time interval since each customer's last purchase (the dependent variable in this study) along with several attributes (both continuous and categorical).
I plan on doing a survival analysis on each segment (segments defined as having similar time intervals across observations) to help understand likely time intervals between purchases. The problem I am encountering is how best to define these segments, i.e. groupings of attributes such that the time interval differs sufficiently between segments and is similar within segments. I believe that building a decision tree is the best way to do this, presumably by recursive partitioning.
I am new to R and have poked around with the party package's mob command, but I am confused about which variables to include in the model and which to use for partitioning (the call is mob(y ~ x1 + ... + xk | z1 + ... + zk), with the x's being model variables and the z's being partitioning variables). I simply want to build a tree from the set of attributes, so I suppose I want to partition on all of them? I am not sure. I have also tried the rpart command, but I get either no tree or a tree with hundreds of thousands of nodes, depending on the cp level.
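To make the rpart part concrete, here is a minimal sketch of the kind of workflow I have in mind, run on simulated data rather than my real data. The column names (age, category, interval), the sample size, and the cp and minbucket values are all placeholders I made up for illustration. The idea is to grow a fairly large tree with a small cp, then prune back to the cp with the lowest cross-validated error, and treat the resulting leaves as candidate segments.

```r
library(rpart)

set.seed(1)
n <- 2000

# Simulated stand-in for the customer data; all column names are hypothetical
d <- data.frame(
  age      = runif(n, 18, 80),
  category = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

# Interval since last purchase (in days), made to depend on the attributes
rate <- ifelse(d$category == "A", 1 / 30, 1 / 90) * ifelse(d$age > 50, 2, 1)
d$interval <- rexp(n, rate = rate)

# Grow a deliberately large regression tree; minbucket keeps leaves from
# becoming tiny, and cp = 0.001 is an arbitrary small starting value
fit <- rpart(interval ~ age + category, data = d, method = "anova",
             control = rpart.control(cp = 0.001, minbucket = 50))

# Prune back to the cp value with the lowest cross-validated error
cp_table <- fit$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned   <- prune(fit, cp = best_cp)

# Leaf membership of each observation gives the candidate segments
d$segment <- factor(pruned$where)
table(d$segment)
```

This treats the interval as a plain continuous response (method = "anova"), which ignores censoring; I am assuming that is acceptable for defining segments before running the survival analysis within each one, but I am not certain it is.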
If anyone has any suggestions, I'd appreciate it. Sorry for the novel and thanks for the help.