
My main question is this:

How do you retrieve conditional probabilities for a Naïve Bayes model using the caret Package in R?

Background:

I have fitted a Naïve Bayes model using the caret package in R. The dataset is essentially a health dataset with a binary outcome variable (mistake vs. not a mistake), a series of categorical predictors, and one or two numerical predictors. We are using 5-fold cross-validation.

The model runs fine, but I would like to retrieve the conditional probabilities. How do I do this? For example, one of the predictors is "Pulse", a factor with 3 levels: Low, Normal, and High. I would like to retrieve something like "Given a Low Pulse, what is the probability of a Mistake?", i.e. p(y = "Mistake" | Pulse = "Low").

The relevant code is here:

library(caret)
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
mod4 <- train(Target ~ ., data = train, method = "nb", trControl = ctrl)

In the klaR package, it's not hard to do (the second line extracts the table):

model4 <- naiveBayes(Target ~ ., data = train, scale = TRUE)
model4_variable_posterior_prob <- model4$tables[[var2]]

However, I'd really like to use the cross-validated model that caret produces above because it's a lot more accurate.

I should note that caret produces some tables in here:

mod4$finalModel$tables$

However, I'm not sure if these tables are the conditional probabilities or some other values.

For example, mod4$finalModel$tables$PulseX2 produces the following:

        [,1]      [,2]
X1 0.1343284 0.3415149
X2 0.1731343 0.3789293 

I believe PulseX2 is the table for Pulse = Normal and PulseX3 is the table for Pulse = High, but I'm not entirely sure. I do know that in the above, X1 is "mistake" and X2 is "not a mistake". My question is: is the [,1] column the "0" value of the categorical factor variable PulseX2, and is the [,2] column its "1" value? By that logic, is 0.3415149 equal to p(y = Mistake (i.e. X1) | Pulse = X2), relative to the PulseX1 baseline or something? Does anyone know what these values mean?

Alternatively, if there is some way to retrieve information on the important individual factor levels (not just the important variables), that would be fine too.

1 Answer

This isn't really about caret; that object is created by the NaiveBayes function in the klaR package. The documentation for that package says:

tables: A list of tables, one for each predictor variable. For each categorical variable a table giving, for each attribute level, the conditional probabilities given the target class. For each numeric variable, a table giving, for each target class, mean and standard deviation of the (sub-)variable or a object of class density.
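
To make that description concrete, here is a minimal base-R sketch on made-up data (the data frame, its level names and its counts are hypothetical, chosen only to mirror the question). It assumes that the formula interface of train() expands factors into 0/1 dummy columns, which would explain table names like PulseX2 and would mean klaR treats each dummy as a numeric variable. Under that assumption [,1] would be the per-class mean of the dummy, i.e. p(Pulse = X2 | class), and [,2] its standard deviation. The numbers in the question look consistent with this reading: sqrt(0.1343284 * (1 - 0.1343284)) is about 0.341, close to the reported 0.3415149.

```r
# Hypothetical toy data; levels and counts are made up for illustration only.
train <- data.frame(
  Target = factor(c("X1", "X1", "X1", "X1", "X2", "X2", "X2", "X2", "X2", "X2")),
  Pulse  = factor(c("X1", "X2", "X3", "X1", "X2", "X2", "X1", "X3", "X1", "X1"))
)
counts <- table(train$Target, train$Pulse)

# p(Pulse = level | Target = class): rows are classes and each row sums to 1.
# This is the layout the klaR documentation describes for a categorical predictor.
lik <- prop.table(counts, margin = 1)

# Assumption: the formula interface dummy-codes factors, as model.matrix() does.
dummies <- model.matrix(Target ~ ., data = train)[, -1, drop = FALSE]

# Per-class mean and sd of the PulseX2 dummy: the "numeric variable" layout.
pulse_x2 <- cbind(
  tapply(dummies[, "PulseX2"], train$Target, mean),
  tapply(dummies[, "PulseX2"], train$Target, sd)
)

# What the question actually asks for is the reverse conditional,
# p(Target = class | Pulse = level), obtained with Bayes' rule.
prior     <- as.vector(prop.table(table(train$Target)))
joint     <- lik * prior                        # row i is scaled by prior[i]
posterior <- sweep(joint, 2, colSums(joint), "/")

print(round(pulse_x2, 4))
print(round(posterior, 4))
```

In this sketch the first column of pulse_x2 equals the X2 column of lik, and each column of posterior sums to 1. If the assumption about dummy coding holds, calling train() with separate x and y arguments instead of a formula should keep Pulse as a factor, so that the table becomes one row per class and one column per level; I have not verified this against your data.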