My main question is this:
How do you retrieve conditional probabilities for a Naïve Bayes model trained with the caret package in R?
Background:
I have trained a Naïve Bayes model using the caret package in R. The dataset is essentially a health dataset with a binary outcome variable (mistake vs. not a mistake), a series of categorical predictors, and one or two numerical predictors. We are using 5-fold cross-validation.
The model runs fine, but I would like to retrieve the conditional probabilities. How do I do this? For example, one of the predictors is "Pulse", a factor with three levels: Low, Normal, and High. I would like to retrieve something like "Given a Low pulse, what is the probability of a mistake?", i.e.:
P(y = "Mistake" | Pulse = "Low")
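For reference, Bayes' rule relates this quantity to the per-class conditional probabilities P(Pulse | y) that a Naive Bayes model estimates and to the class priors P(y), with the sum running over both outcome classes. This is the probability given Pulse alone; a prediction from the full model also multiplies in the terms for the other predictors.

```latex
P(y = \text{Mistake} \mid \text{Pulse} = \text{Low})
  = \frac{P(\text{Pulse} = \text{Low} \mid y = \text{Mistake})\, P(y = \text{Mistake})}
         {\sum_{c \in \{\text{Mistake},\ \text{Not a mistake}\}} P(\text{Pulse} = \text{Low} \mid y = c)\, P(y = c)}
```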
The relevant code is:
library(caret)
# 5-fold cross-validation, keeping class probabilities
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
mod4 <- train(Target ~ ., data = train, method = "nb", trControl = ctrl)
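Outside caret, with naiveBayes() (which comes from the e1071 package; the klaR equivalent is NaiveBayes()), it's not hard to do. The last line below extracts the table:
library(e1071)
model4 <- naiveBayes(Target ~ ., data = train)
model4_variable_posterior_prob <- model4$tables[[var2]]  # var2 holds a predictor name, e.g. "Pulse"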
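A minimal base-R sketch of getting from such a table to the probability asked for above, using made-up toy data rather than the dataset described here: it builds a per-class table P(Pulse | Target) in the orientation e1071's naiveBayes() uses for factor predictors (rows are classes, each row sums to 1), then applies Bayes' rule with the class priors. The data frame toy and the level name NoMistake are hypothetical.

```r
# Hypothetical toy data (not the dataset from the question).
toy <- data.frame(
  Target = factor(c("Mistake", "Mistake", "Mistake", "NoMistake", "NoMistake",
                    "NoMistake", "NoMistake", "NoMistake"),
                  levels = c("Mistake", "NoMistake")),
  Pulse  = factor(c("Low", "Low", "High", "Low", "Normal", "Normal", "High", "High"),
                  levels = c("Low", "Normal", "High"))
)

# Per-class conditional table P(Pulse | Target): rows are classes, each row sums to 1.
lik <- prop.table(table(toy$Target, toy$Pulse), margin = 1)

# Class priors P(Target).
prior <- prop.table(table(toy$Target))

# Bayes' rule: P(Target = Mistake | Pulse = Low).
num <- lik["Mistake", "Low"] * prior["Mistake"]
den <- sum(lik[, "Low"] * prior)
post_mistake_low <- as.numeric(num / den)

# Sanity check: with one predictor and no smoothing, this equals the raw proportion.
direct <- mean(toy$Target[toy$Pulse == "Low"] == "Mistake")
post_mistake_low
direct
```

Both values come out to 2/3 on this toy data. With smoothing (the laplace argument of naiveBayes(), or fL in klaR) the model-based value would differ slightly from the raw proportion.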
However, I'd really like to use the cross-validated model that caret produces above, because it's a lot more accurate.
I should note that caret does store some tables in mod4$finalModel$tables, but I'm not sure whether these are the conditional probabilities or some other values.
For example, mod4$finalModel$tables$PulseX2 produces the following:
        [,1]      [,2]
X1 0.1343284 0.3415149
X2 0.1731343 0.3789293
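One plausible reason the table is called PulseX2 rather than Pulse, assuming caret's formula interface (Target ~ .) expands factor predictors into 0/1 dummy columns the way base R's model.matrix() does: under R's default treatment contrasts, each non-baseline level gets its own indicator column named after the variable plus the level, and the first level gets no column. The sketch uses the level names from the question; under that assumption, PulseX2 would be the indicator for a level named X2, with the first level as the baseline.

```r
# Hypothetical factor with the three levels described in the question.
Pulse <- factor(c("Low", "Normal", "High", "Normal"),
                levels = c("Low", "Normal", "High"))

# Default treatment contrasts: one 0/1 column per non-baseline level;
# the first level (Low) is the baseline and has no column of its own.
mm <- model.matrix(~ Pulse)
colnames(mm)
mm[, "PulseNormal"]
```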
I believe PulseX2 is the table for Pulse = Normal and PulseX3 is the table for Pulse = High, but I'm not entirely sure. I do know that in the output above, X1 is "mistake" and X2 is "not a mistake". My question is: does the [,1] column correspond to a value of 0 for the PulseX2 dummy, and the [,2] column to a value of 1? By that logic, is 0.3415149 equal to P(y = Mistake (or X1 = 1) | Pulse = X2), measured against a baseline of PulseX1 or something similar? Does anyone know what these values mean?
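One way to test a hypothesis about these numbers, assuming (not confirmed above) that the Pulse dummies reach klaR's NaiveBayes() as numeric 0/1 columns: for a numeric predictor with usekernel = FALSE, NaiveBayes() is understood to store each class's mean in [,1] and standard deviation in [,2], and for a 0/1 column the mean is simply the share of ones. The helper dummy_summary() is hypothetical, and the counts 45 and 58 out of 335 rows per class are inferred from the printed values, not known from the data.

```r
# Hypothetical helper: mean and (sample) standard deviation of a 0/1 column
# containing n_ones ones out of n_rows rows.
dummy_summary <- function(n_ones, n_rows) {
  x <- c(rep(1, n_ones), rep(0, n_rows - n_ones))
  c(mean = mean(x), sd = sd(x))
}

# Counts inferred from the printed table, not taken from the real data.
x1 <- dummy_summary(45, 335)  # compare with row X1: 0.1343284 0.3415149
x2 <- dummy_summary(58, 335)  # compare with row X2: 0.1731343 0.3789293
round(rbind(X1 = x1, X2 = x2), 7)
```

If the check holds, 0.1343284 would be an estimate of P(PulseX2 = 1 | X1), a within-class proportion rather than P(Mistake | Pulse = X2), and 0.3415149 would be a standard deviation rather than a probability.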
Alternatively, if there is some way to retrieve information on the importance of individual factor levels (not just whole variables), that would also be fine.
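As a rough starting point for level-by-level information, a simple univariate summary: for each level of a factor, the share of mistakes among rows at that level and its ratio to the overall mistake rate (a lift). The helper level_lift() is hypothetical and the data are made up; it is not a caret or klaR feature, and because it ignores the other predictors it is not a model-based importance measure.

```r
# Hypothetical helper: per-level rate of the positive outcome and its lift
# over the overall rate.
level_lift <- function(outcome, predictor, positive) {
  overall <- mean(outcome == positive)
  by_level <- tapply(outcome == positive, predictor, mean)
  data.frame(level = names(by_level),
             p_positive = as.numeric(by_level),
             lift = as.numeric(by_level) / overall)
}

# Toy data, made up for illustration.
target <- factor(c("Mistake", "Mistake", "Mistake", "NoMistake", "NoMistake",
                   "NoMistake", "NoMistake", "NoMistake"))
pulse <- factor(c("Low", "Low", "High", "Low", "Normal", "Normal", "High", "High"),
                levels = c("Low", "Normal", "High"))
res <- level_lift(target, pulse, positive = "Mistake")
res
```

A lift above 1 flags a level where mistakes are more common than average; on real data it would be worth checking how many rows each level has before reading much into it.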