I have voting data in the form of counts for OutcomeA and counts for OutcomeB (there are only two outcomes). I am fitting a binomial-family glm using the formulation suggested in "GLM for proportion data in r" ( https://stats.stackexchange.com/questions/89734/glm-for-proportion-data-in-r ), with the y variable being:
cbind(OutcomeA, OutcomeB)
I would like to use the caret package to do some cross-validation and generally handle the output for comparative purposes, as suggested here: Binomial GLM using caret train
Am I right in thinking that I can use the votes for outcome A as the 'y' variable, and the total electorate turnout (i.e. OutcomeA + OutcomeB) as the weight variable? Thanks.
(edit) The (artificial) data looks like:
OutcomeA OutcomeB X1 X2 X3 X4
1234 2345 0.23 0.34 0.34 0.45
2345 2312 0.55 0.57 0.58 0.58
3423 1234 0.45 0.88 0.69 0.12
...
OutcomeA is the number of votes in favour and OutcomeB is the number against.
I want to model the 'quantity' OutcomeA/(OutcomeA+OutcomeB) as a function of X1, X2, X3 and X4 using a binomial family model in glm, via caret.
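To make the two formulations concrete, here is a minimal sketch on simulated data (the data frame `d`, the coefficients used to generate it, and the names `total` and `prop` are made up for illustration and are not my real data). In plain glm, the weighted form takes the proportion OutcomeA/(OutcomeA+OutcomeB) as the response, not the raw count, with the totals as weights, and it gives the same coefficients as the cbind form. The caret part is only an assumption about how this would carry over: it relies on train() having a `weights` argument and passing `family` through to glm, and it runs only if caret is installed.

```r
# Simulated stand-in for the voting data (all values are hypothetical)
set.seed(1)
n <- 50
d <- data.frame(X1 = runif(n), X2 = runif(n), X3 = runif(n), X4 = runif(n))
d$total <- sample(2000:6000, n, replace = TRUE)        # OutcomeA + OutcomeB
p <- plogis(-0.5 + d$X1 - d$X2 + 0.5 * d$X3 - 0.3 * d$X4)
d$OutcomeA <- rbinom(n, size = d$total, prob = p)      # votes in favour
d$OutcomeB <- d$total - d$OutcomeA                     # votes against
d$prop <- d$OutcomeA / d$total                         # quantity to model

# Formulation 1: two-column response of (successes, failures)
fit_counts <- glm(cbind(OutcomeA, OutcomeB) ~ X1 + X2 + X3 + X4,
                  family = binomial, data = d)

# Formulation 2: proportion as response, number of trials as weights
fit_prop <- glm(prop ~ X1 + X2 + X3 + X4,
                family = binomial, weights = total, data = d)

# The two fits should agree on the coefficients
print(all.equal(coef(fit_counts), coef(fit_prop)))

# Assumed caret equivalent of formulation 2 (skipped if caret is missing)
if (requireNamespace("caret", quietly = TRUE)) {
  set.seed(2)
  fit_caret <- caret::train(
    x = d[, c("X1", "X2", "X3", "X4")],
    y = d$prop,                      # numeric response, so caret treats it as regression
    weights = d$total,               # number of trials per row
    method = "glm",
    family = binomial,               # passed through to glm()
    trControl = caret::trainControl(method = "cv", number = 5)
  )
  print(coef(fit_caret$finalModel))
}
```

Under this setup caret would score the resamples with regression metrics (such as RMSE) on the proportion scale, which may or may not be the comparison wanted.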
The splitting of data into training and testing data is not the issue here.