2
votes

I am trying to port some of my R code to Julia, but I have a problem with the GLM package. The dataset is grouped by age, and in each group there are m_i individuals, of which N_i are sick. I want to estimate the probability of being sick as a function of age, a typical logistic regression problem. In R the code would look like:

fit <- glm(cbind(N, m - N) ~ age, family = binomial, data = heart)

In Julia I tried the following function call, but it does not work:

glm(@formula((N, m-N) ~ age), df, Binomial(), LogitLink())

Any ideas? The dataset can be found here: http://stat.ethz.ch/Teaching/Datasets/heart.dat

Thank you.

1 Answer

4
votes

You can construct a binary variable sick with one row per individual, so that each age group contributes N ones (sick) and m - N zeros (not sick). I achieve this below by creating a separate DataFrame for each age group and then running vcat on them.

Here is the code that does the job, assuming that you have read your data into a data frame named heart (I squashed the creation of heart_flat into one line, but you can extract the comprehension to see what is created along the way):

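using DataFrames, GLM

# One small DataFrame per age group: N ones (sick) followed by m - N zeros (not sick)
heart_flat = vcat([DataFrame(age=row[:age],
                             sick=[ones(Int, row[:N]);
                                   zeros(Int, row[:m]-row[:N])])
                   for row in eachrow(heart)]...)

# Logistic regression on the expanded data (one row per individual)
glm(@formula(sick ~ age), heart_flat, Binomial(), LogitLink())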

It produces the same estimates as those in R.
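
To see why the expanded data should give the same estimates, here is a minimal sketch using only base Julia. The numbers are a made-up toy example, not the heart dataset, and the names logistic, loglik_grouped and loglik_flat are purely illustrative. It checks that the grouped binomial log-likelihood (without the binomial coefficient, which does not depend on the coefficients) equals the Bernoulli log-likelihood of the expanded data, so both are maximized by the same coefficients:

```julia
# Hypothetical toy data (NOT the heart dataset): one entry per age group.
age = [25.0, 35.0, 45.0, 55.0]   # age of each group
m   = [10, 12, 8, 15]            # individuals in each group
N   = [1, 3, 4, 9]               # sick individuals in each group

logistic(x) = 1 / (1 + exp(-x))

# Grouped binomial log-likelihood, dropping the binomial coefficient
# because it is constant in the coefficients b0 and b1.
function loglik_grouped(b0, b1)
    sum(N[i] * log(logistic(b0 + b1 * age[i])) +
        (m[i] - N[i]) * log(1 - logistic(b0 + b1 * age[i]))
        for i in eachindex(age))
end

# Expand to one entry per individual: N ones and m - N zeros per group.
age_flat  = reduce(vcat, [fill(age[i], m[i]) for i in eachindex(age)])
sick_flat = reduce(vcat, [[ones(Int, N[i]); zeros(Int, m[i] - N[i])]
                          for i in eachindex(age)])

# Bernoulli log-likelihood of the expanded data.
function loglik_flat(b0, b1)
    sum(sick_flat[j] * log(logistic(b0 + b1 * age_flat[j])) +
        (1 - sick_flat[j]) * log(1 - logistic(b0 + b1 * age_flat[j]))
        for j in eachindex(age_flat))
end

# The two log-likelihoods agree at an arbitrary pair of coefficients.
println(isapprox(loglik_grouped(-3.0, 0.05), loglik_flat(-3.0, 0.05)))
```

Because the two functions agree for any coefficients, the maximum likelihood estimates from the grouped and the expanded formulation should coincide, up to numerical tolerance in the fitting routine.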