3
votes

I need to test which effects I should include in my model for genetic evaluation of cows. In SAS I would use PROC GLM. The SAS code would be:

data paula1; set paula0;
proc glm;
class year herd season;
model milk= year herd season age age*age;
run;

My R code is:

model1 = glm(milk ~ factor(year) + factor(herd) + factor(season) + age + I(age^2), data=paula1)
anova(model1)

I suspect that something is wrong because all effects are statistically significant, even when I include other effects that are not related to the trait. I no longer have a SAS license to compare the results. Is my R code correct? Does glm in R report Type III sums of squares (for unbalanced data, as SAS does)? Is there any difference in this case if I use lm instead? Thanks in advance. Paula

3
You should read this, particularly the linked PDF, for why the R community has strong feelings about so-called Type III SS: stats.stackexchange.com/a/23198 - joran
Short answer to "type III": No. You are asking us to comment on methods for a statistical test when you have provided no data. Not really a coding question, is it? If you want to ask a stats question, you should go to CrossValidated.com - IRTFM

3 Answers

6
votes

This is a very common mistake among people moving between SAS and R.

The GLM procedure in SAS is not the same thing as the glm function in R, as I explain below.

This is for SAS, from the official site: "The GLM procedure uses the method of least squares to fit general linear models". GLM in SAS is short for General Linear Models. These are different from Generalized Linear Models, which is what the glm function in R fits.

This is for the glm function in R: "Generalized linear models are just as easy to fit in R as ordinary linear model. In fact, they require only an additional parameter to specify the variance and link functions. The basic tool for fitting generalized linear models is the glm function, which has the following general structure:

glm(formula, family, data, weights, subset, ...)"

In general, general linear models are fitted by ordinary least squares, whereas generalized linear models are fitted by maximum likelihood. Generalized linear models also "allow the linear model to be related to the response variable via a link function and allow the magnitude of the variance of each measurement to be a function of its predicted value" (taken from Wikipedia).

To sum up: what you need is the lm function in R. Calling anova() on an lm fit gives the sequential (Type I) sums of squares with F tests, which correspond to the Type I table that PROC GLM prints in SAS. For Type III sums of squares, see the accurate comments by joran and BondedDust.
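As a minimal sketch of the comparison, the snippet below fits the question's formula with both lm and glm. The data are simulated stand-ins (the column names mirror the question, the values are invented), so only the mechanics carry over, not the numbers. With the default gaussian family, glm returns the same least-squares coefficients as lm; what lm adds here is the classical F-test table from anova().

```r
# Simulated stand-in data; column names mirror the question, values are made up.
set.seed(1)
n <- 200
paula1 <- data.frame(
  year   = sample(2001:2004, n, replace = TRUE),
  herd   = sample(1:5, n, replace = TRUE),
  season = sample(1:4, n, replace = TRUE),
  age    = runif(n, 2, 10)
)
paula1$milk <- 20 + 0.5 * paula1$herd + 1.5 * paula1$age -
  0.1 * paula1$age^2 + rnorm(n)

form <- milk ~ factor(year) + factor(herd) + factor(season) + age + I(age^2)

fit_lm  <- lm(form, data = paula1)
fit_glm <- glm(form, data = paula1)  # family defaults to gaussian()

# Both fits give the same least-squares coefficient estimates
same_coef <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
print(same_coef)

# Sequential (Type I) ANOVA table with F tests
aov_tab <- anova(fit_lm)
print(aov_tab)
```

Because the table is sequential, reordering the terms in the formula can change the sums of squares for unbalanced data.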

Hope it helps!

1
votes

The SAS procedure that corresponds to R's glm is GENMOD.

The proper way to enter polynomial terms in R's regression models is with poly. Read the help page ?poly. For an orthogonal polynomial of degree 2:

lm(milk ~ year + herd + season + poly(age, 2), data = dat)

You specifically should NOT use `age + I(age^2)`, since those two terms will be highly correlated and you will get faulty inferences about the significance of one or more of the polynomial orders.
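The correlation argument can be checked directly. This is a small sketch on simulated data (the data frame `dat` and its values are invented for illustration): it compares the correlation between the raw terms with the correlation between the two columns returned by poly, and confirms that both parameterisations describe the same fitted curve.

```r
# Simulated illustration data; not taken from the question.
set.seed(2)
n <- 150
dat <- data.frame(age = runif(n, 2, 10))
dat$milk <- 20 + 1.5 * dat$age - 0.1 * dat$age^2 + rnorm(n)

# Raw linear and quadratic terms are strongly correlated
raw_cor <- cor(dat$age, dat$age^2)

# The two columns produced by poly() are orthogonal to each other
P <- poly(dat$age, 2)
orth_cor <- cor(P[, 1], P[, 2])
print(c(raw_cor = raw_cor, orth_cor = orth_cor))

fit_raw  <- lm(milk ~ age + I(age^2), data = dat)
fit_poly <- lm(milk ~ poly(age, 2), data = dat)

# Both parameterisations span the same space, so fitted values match
same_fit <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_poly)))
print(same_fit)

# The test of the quadratic (highest-order) term agrees up to sign
t_raw  <- abs(coef(summary(fit_raw))[3, "t value"])
t_poly <- abs(coef(summary(fit_poly))[3, "t value"])
print(c(t_raw = t_raw, t_poly = t_poly))
```

The two fits differ in how the lower-order coefficient and its test are to be read, which is where the choice of parameterisation matters.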

If the class of a column is ambiguous, that is, it is not character, factor, or logical, then you may need to wrap factor(.) around it as LyzandeR illustrated, but usually that is only necessary when the column is numeric.

If for some reason, for instance your superiors' lack of education about Type III sums of squares, you do need to report them, then look at the car package, which has facilities for producing them.
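A hedged sketch of how such a table might be obtained follows. It uses base R's drop1, which is a swapped-in technique rather than the car approach named above: drop1 tests each term adjusted for all the others, and for a model without interaction terms, such as the one in the question, that is the same adjustment Type III tests make. The data are simulated and the names are illustrative. The car::Anova call is run only if car happens to be installed.

```r
# Simulated stand-in data; values are invented for illustration.
set.seed(3)
n <- 200
dat <- data.frame(
  year   = factor(sample(2001:2004, n, replace = TRUE)),
  herd   = factor(sample(1:5, n, replace = TRUE)),
  season = factor(sample(1:4, n, replace = TRUE)),
  age    = runif(n, 2, 10)
)
dat$milk <- 20 + as.integer(dat$herd) + 1.5 * dat$age -
  0.1 * dat$age^2 + rnorm(n)

fit <- lm(milk ~ year + herd + season + age + I(age^2), data = dat)

seq_tab  <- anova(fit)              # sequential: depends on term order
marg_tab <- drop1(fit, test = "F")  # marginal: each term adjusted for the rest
print(marg_tab)

# Optional: the car package's Anova(), only if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit, type = 3))
}
```

The last term in the formula gets the same sum of squares in both tables; earlier terms can differ when the data are unbalanced.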

0
votes

The following script will do what you want.

install.packages("sasLM")  # one-time install from CRAN
library(sasLM)

# paula1 is the data frame from the question
GLM(milk ~ factor(year) + factor(herd) + factor(season) + age + I(age^2), paula1)