Hello, I am using the dystrophy data set from the ipred package. I used subset to separate the carriers from the normal cases:

carrier = subset(dystrophy,dystrophy$Class == "carrier")
normal = subset(dystrophy,dystrophy$Class == "normal")

and I've reduced this data by selecting only the patients with 1 visit to the hospital:

carrier = subset(carrier,carrier$OBS == "1")
normal = subset(normal,normal$OBS == "1")

So now I would like to practice calculating the mean vector, the covariance matrix, and the correlation matrix of the proteins, separately for each group (the Class factor).

I've tried with cor and cov, but I think I am doing something wrong. Any help would be appreciated. Thanks!

What is your output when calling cov and cor? If I recall correctly, the inputs of cov and cor have to be numeric, so a factor will not work. - FloSchmo
To elaborate on the comment of @FloSchmo, please include the code that you tried with cor and cov. It will help us respond to your problem. - G5W

1 Answer

This may get you started. Using your variables, you can get the means for each of the proteins using:

sapply(carrier[,6:9], mean, na.rm=TRUE)
sapply(normal[,6:9], mean, na.rm=TRUE)

For the correlation and covariance you can use:

cor(carrier[,6:9], use="pairwise.complete.obs")
cor(normal[,6:9], use="pairwise.complete.obs")

cov(carrier[,6:9], use="pairwise.complete.obs")
cov(normal[,6:9], use="pairwise.complete.obs")

The 6:9 index restricts the computation to the protein columns and excludes other features such as Age. The use="pairwise.complete.obs" argument handles missing values: each entry of the matrix is computed from the observations that are complete for that pair of variables.
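If you would rather not repeat every call once per group, the same three summaries can be computed for all levels of Class in one pass. The following is only a minimal sketch: group_summaries is a hypothetical helper name, and the toy data frame is made up for illustration (it is not the real dystrophy data). It also checks that the selected columns are numeric, which is the concern raised in the comments.

```r
# Hypothetical helper (not part of ipred or base R): per-group mean vector,
# covariance matrix and correlation matrix for the chosen columns.
group_summaries <- function(df, group_col, cols) {
  # Split the selected columns by the grouping factor, dropping empty levels
  groups <- split(df[, cols, drop = FALSE], df[[group_col]], drop = TRUE)
  lapply(groups, function(g) {
    # cov() and cor() need numeric input, so fail early on factors/characters
    stopifnot(all(sapply(g, is.numeric)))
    list(
      means = colMeans(g, na.rm = TRUE),
      cov   = cov(g, use = "pairwise.complete.obs"),
      cor   = cor(g, use = "pairwise.complete.obs")
    )
  })
}

# Toy data, invented for this example only
toy <- data.frame(
  Class = factor(c("carrier", "carrier", "carrier", "normal", "normal", "normal")),
  CK = c(100, 200, 300, 30, 40, 50),
  H  = c(90, NA, 110, 70, 80, 75)
)

res <- group_summaries(toy, "Class", c("CK", "H"))
res$carrier$means  # mean vector for the carrier group
res$normal$cor     # correlation matrix for the normal group
```

With the real data, and assuming the protein columns are named CK, H, PK and LD, the call would look something like group_summaries(subset(dystrophy, OBS == 1), "Class", c("CK", "H", "PK", "LD")). Selecting the columns by name rather than by position 6:9 keeps the code working if the column order ever changes.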