Data processing using R

Question

I have a .ped file that contains several columns, and I want to extract information from it. Here is a sample of my data (there is no header):

The first column is the family ID, the second is the individual ID, and the third is the sex of the individual.

I read the table into a data frame:

ped <- read.table("pedigree.ped", header=FALSE)

How can I compute the number of families in the file? A family can appear more than once, and I want to count each family only once. The sex column uses 1 for male and 2 for female. How can I get the distribution of males and females in the data set?

I'm new to R, so some example code would be appreciated!

Thanks in advance.

Nishanth Nishanth · Accepted Answer · 2013-04-06T01:49:18

Since you are new to R, I would suggest looking into Excel first. The operations you are asking for are fairly simple and can be done in Excel.

If you want to use R, then look into data.frame indexing and subsetting.

If you are familiar with SQL, look into the sqldf package.

Number of families:

numFamilies <- length(unique(ped[,1]))
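To see why this counts each family once, here is a minimal sketch on a made-up data frame. The name `toy` and all of its values are invented for illustration and are not taken from the asker's file; the column names V1, V2, V3 simply mimic what read.table assigns when header=FALSE.

```r
# Hypothetical toy pedigree (assumed data, not from the original file):
# V1 = family ID, V2 = individual ID, V3 = sex (1 = male, 2 = female)
toy <- data.frame(V1 = c("F1", "F1", "F2", "F3", "F3"),
                  V2 = c(1, 2, 1, 1, 2),
                  V3 = c(1, 2, 2, 1, 1))

# unique() keeps one copy of each family ID, so repeated families
# are counted a single time
numFamiliesToy <- length(unique(toy[, 1]))
numFamiliesToy  # 3 distinct families: F1, F2, F3
```

The five rows belong to only three distinct families, which is the value the expression returns.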

Number of males & females:

numMales <- sum(ped[,3] == 1)
numFemales <- sum(ped[,3] == 2)
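As an alternative to the two sums, base R's table() gives both counts in one call, which is probably closer to the "distribution" asked about. This is only a sketch on an invented data frame (`toy` and its values are assumptions made for illustration); the labels "male" and "female" follow the 1/2 coding described in the question.

```r
# Hypothetical toy pedigree (assumed data, not from the original file):
# V1 = family ID, V2 = individual ID, V3 = sex (1 = male, 2 = female)
toy <- data.frame(V1 = c("F1", "F1", "F2", "F3", "F3"),
                  V2 = c(1, 2, 1, 1, 2),
                  V3 = c(1, 2, 2, 1, 1))

# Turn the numeric codes into labelled categories, then count each one
sexCounts <- table(factor(toy[, 3], levels = c(1, 2),
                          labels = c("male", "female")))
sexCounts

# Convert the counts to proportions of the whole data set
sexProps <- prop.table(sexCounts)
sexProps
```

One thing to watch for in real data: if the sex column contains missing values, sum(ped[,3] == 1) returns NA unless you pass na.rm = TRUE, whereas table() leaves NA entries out of the counts by default.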
