
I have a .ped file that contains several columns, and I want to extract information from it. Here is a sample of my data (there is no header):

1  1  1 
1  2  1
2  3  2
3  4  1
3  5  2
...

The first column is the family ID, the second is the individual ID, and the third is the individual's sex.

I read the table into a data frame:

ped <- read.table("pedigree.ped", header=FALSE)
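With `header=FALSE`, `read.table` names the columns `V1`, `V2`, `V3`. A minimal sketch that reads the sample rows above from a string (via the `text` argument, so it runs without pedigree.ped) and gives the columns descriptive names; the names `family`, `id` and `sex` are hypothetical choices, not part of the file:

```r
# Read the sample rows from a string instead of pedigree.ped so the example is reproducible
ped <- read.table(text = "
1 1 1
1 2 1
2 3 2
3 4 1
3 5 2
", header = FALSE, col.names = c("family", "id", "sex"))  # hypothetical column names

str(ped)   # 5 obs. of 3 integer variables
head(ped)  # first rows of the data frame
```

Positional indexing such as `ped[,1]` keeps working after renaming, so the answers below apply either way.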

How can I compute the number of families (a family can appear on more than one line, and I want to count it only once)? The sex column uses 1 for male and 2 for female; how can I get the distribution of males and females in the data set?

I'm new to R, so some example code would be much appreciated!

Thanks in advance.

Post a sample of your data, please. – Ferdinand.kraft
^^^ This, e.g. the output of head(ped). – Nishanth
Please give me some hints. – Hocine Ben

2 Answers


Since you are new to R, I would suggest looking into Excel first. The operations you are asking for are fairly simple and can be done in Excel.

If you want to use R, look into data frame indexing, subsetting, etc.
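For example, a minimal sketch of indexing and subsetting on the sample data, using the default column names `V1`, `V2`, `V3` that `read.table` assigns when `header=FALSE`:

```r
ped <- read.table(text = "1 1 1
1 2 1
2 3 2
3 4 1
3 5 2", header = FALSE)

sex_col <- ped[, 3]                 # third column by position (same as ped$V3)
females <- ped[ped$V3 == 2, ]       # logical indexing: keep rows where sex == 2
family3 <- subset(ped, V1 == 3)     # subset() with a condition on the family column
unique_families <- unique(ped$V1)   # distinct family IDs, each listed once

print(females)
print(family3)
print(unique_families)
```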

If you are familiar with SQL, look into the sqldf package.
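A sketch of the same two counts with sqldf, assuming the package is installed (the code checks with `requireNamespace()` and only prints a message otherwise). `V1` and `V3` are the default column names `read.table` assigns when `header=FALSE`; `sqldf()` looks up the `ped` data frame by name:

```r
ped <- read.table(text = "1 1 1
1 2 1
2 3 2
3 4 1
3 5 2", header = FALSE)

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # COUNT(DISTINCT ...) counts each family ID once
  families <- sqldf("SELECT COUNT(DISTINCT V1) AS n_families FROM ped")
  # One row per sex code with the number of individuals
  by_sex <- sqldf("SELECT V3 AS sex, COUNT(*) AS n FROM ped GROUP BY V3")
  print(families)
  print(by_sex)
} else {
  message("sqldf is not installed; run install.packages('sqldf') first")
}
```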

Number of families:

numFamilies <- length(unique(ped[,1]))

Number of males & females:

numMales <- sum(ped[,3] == 1)
numFemales <- sum(ped[,3] == 2)
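As a quick check, a sketch that runs these lines on the sample rows from the question. If the file follows the PLINK .ped convention, the sex column may also contain 0 (unknown), so this version (an addition, not part of the answer above) also counts rows that are neither 1 nor 2, and uses `na.rm = TRUE` in case of missing values:

```r
ped <- read.table(text = "1 1 1
1 2 1
2 3 2
3 4 1
3 5 2", header = FALSE)

numFamilies <- length(unique(ped[, 1]))             # each family ID counted once
numMales    <- sum(ped[, 3] == 1, na.rm = TRUE)     # TRUE counts as 1 in sum()
numFemales  <- sum(ped[, 3] == 2, na.rm = TRUE)
numOther    <- nrow(ped) - numMales - numFemales    # e.g. 0 = unknown, or NA

# For the sample rows: families 3, males 3, females 2, other 0
cat("families:", numFamilies, "\n")
cat("males:", numMales, "females:", numFemales, "other:", numOther, "\n")
```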

Try using this for exploring the data:

For family:
table(ped[,1])

For sex: 
table(ped[,3])
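`table()` returns one count per distinct value, so the length of the family table is also the number of families. A sketch on the sample data that labels the sex codes (1 = male, 2 = female, as stated in the question) and turns the counts into proportions with `prop.table()`:

```r
ped <- read.table(text = "1 1 1
1 2 1
2 3 2
3 4 1
3 5 2", header = FALSE)

fam_counts <- table(ped[, 1])     # number of individuals per family ID
n_families <- length(fam_counts)  # one entry per distinct family

# Label the codes so the table reads male/female instead of 1/2
sex <- factor(ped[, 3], levels = c(1, 2), labels = c("male", "female"))
sex_counts <- table(sex)
sex_props  <- prop.table(sex_counts)  # each count divided by the total

print(fam_counts)
print(sex_counts)
print(round(sex_props, 2))
```

Codes other than 1 and 2 become `NA` in the factor and are left out of `table()` by default; `table(sex, useNA = "ifany")` shows them.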