
I have decided to learn R and am working through the book Introduction to Scientific Programming and Simulation Using R (http://www.ms.unimelb.edu.au/spuRs/).

I am currently stuck on Chapter 7, Question 3 of the book. The question is:

Consider the following very simple genetic model. A population consists of equal numbers of two sexes: male and female. At each generation men and women are paired at random, and each pair produces exactly two offspring, one male and one female. We are interested in the distribution of height from one generation to the next. Suppose that the height of both children is just the average of the height of their parents, how will the distribution of height change across generations?

Represent the heights of the current generation as a dataframe with two variables, m and f, for the two sexes. The command rnorm(100, 160, 20) will generate a vector of length 100, according to the normal distribution with mean 160 and standard deviation 20 (see Section 16.5.1). We use it to randomly generate the population at generation 1:

pop <- data.frame(m = rnorm(100, 160, 20), f = rnorm(100, 160, 20))

The command sample(x, size = length(x)) will return a random sample of size size taken from the vector x (without replacement). (It will also sample with replacement, if the optional argument replace is set to TRUE.) The following function takes the dataframe pop and randomly permutes the ordering of the men. Men and women are then paired according to rows, and heights for the next generation are calculated by taking the mean of each row. The function returns a dataframe with the same structure, giving the heights of the next generation.

next.gen <- function(pop) {
  pop$m <- sample(pop$m)
  pop$m <- apply(pop, 1, mean)
  pop$f <- pop$m
  return(pop)
}

Use the function next.gen to generate nine generations, then use the lattice function histogram to plot the distribution of male heights in each generation, as in Figure 7.7. The phenomenon you see is called regression to the mean.

Hint: construct a dataframe with variables height and generation, where each row represents a single man.

I have constructed a blank data frame:

generations <- data.frame(gen="", height="")

For now I am trying to get just the first generation height information into it, so I run:

next.gen(pop)

generations$height <- pop$m

and I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "height", value = c(165.208323681597,  : 
replacement has 100 rows, data has 1

I understand that I'm trying to squeeze the 100 values of pop$m into the single row of generations, and that this is what causes the problem, but I do not know how to fix it. I thought that a blank data frame would be flexible enough to add rows as they are copied from the pop data frame.

I tried then to run this code:

generations <- pop$m

I get 100 values, but I think that just turns my generations data frame into a vector, and running

generations

just lists the copied values as a plain vector.

I think I am approaching the first step wrong. Is my data frame definition correct? Why can't I copy rows from one data frame into an empty one and have the empty data frame grow as needed?

Thank you


2 Answers

0
votes

I'm unsure of the exact output you are looking for, but here is an approach that should be simple enough to follow. Note: there are plenty of other workable approaches.

pop <- data.frame(m = rnorm(100, 160, 20), f = rnorm(100, 160, 20))

next.gen <- function(pop) {
  pop$m <- sample(pop$m)
  pop$m <- apply(pop, 1, mean)
  pop$f <- pop$m
  return(pop)
}

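# the code
test <- list()
for (i in 1:9) {
  # keep only the male heights, as a one-column data frame
  # note: next.gen() is applied to the same starting pop on every pass,
  # so g1..g9 are nine separate one-step offspring sets of that pop;
  # to chain generations, also reassign pop <- next.gen(pop) in the loop
  test[[i]] <- next.gen(pop)["m"]
  test[[i]]$generation <- paste0("g", i)
}
library(data.table)
# stack the nine data frames into one long table
test2 <- rbindlist(test)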


# result
            m generation
  1: 174.6558         g1
  2: 143.2617         g1
  3: 185.2829         g1
  4: 168.9719         g1
  5: 151.6948         g1
 ---                    
896: 159.6091         g9
897: 161.4546         g9
898: 171.8679         g9
899: 138.4982         g9
900: 152.7390         g9
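
One thing to watch in the loop above: it calls next.gen on the same starting pop each time, so the heights labelled g1 to g9 do not narrow from one label to the next. Below is a minimal sketch of a chained version, where each generation is produced from the previous one and generation 1 is the starting population. The names current, heights and gen.heights, and the fixed seed, are my own assumptions for illustration and do not come from the book. It uses base rbind instead of data.table, and wraps the lattice call in print() so that the plot also appears when the code is run as a script.

```r
library(lattice)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pop <- data.frame(m = rnorm(100, 160, 20), f = rnorm(100, 160, 20))

next.gen <- function(pop) {
  pop$m <- sample(pop$m)
  pop$m <- apply(pop, 1, mean)
  pop$f <- pop$m
  return(pop)
}

n.gen <- 9
heights <- vector("list", n.gen)
current <- pop                       # generation 1 is the starting population
for (i in seq_len(n.gen)) {
  # one row per man, tagged with his generation number
  heights[[i]] <- data.frame(height = current$m, generation = i)
  # carry the population forward so generation i + 1 descends from i
  current <- next.gen(current)
}
gen.heights <- do.call(rbind, heights)

# one histogram panel per generation
print(histogram(~ height | factor(generation), data = gen.heights,
                xlab = "male height"))
```

Because each child is the average of two parents, the spread of heights shrinks by roughly a factor of sqrt(2) per generation, so the later panels should look much narrower than the first.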
0
votes

Try:

> generations <- data.frame(gen="", height="", stringsAsFactors=FALSE)
> for (i in seq_along(pop$m)) generations[i, ] <- c("", pop$m[i])
> generations
    gen           height
1        136.70042632318
2       153.985392293761
3       122.077485676327
4       166.582538529591
5       170.751368839498
6         190.8894492681
...