From long to wide format based on just two columns (RStudio)

Question

This is my data frame:

I have a data frame with six columns; the last column contains the values. The column 'code' contains s and d, and the column 'Sex' contains M and F (written as f in the data). There are two thousand offspring in the column offspring.

seq  parent  code  Sex  offspring                   Value
1    49032   s     M    J44010_CCG7YANXX_2_661_X4   -0.38455056
2    48741   s     M    J44010_CCG7YANXX_2_661_X4    0.10574340
3    48757   s     M    J44010_CCG7YANXX_2_661_X4    0.39572906
4    48465   d     f    J44010_CCG7YANXX_2_661_X4    0.43409006
5    48521   d     f    J44010_CCG7YANXX_2_661_X4    0.40337447
6    48703   d     f    J44010_CCG7YANXX_2_661_X4   -0.38148980

The column parent contains IDs for both males and females. I want to put the female (dam) ID, dam code and dam sex as columns right beside the male (sire) columns, and keep the sire value and the dam value separately, so the Value column will be split into two.

The result should look like this:

seq  parent1  sirecode  Sex  parent2  damcode  Sex  offspring  sireValue    damValue
1    49032    s         M    48465    d        f    J44010     -0.38455056   0.43409006
2    48741    s         M    48521    d        f    J44010      0.10574340   0.40337447
3    48757    s         M    48703    d        f    J44010      0.39572906  -0.38148980

So each offspring will have 3 or 4 pairs of parents.
I tried to use the dcast function on it.
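
Any pairing by position assumes that every offspring has as many sires as dams. A quick base R check could look like the sketch below; it assumes, as in the sample rows, that code "s" marks a sire and "d" marks a dam.

```r
# Example rows copied from the question
df1 <- data.frame(
  parent = c(49032L, 48741L, 48757L, 48465L, 48521L, 48703L),
  code = c("s", "s", "s", "d", "d", "d"),
  Sex = c("M", "M", "M", "f", "f", "f"),
  offspring = rep("J44010_CCG7YANXX_2_661_X4", 6),
  Value = c(-0.38455056, 0.1057434, 0.39572906, 0.43409006, 0.40337447, -0.3814898)
)

# Count sires ("s") and dams ("d") for each offspring
counts <- table(df1$offspring, factor(df1$code, levels = c("s", "d")))
print(counts)

# Offspring whose sire and dam counts differ cannot be paired one-to-one
unbalanced <- rownames(counts)[counts[, "s"] != counts[, "d"]]
unbalanced
```

On the full data, any offspring listed in unbalanced would need a look before reshaping.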

How do we know which male parent to match to which female parent? All the offspring are identical as far as I can tell. - iod
I have only given the example of one offspring; there are other offspring just like it. The male parent (sire1) and female parent (dam1) form a pair, and the parents are listed in sequence, for example: 1. Sire1, 2. Sire2, 3. Sire3, 4. Dam1, 5. Dam2, 6. Dam3. - Koushik Das
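
Following the pairing rule in this comment (the i-th sire listed for an offspring goes with the i-th dam), one alternative to the accepted answer is to number the parents within each offspring and code, split the rows into a sire table and a dam table, and merge them. This is only a sketch: it assumes code "s" is sire and "d" is dam, and the names sireSex and damSex are illustrative, since the desired output repeats Sex twice.

```r
# Example rows copied from the question
df1 <- data.frame(
  seq = 1:6,
  parent = c(49032L, 48741L, 48757L, 48465L, 48521L, 48703L),
  code = c("s", "s", "s", "d", "d", "d"),
  Sex = c("M", "M", "M", "f", "f", "f"),
  offspring = rep("J44010_CCG7YANXX_2_661_X4", 6),
  Value = c(-0.38455056, 0.1057434, 0.39572906, 0.43409006, 0.40337447, -0.3814898)
)

# Position of each parent within its offspring and code group:
# the i-th sire of an offspring is paired with its i-th dam
df1$pair <- with(df1, ave(seq_along(code), offspring, code, FUN = seq_along))

# Split into a sire table and a dam table with distinct column names
sires <- df1[df1$code == "s", c("offspring", "pair", "seq", "parent", "code", "Sex", "Value")]
dams  <- df1[df1$code == "d", c("offspring", "pair", "parent", "code", "Sex", "Value")]
names(sires)[4:7] <- c("parent1", "sirecode", "sireSex", "sireValue")
names(dams)[3:6]  <- c("parent2", "damcode", "damSex", "damValue")

# Join each sire to the dam with the same offspring and pair index
wide <- merge(sires, dams, by = c("offspring", "pair"))
wide
```

Because pair is computed separately for each offspring, the rows of different offspring do not need to be contiguous; only the order of sires and dams within each offspring matters.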

akrun · Accepted Answer · 2018-11-23T01:25:14

We could use dcast after creating a sequence column that numbers the sires and the dams separately:

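library(data.table)
# n numbers sires and dams separately: 1, 2, 3, ... in order of appearance
setDT(df1)[, n := seq_len(.N), .(code, Sex)]
# rowid(n) is 1 for the sire (listed first) and 2 for the dam sharing that n
dcast(df1, n + offspring ~ rowid(n), value.var = c('parent', 'code', 'Sex', 'Value'), sep = "")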
#  n                 offspring parent1 parent2 code1 code2 Sex1 Sex2     Value1     Value2
#1: 1 J44010_CCG7YANXX_2_661_X4   49032   48465     s     d    M    f -0.3845506  0.4340901
#2: 2 J44010_CCG7YANXX_2_661_X4   48741   48521     s     d    M    f  0.1057434  0.4033745
#3: 3 J44010_CCG7YANXX_2_661_X4   48757   48703     s     d    M    f  0.3957291 -0.3814898

In base R, we can use reshape:

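df1 <- as.data.frame(df1)  # reshape() needs a plain data.frame, not the data.table from above
df1$n <- with(df1, ave(seq_along(Sex), Sex, FUN = seq_along))  # sire/dam index 1, 2, 3, ...
df1$n1 <- with(df1, ave(n, n, FUN = seq_along))  # 1 = sire (listed first), 2 = dam
# df1[-1] drops the 'seq' column; each (n, offspring) pair becomes one wide row
reshape(df1[-1], idvar = c('n', 'offspring'), timevar = 'n1', direction = 'wide')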
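
reshape() names the widened columns parent.1, Value.2 and so on. To get names closer to the ones in the question, the result could be renamed afterwards. The sketch below repeats the reshape steps so it runs on its own; the names sireSex and damSex are illustrative choices, since the desired output uses Sex twice.

```r
# Example rows copied from the question ('seq' first, so df1[-1] drops it)
df1 <- data.frame(
  seq = 1:6,
  parent = c(49032L, 48741L, 48757L, 48465L, 48521L, 48703L),
  code = c("s", "s", "s", "d", "d", "d"),
  Sex = c("M", "M", "M", "f", "f", "f"),
  offspring = rep("J44010_CCG7YANXX_2_661_X4", 6),
  Value = c(-0.38455056, 0.1057434, 0.39572906, 0.43409006, 0.40337447, -0.3814898)
)
df1$n <- with(df1, ave(seq_along(Sex), Sex, FUN = seq_along))
df1$n1 <- with(df1, ave(n, n, FUN = seq_along))
wide <- reshape(df1[-1], idvar = c("n", "offspring"), timevar = "n1", direction = "wide")

# reshape() names the columns <variable>.<n1>; map them to the names asked for
old <- c("parent.1", "code.1", "Sex.1", "Value.1", "parent.2", "code.2", "Sex.2", "Value.2")
new <- c("parent1", "sirecode", "sireSex", "sireValue", "parent2", "damcode", "damSex", "damValue")
idx <- match(old, names(wide))
stopifnot(!anyNA(idx))  # fail loudly if the columns were named differently
names(wide)[idx] <- new
wide
```

Renaming by matching the old names, rather than by column position, keeps the step correct even if reshape() orders the columns differently.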

data

df1 <- structure(list(seq = 1:6,
    parent = c(49032L, 48741L, 48757L, 48465L, 48521L, 48703L),
    code = c("s", "s", "s", "d", "d", "d"),
    Sex = c("M", "M", "M", "f", "f", "f"),
    offspring = c("J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4",
        "J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4",
        "J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4"),
    Value = c(-0.38455056, 0.1057434, 0.39572906, 0.43409006, 0.40337447, -0.3814898)),
    class = "data.frame", row.names = c(NA, -6L))
