0
votes

I'm trying to extract and organize the values from the first data frame into the second. In the first you have cbn which is a factor that lists combinations of variables 1 to 31 (example dataframe shows a portion of all my data). For each of these combinations A, B, and C have values 1 or 2.

                 cbn             A             B             C
    1     1, 2, 3, 4             1             2             1
    2     1, 2, 3, 5             1             1             1
    3     1, 2, 3, 7             1             1             1
    4     1, 2, 3, 8             1             2             1
    5     1, 2, 3, 9             1             1             1
    6    1, 2, 3, 10             1             1             1
    7    1, 2, 3, 12             1             2             1
    8    1, 2, 3, 13             1             2             1
    9    1, 2, 3, 17             1             2             1
    10   1, 2, 3, 18             1             2             1
    11   1, 2, 3, 20             2             2             2
    12   1, 2, 3, 22             1             2             1
    13   1, 2, 3, 23             1             2             1
    14   1, 2, 3, 25             1             2             1
    15   1, 2, 3, 26             1             2             1
    16   1, 2, 3, 28             1             2             1      
    17   1, 2, 3, 29             1             2             1
    18   1, 2, 3, 30             1             2             1
    19   1, 2, 3, 31             1             2             1

I'm trying to get all that data into a new dataframe. The rows become the 31 variables, and the columns become separated into 1 and 2 for A,B, and C. For every row in df1, the variables used in the combination are separated and added to the corresponding row in df2 under the column with the letter and value indicated in the df1. Thus, the first line in df1 has variables 1, 2, 3, and 4, and A is 1. In df2 under the A1 column, 1 is added to each corresponding variable row. For each variable present under cbn in df1, 1 is added to the count for that variable in df2 under letter with the same value in df1. I have added the first two rows of df1 to df2.

      Variable         A1            A2            B1            B2            C1            C2        
    1     1             2             0             1             1             2             0 
    2     2             2             0             1             1             2             0 
    3     3             2             0             1             1             2             0 
    4     4             1             0             0             1             1             0 
    5     5             1             0             1             0             1             0 
    6     6             0             0             0             0             0             0 
    7     7             0             0             0             0             0             0 
    8     8             0             0             0             0             0             0 
    9     9             0             0             0             0             0             0 
    10   10             0             0             0             0             0             0 
    11   11             0             0             0             0             0             0 
    12   12             0             0             0             0             0             0 
    13   13             0             0             0             0             0             0 
    14   14             0             0             0             0             0             0 
    15   15             0             0             0             0             0             0 
    16   16             0             0             0             0             0             0
    ...  ...           ...           ...            ...          ...           ...           ...
    31   31             0             0             0             0             0             0

How can I transfer this data into df2?

1
Can you use dput to show the data as it is not clear whether cba is list or stringakrun
@akrun Not sure if this is what you are looking for but this I did dput(df1$cbn) "9, 24, 29, 30", "9, 24, 29, 31", "9, 24, 30, 31", "9, 25, 26, 28", "9, 25, 26, 29", "9, 25, 26, 30", "9, 25, 26, 31", "9, 25, 28, 29", "9, 25, 28, 30", "9, 25, 28, 31", "9, 25, 29, 30", "9, 25, 29, 31", "9, 25, 30, 31", "9, 26, 28, 29", "9, 26, 28, 30", "9, 26, 28, 31", "9, 26, 29, 30", "9, 26, 29, 31", "9, 26, 30, 31", "9, 28, 29, 30", "9, 28, 29, 31", "9, 28, 30, 31", "9, 29, 30, 31"), class = "factor")bobbytreefish
It is not clear to me how you get A1, A2 from A ?Ronak Shah
@Ronak Shah in df1, A can only be 1 or 2. I wish to see what combinations, what variables, contribute to A being 1 or 2 most often.bobbytreefish
So why is A1 2 and A2 0 ?Ronak Shah

1 Answers

1
votes

Using you first two rows of data:

df1 <- data.frame(cbn = c("1, 2, 3, 4", "1, 2, 3, 5" ),
                  A = c(1,1), 
                  B = c(2,1), 
                  C = c(1,1))

First add the letters to the entries of the column:

for(x in c("A","B","C")){
  df1[,x] <- paste0(x, df1[,x])
}

Then using sperate to split the cbn column in to multiple columns and using gather, summarize and then spread:

library(tidyverse)
df2 <- df1 %>% 
  separate(cbn , paste("V",1:4), sep = ",") %>% 
  gather("dummy", "Variable", starts_with("V")) %>% 
  mutate(Variable = as.numeric(Variable))%>% 
  select(-dummy) %>% 
  gather("dummy", "value", -Variable) %>% 
  select(-dummy) %>% 
  mutate(value = factor(value, levels = c("A1","A2","B1","B2","C1","C2"))) %>% 
  group_by(Variable, value) %>% 
  summarize(n = n()) %>% 
  spread("value", "n", fill = 0, drop = F) %>%
  as.data.frame()

results in:

> df2

  Variable A1 A2 B1 B2 C1 C2
1        1  2  0  1  1  2  0
2        2  2  0  1  1  2  0
3        3  2  0  1  1  2  0
4        4  1  0  0  1  1  0
5        5  1  0  1  0  1  0