Last time, I used dcast() on a data frame in which one column was ID and another held the codes assigned to each ID (several per ID). My df1 looked like this:

ID  codes    gfreq
123  FGV34     0.988
123  FGV34     0.988
123  FGV34     0.988 
566  WER45     NA
566  FGV34      0.988
566  FGV34      0.988

I wanted to reshape it into the format below, and used the dcast() call shown after the table:

ID  FGV34  WER45
123  1       0
566  1       1

dcast(df1, ID ~ codes) 

That worked perfectly. Now I have a similar data frame, df2, which has just two columns, ID and codes:

ID  codes    
123  FGV34     
123  FGV34    
123  FGV34     
566  WER45     
566  FGV34      
566  FGV34 

When I run df2 through dcast():

1. I get a message about value.var being overridden, and the codes column is used as value.var, which is okay.
2. The output comes back in a completely different format this time:

ID  FGV34  WER45
123 FGV34    NA
566 FGV34  WER45

I have checked the data types of the columns in df1 and df2. They are the same for both ID and codes. I would like help getting the output like before, with 0 or 1 in each cell instead of NA or the code name. I would also like to understand what changed to make dcast() behave differently.

A tidyr and dplyr solution might help. Try this: df %>% filter(!is.na(Codes)) %>% spread(Codes, ID) - deepseefan
No, I think it requires the columns to have no repeated values, but in my case ID repeats. - savi
Do you have your data frame? I just tested with a repeat and it works. - deepseefan
No luck. This is the error I'm getting: "Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 61 rows:" - savi
You likely have repeats in Codes. - deepseefan

1 Answer

reshape2 is retired and tidyr's spread() has been superseded; the tidyverse now wants you to use pivot_wider(). I'm not up to date on that syntax, but dcast() still does what you want with data.table.

library(data.table)
d1 <- data.table(ID = c(11,11,11,12,12,12), 
                 codes = c('a', 'a', 'a', 'b', 'a', 'a'), 
                 gfreq = c(.5,.5,.5,NA,.5,.5))
dcast(d1, ID ~ codes)
#> Using 'gfreq' as value column. Use 'value.var' to override
#> Aggregate function missing, defaulting to 'length'
#>    ID a b
#> 1: 11 3 0
#> 2: 12 2 1

d2 <- data.table(ID = c(11,11,11,12,12,12), 
                 codes = c('a', 'a', 'a', 'b', 'a', 'a'))
dcast(d2, ID ~ codes)
#> Using 'codes' as value column. Use 'value.var' to override
#> Aggregate function missing, defaulting to 'length'
#>    ID a b
#> 1: 11 3 0
#> 2: 12 2 1

## If you only want 1's and 0's, drop duplicate rows first so each count is at most 1
dcast(unique(d2), ID ~ codes, 
      fun.aggregate = length)
#> Using 'codes' as value column. Use 'value.var' to override
#>    ID a b
#> 1: 11 1 0
#> 2: 12 1 1

Created on 2019-10-16 by the reprex package (v0.3.0)
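
For anyone who prefers the pivot_wider() route mentioned above, here is a minimal sketch of how it might look. It is not part of the original reprex: the toy df2 mirrors the question's example, the helper column name `present` is made up for illustration, and it assumes dplyr and a tidyr version that provides pivot_wider() (1.0.0 or later).

```r
library(dplyr)
library(tidyr)

# Toy data shaped like df2 in the question (values are illustrative)
df2 <- data.frame(
  ID    = c(123, 123, 123, 566, 566, 566),
  codes = c("FGV34", "FGV34", "FGV34", "WER45", "FGV34", "FGV34")
)

wide <- df2 %>%
  distinct(ID, codes) %>%      # drop repeated ID/code pairs so keys are unique
  mutate(present = 1L) %>%     # hypothetical indicator column to spread out
  pivot_wider(
    names_from  = codes,
    values_from = present,
    values_fill = list(present = 0L)  # ID/code pairs that never occur become 0
  )

wide
```

The distinct() step is what avoids the "unique combination of keys" error from the comments: once each ID/code pair appears only once, every output cell has at most one value, and values_fill turns the missing pairs into 0 instead of NA.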