1
votes

I'm stuck on an R data wrangling issue. I have a data frame (df) that looks like this:

      loc     lat      long      group
 1    loc1    47.69119 -91.85776  A
 2    loc1    47.69119 -91.85776  B
 3    loc1    47.69119 -91.85776  C
 4    loc2    46.41079 -86.64089  C
 5    loc2    46.41079 -86.64089  C
 6    loc2    46.41079 -86.64089  C
 7    loc3    47.33673 -91.19876  D
 8    loc4    46.83937 -87.71222  A
 9    loc4    46.83937 -87.71222  E
 10   loc4    46.83937 -87.71222  E

What I would like to do is create a new column for each unique value in df$group, then use the number of times each group is observed for each loc to populate the new columns, and retain only a single row for each loc. I would like the final product to look like this:

      loc    lat      long       A  B  C  D  E
 1    loc1   47.69119 -91.85776  1  1  1  0  0
 2    loc2   46.41079 -86.64089  0  0  3  0  0
 3    loc3   47.33673 -91.19876  0  0  0  1  0
 4    loc4   46.83937 -87.71222  1  0  0  0  2

I think I need some combination of dplyr and/or tidyr functions, but haven't been able to work it out. I've tabulated the frequency for each loc and group using the following code:

df.freq = df %>%
group_by(loc, group) %>%
summarise(Freq = n())

Perhaps the resulting df.freq can be used to create the new columns using tidyr as follows:

tidyr::separate(df.freq, group, 
  as.character(unique(df.freq$group)))

But then I'm not sure how to populate each new column and retain only a single row for each loc.

1

1 Answers

3
votes

One option is to get the count based on the columns 'loc', 'lat', 'long', 'group' and then spread to 'wide' format

library(tidyverse)
df %>% 
  count(loc, lat, long, group) %>%
  spread(group, n, fill = 0) 

A data.table version (@markus comments) would be

library(data.table)
dcast(setDT(df), loc + lat + long ~ group)