1
votes

I have a data frame like this:

df <- data.frame(grouping = c(rep("site1_1",9), rep("site2_1",9)),
                 var = c(rep("P", 3), rep("G", 3), rep("B",3),rep("P", 3), rep("B", 3), rep("G",3)),
                 order= c(rep(0, 3), rep(1, 3), rep(2,3),rep(0, 3), rep(1, 3), rep(2,3)))

   grouping var order
1   site1_1   P     0
2   site1_1   P     0
3   site1_1   P     0
4   site1_1   G     1
5   site1_1   G     1
6   site1_1   G     1
7   site1_1   B     2
8   site1_1   B     2
9   site1_1   B     2
10  site2_1   P     0
11  site2_1   P     0
12  site2_1   P     0
13  site2_1   B     1
14  site2_1   B     1
15  site2_1   B     1
16  site2_1   G     2
17  site2_1   G     2
18  site2_1   G     2

I have a column that represents a unique ID called Grouping (never repeated). Within each grouping I have 3 variables (P,G, and B). They do get repeated multiple times within a grouping as shown above.

The order is always P,G,B or P,B,G. Within a group P is always 0, B/G are either 1 or 2. They can never be 1 and 2 within a group and between groups it's random as to whether B or G is 1 or 2.

I have an order column to show the order that var takes within each group

I would like to add a new column that labels the entire grouping (P,B, and G) based on whether B precedes G or vice versa.

This is an example of what that would look like:

   grouping var order  label
1   site1_1   P     0 Gfirst
2   site1_1   P     0 Gfirst
3   site1_1   P     0 Gfirst
4   site1_1   G     1 Gfirst
5   site1_1   G     1 Gfirst
6   site1_1   G     1 Gfirst
7   site1_1   B     2 Gfirst
8   site1_1   B     2 Gfirst
9   site1_1   B     2 Gfirst
10  site2_1   P     0 Bfirst
11  site2_1   P     0 Bfirst
12  site2_1   P     0 Bfirst
13  site2_1   B     1 Bfirst
14  site2_1   B     1 Bfirst
15  site2_1   B     1 Bfirst
16  site2_1   G     2 Bfirst
17  site2_1   G     2 Bfirst
18  site2_1   G     2 Bfirst

I am unclear as to how to do this.

Using dplyr I start by

df %>% group_by(grouping) %>% mutate(label = .......

But after here I'm lost as to how to specify that the label is conditional on the order of the sequence of P,B, and G and how to account for the fact that they repeat multiple times in each group.

I went to this exchange:

[How can I create a new column in a dataframe based on permutations of other columns?

but am unclear how to adopt the answers given that I need to group them by the groupings column and need to account for differing numbers of permutations within each variable (there can be a range from 3-15 P's, B's, and G's in each grouping.

Any help is greatly appreciated.

2

2 Answers

1
votes
library(tidyverse)

df %>% 
  group_by(grouping) %>% 
  mutate(label = paste0(substr(gsub("[^G|B]", "", paste(unique(var), collapse = "")), 1, 1), "first"))
1
votes

One solution using dplyr and ifelse can be achieved as:

library(dplyr)

df %>% group_by(grouping) %>%
  mutate(label = ifelse(var[var!="P"][1] == "B","BFirst","GFirst" )) %>%
  as.data.frame()

#    grouping var order  label
# 1   site1_1   P     0 GFirst
# 2   site1_1   P     0 GFirst
# 3   site1_1   P     0 GFirst
# 4   site1_1   G     1 GFirst
# 5   site1_1   G     1 GFirst
# 6   site1_1   G     1 GFirst
# 7   site1_1   B     2 GFirst
# 8   site1_1   B     2 GFirst
# 9   site1_1   B     2 GFirst
# 10  site2_1   P     0 BFirst
# 11  site2_1   P     0 BFirst
# 12  site2_1   P     0 BFirst
# 13  site2_1   B     1 BFirst
# 14  site2_1   B     1 BFirst
# 15  site2_1   B     1 BFirst
# 16  site2_1   G     2 BFirst
# 17  site2_1   G     2 BFirst
# 18  site2_1   G     2 BFirst