2
votes

I have a question about numbering the groups in a data.frame.

I found only one similar approach here: dplyr-how-to-number-label-data-table-by-group-number-from-group-by

but it didn't work for me, and I don't know why.

library(dplyr)

S <- rep(letters[1:12],each=6)
R = sort(replicate(9, sample(5000:6000,4)))
df <- data.frame(R,S)

get_next_integer = function(){
  i = 0
  function(S){ i <<- i+1 }
}
get_integer = get_next_integer() 

result <- df %>% group_by(S) %>% mutate(label = get_integer())
result

Source: local data frame [72 x 3]
Groups: S [12]

       R      S label
   (int) (fctr) (dbl)
1   5058      a     1
2   5121      a     1
3   5129      a     1
4   5143      a     1
5   5202      a     1
6   5213      a     1
7   5239      b     1
8   5245      b     1
9   5269      b     1
10  5324      b     1
..   ...    ...   ...

I'm looking for an elegant solution in dplyr that numbers the groups, i.e. labels each letter from 1 to 12.

Is there a reason to do this in dplyr? df$label <- as.numeric(factor(df$S)) - hrbrmstr
@Frank, how is df$label <- group_indices(df, S) useless? - hrbrmstr
Actually, that's not the whole point of the package. Chaining is a nice additional component, but the whole point of the pkg was to provide a more standardized and sane way of doing data frame machinations. - hrbrmstr
@hrbrmstr Fair enough. Cleaning up my comments. One other common way: match(df$S, unique(df$S)) - Frank
@hrbrmstr - can you explain why df %>% group_indices(S) works fine but df %>% mutate(label=group_indices(S)) fails? I can't for the life of me figure out why it should just not work. - thelatemail

2 Answers

6
votes

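Using as.numeric on the factor will do the trick. Note that S has to be a factor for this to work: since R 4.0.0, data.frame() keeps strings as character by default, so the code below wraps S in factor() to stay correct on both old and new versions.

library(dplyr)

S <- rep(letters[1:12],each=6)
R = sort(replicate(9, sample(5000:6000,4)))
df <- data.frame(R,S)

# factor() makes the conversion explicit; as.numeric() then returns the level index
result <- df %>% mutate(label = as.numeric(factor(S))) %>% group_by(S)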

result
Source: local data frame [72 x 3]
Groups: S

      R S label
1  5018 a     1
2  5042 a     1
3  5055 a     1
4  5066 a     1
5  5081 a     1
6  5133 a     1
7  5149 b     2
8  5191 b     2
9  5197 b     2
10 5248 b     2
..  ... .   ...
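
If the numbering should happen inside the grouped pipeline itself, newer dplyr has a helper for it. A minimal sketch, assuming dplyr 1.0.0 or later is installed (cur_group_id() does not exist in the older version that printed the output above); set.seed() is only added here to make the sample data reproducible:

```r
library(dplyr)

set.seed(1)  # assumption: only here so the example data is reproducible
S <- rep(letters[1:12], each = 6)
R <- sort(replicate(9, sample(5000:6000, 4)))
df <- data.frame(R, S)

# cur_group_id() returns the index of the current group inside mutate()
result <- df %>%
  group_by(S) %>%
  mutate(label = cur_group_id()) %>%
  ungroup()

head(result, 8)
```

The groups are numbered in the order group_by() sorts them, so with the letters a to l the labels run from 1 to 12.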

4
votes

No need to use dplyr at all.

S <- rep(letters[1:12],each=6)
R = sort(replicate(9, sample(5000:6000,4)))
df <- data.frame(R,S)

df$label <- as.numeric(factor(df$S))
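
One thing worth knowing when picking between this and the match(df$S, unique(df$S)) variant mentioned in the comments: the two only agree when the groups already appear in sorted order, as they do in the question's data. A small sketch on a made-up vector (the values of S below are hypothetical, chosen just to show the difference):

```r
# Hypothetical input where the groups do not appear in alphabetical order
S <- c("b", "b", "a", "a", "c")

# factor() sorts the levels, so labels follow alphabetical order
by_sorted <- as.numeric(factor(S))

# match() against unique() numbers groups by first appearance
by_first <- match(S, unique(S))

print(by_sorted)  # 2 2 1 1 3
print(by_first)   # 1 1 2 2 3
```

For the data in the question both give 1 to 12, so either is fine there.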