Recoding levels of factors

Question

I have the following data frame:

forStack
  AGE  BMI time          A         B      ID
1  59 23.8    0     (0,75]  (4,14.9] 9000099
2  69 29.8    0 (96.4,100]  (-Inf,0] 9000296
3  71 22.7    0  (75,89.3]  (4,14.9] 9000622
4  56 32.4    0     (0,75] (14.9,68] 9000798
5  72 30.7    0     (0,75] (14.9,68] 9001104
6  75 23.5    0 (96.4,100]     (0,4] 9001400

dput(forStack)
structure(list(AGE = c(59, 69, 71, 56, 72, 75), BMI = c(23.8, 
29.8, 22.7, 32.4, 30.7, 23.5), time = c(0, 0, 0, 0, 0, 0), A = structure(c(2L, 
5L, 3L, 2L, 2L, 5L), .Label = c("(-Inf,0]", "(0,75]", "(75,89.3]", 
"(89.3,96.4]", "(96.4,100]", "(100, Inf]"), class = "factor"), 
B = structure(c(3L, 1L, 3L, 4L, 4L, 2L), .Label = c("(-Inf,0]", 
"(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]"), class = "factor"), 
ID = c(9000099, 9000296, 9000622, 9000798, 9001104, 9001400
)), .Names = c("AGE", "BMI", "time", "A", "B", "ID"), row.names = c(NA, 
6L), class = "data.frame")

Variables A and B are factors representing quartiles:

   forStack$A
   [1] (0,75]     (96.4,100] (75,89.3]  (0,75]     (0,75]     (96.4,100]
   Levels: (-Inf,0] (0,75] (75,89.3] (89.3,96.4] (96.4,100] (100, Inf]

   forStack$B
   [1] (4,14.9]  (-Inf,0]  (4,14.9]  (14.9,68] (14.9,68] (0,4]    
   Levels: (-Inf,0] (0,4] (4,14.9] (14.9,68] (68, Inf]

I would like to recode A and B values to two-level factors as follows:

For A, the two highest levels, (96.4,100] and (100, Inf], should be recoded as level 0, and all other levels as level 1.

For B, the two lowest levels, (-Inf,0] and (0,4], should be recoded as level 0, and all other levels as level 1.

The resulting data frame should look like this:

forStack
  AGE  BMI time A B      ID
1  59 23.8    0 1 1 9000099
2  69 29.8    0 0 0 9000296
3  71 22.7    0 1 1 9000622
4  56 32.4    0 1 1 9000798
5  72 30.7    0 1 1 9001104
6  75 23.5    0 0 0 9001400

What is the most efficient way to do this? Thank you very much in advance.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-04-29T05:47:54

Here's one approach:

within(forStack, {
  # 0 for the two highest levels of A, 1 otherwise
  A <- as.numeric(!A %in% tail(levels(A), 2))
  # 0 for the two lowest levels of B, 1 otherwise
  B <- as.numeric(!B %in% head(levels(B), 2))
})
#   AGE  BMI time A B      ID
# 1  59 23.8    0 1 1 9000099
# 2  69 29.8    0 0 0 9000296
# 3  71 22.7    0 1 1 9000622
# 4  56 32.4    0 1 1 9000798
# 5  72 30.7    0 1 1 9001104
# 6  75 23.5    0 0 0 9001400

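The basic idea is that head and tail both take an "n" argument that specifies how many elements to return from the start ("head") or end ("tail") of a vector or data set. Applied to the levels, tail(levels(A), 2) returns (96.4,100] and (100, Inf], and head(levels(B), 2) returns (-Inf,0] and (0,4]. %in% then checks each value against those levels, ! negates the result so that matching values become FALSE, and as.numeric() turns TRUE/FALSE into 1/0. Because the levels are picked by position, this assumes they are stored in ascending order, as they are here.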
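To see the pieces in isolation, here is a minimal sketch on a small hypothetical factor `a` built with the same levels as forStack$A. It shows what tail(levels(a), 2) returns and why the negation covers the whole %in% test: in R, ! has lower precedence than %in%, so !a %in% x parses as !(a %in% x).

```r
# Hypothetical factor with the same levels (and level order) as forStack$A
a <- factor(c("(0,75]", "(96.4,100]", "(75,89.3]"),
            levels = c("(-Inf,0]", "(0,75]", "(75,89.3]",
                       "(89.3,96.4]", "(96.4,100]", "(100, Inf]"))

top2 <- tail(levels(a), 2)   # last two levels, by level order
print(top2)                  # [1] "(96.4,100]" "(100, Inf]"

in_top <- a %in% top2        # TRUE where the value is one of those levels
print(in_top)                # [1] FALSE  TRUE FALSE

# ! binds more loosely than %in%, so this is !(a %in% top2)
code <- as.numeric(!a %in% top2)
print(code)                  # [1] 1 0 1
```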

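within evaluates the assignments inside the data.frame and returns a modified copy, which makes it a convenient way to replace several columns at once. It does not change forStack itself, so assign the result (forStack <- within(forStack, {...})) if you want to keep it.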
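The question asks for two-level factors, whereas as.numeric() returns plain 0/1 numbers. If factors are wanted, one option is to wrap the same test in factor() with explicit levels c(0, 1), so that both levels exist even if one of them happens not to occur. A minimal sketch, using a hypothetical two-row data frame `df` with the same levels as forStack and assigning the within() result back:

```r
# Hypothetical two-row stand-in for forStack (columns A and B only)
levA <- c("(-Inf,0]", "(0,75]", "(75,89.3]", "(89.3,96.4]", "(96.4,100]", "(100, Inf]")
levB <- c("(-Inf,0]", "(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]")
df <- data.frame(
  A = factor(c("(0,75]", "(96.4,100]"), levels = levA),
  B = factor(c("(4,14.9]", "(-Inf,0]"), levels = levB)
)

# within() returns a modified copy, so assign it back to keep the result
df <- within(df, {
  # 0 for the two highest levels of A, 1 otherwise, stored as a factor
  A <- factor(as.numeric(!A %in% tail(levels(A), 2)), levels = c(0, 1))
  # 0 for the two lowest levels of B, 1 otherwise, stored as a factor
  B <- factor(as.numeric(!B %in% head(levels(B), 2)), levels = c(0, 1))
})

str(df)
```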
