Consider the data frame:
a = c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)
b = c(letters[5:9], letters[2:6])
c = data.frame(var1 = a, var2 = b)
I want to convert all values in the data frame to consecutive integer factor levels starting from 1, and then use these as numeric values to compute something (in reality I don't do this for the letters; I just added them to explain my problem ;) ).
With some help (see "Converting numeric values of multiple columns to factor levels that are consecutive integers in (descending) order"), I did this with:
c[] = lapply(c, function(x) {levels(x) <- 1:length(unique(x)); x})
Unfortunately, this only replaces the values with their respective factor levels for the character column var2, but not for the numeric column var1 (notice the 0 in column var1):
> c
var1 var2
1 0 4
2 1 5
3 3 6
4 5 7
...
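As far as I can tell, var1 stays unchanged because `levels<-` applied to a plain numeric vector only attaches a "levels" attribute and does not recode the values, whereas on a factor it renames the existing levels. A minimal sketch of that difference (the names `x` and `f` are only for illustration):

```r
a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)

# On a plain numeric vector, `levels<-` only sets an attribute;
# the underlying values are left as they are.
x <- a
levels(x) <- seq_along(unique(x))
is.factor(x)       # x is still not a factor
attr(x, "levels")  # the attribute that was attached
as.vector(x)       # attributes stripped: the original values

# On a factor, `levels<-` renames the levels, so the values appear to change.
f <- factor(c("e", "f", "g"))
levels(f) <- seq_along(levels(f))
as.character(f)
```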
To work around the problem, I converted all columns to character when creating c:
c = as.data.frame(sapply(data.frame(var1 = a, var2 = b), as.character))
Re-applying the lapply call above then yields:
var1 var2
1 1 4
2 2 5
3 4 6
4 5 7
5 6 8
6 1 1
7 2 2
8 4 3
9 6 4
10 3 5
The problem here, however, is that the value 12 (c[10, 'var1']) in column var1 is treated as the 3rd value (it gets factor level 3, after levels 1 and 2 for the values 0 and 1) rather than as the last value (factor level 6, because it is the largest numeric value in var1).
Is there a way to assign factor levels on the basis of the numeric ordering while at the same time replacing the numeric values by their factor levels?
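The ordering issue can be reproduced in isolation. This small sketch compares the levels obtained from the character version of `a` with those obtained from the numeric version; it assumes the usual string collation, in which "12" sorts before "3":

```r
a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)

# Levels built from character data are sorted as strings,
# so "12" ends up in third position.
levels(factor(as.character(a)))

# Levels built from the numeric data are sorted by value,
# so 12 ends up last.
levels(factor(a))
```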
c = data.frame(var1 = str_pad(a, 2, pad = "0"), var2 = b). Are there any cleaner solutions? - koteletje
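One possible approach, offered as a sketch rather than a definitive answer: build the factor directly from each column and take its integer codes. `factor()` sorts numeric input by value and character input alphabetically, so no padding should be needed. The data frame is called `df` here (my naming, to avoid reusing `c`):

```r
a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)
b <- c(letters[5:9], letters[2:6])
df <- data.frame(var1 = a, var2 = b)

# factor() orders the levels (numerically for var1, alphabetically for var2);
# as.integer() then returns each element's level index, starting at 1.
df[] <- lapply(df, function(x) as.integer(factor(x)))
df
```

With this, 12 in var1 should map to 6, and the columns are plain integers that can be used directly in computations.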