0
votes

I am trying to dummy-code a data frame with a mix of numeric and factor variables, but model.matrix does not work for variables that have fewer than 2 levels.

Sample data-

dt <- data.frame(A=c("1","1","1"),
                 B=c("0","1","2"),
                 C=c("5","6","7"),
                 id=c(1,2,3))

Desired output-

  A1 B0 B1 B2 C5 C6 C7 id
1  1  1  0  0  1  0  0  1
2  1  0  1  0  0  1  0  2
3  1  0  0  1  0  0  1  3

My Attempts-

dt_res <- model.matrix(~.+0,dt)

This works perfectly fine when there are no constant variables. But I have more than 1000 variables, so it is not feasible to find and subset the constant ones by hand.

Is there a possible solution using dcast, melt, or reshape?

1
mltools::one_hot - Dave Gruenewald
Does it support factor variables with fewer than 2 levels? - Rushabh Patel

1 Answer

2
votes

Using data.table, you can melt first before casting it into the desired wide format:

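library(data.table)
setDT(dt)                   # convert the data.frame to a data.table by reference
cols <- names(dt[, -"id"])  # every column except the id
dcast(
    # prefix each value with its column name (e.g. "B0"), then melt to long format
    melt(dt[, c(.(id=id), lapply(cols, function(x) paste0(x, get(x))))], id.vars="id"),
    id ~ value,             # one output column per distinct prefixed value
    length)                 # count occurrences, giving 0/1 indicators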

output:

   id A1 B0 B1 B2 C5 C6 C7
1:  1  1  1  0  0  1  0  0
2:  2  1  0  1  0  0  1  0
3:  3  1  0  0  1  0  0  1
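
To make the pipeline above easier to follow, here is a minimal sketch that splits it into three named steps. The intermediate names (`labelled`, `long`, `wide`) are my own and not part of the original answer, and the labelling step uses `Map` with `.SD` rather than `get`. It should give the same result on the sample data.

```r
library(data.table)

dt <- data.table(A = c("1", "1", "1"),
                 B = c("0", "1", "2"),
                 C = c("5", "6", "7"),
                 id = c(1, 2, 3))
cols <- setdiff(names(dt), "id")

# Step 1: prefix every value with its column name, e.g. "0" in B becomes "B0".
labelled <- copy(dt)
labelled[, (cols) := Map(paste0, cols, .SD), .SDcols = cols]

# Step 2: reshape to long format, one row per (id, prefixed value) pair.
long <- melt(labelled, id.vars = "id", measure.vars = cols)

# Step 3: cast back to wide format, counting how often each value occurs per id.
wide <- dcast(long, id ~ value, fun.aggregate = length)
wide
```

Because the counting happens after melting, a constant column such as A is handled like any other: it simply produces a single indicator column.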

data:

dt <- data.frame(A=c("1","1","1"),
    B=c("0","1","2"),
    C=c("5","6","7"),
    id=c(1,2,3))
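
If you would rather not depend on data.table, the following is a rough base R sketch of the same idea. It is an alternative technique, not the approach used above: it builds the indicator matrix for each column with `outer`, which does not rely on contrasts and so should also cope with single-level columns. The names `onehot` and `res` are only illustrative.

```r
dt <- data.frame(A = c("1", "1", "1"),
                 B = c("0", "1", "2"),
                 C = c("5", "6", "7"),
                 id = c(1, 2, 3))
cols <- setdiff(names(dt), "id")

# For each column, compare every value against every distinct level.
# outer() always returns a matrix, even when a column has only one level.
onehot <- do.call(cbind, lapply(cols, function(col) {
  values <- as.character(dt[[col]])
  lv <- sort(unique(values))
  m <- outer(values, lv, "==") + 0L   # logical matrix converted to 0/1 integers
  colnames(m) <- paste0(col, lv)      # e.g. "B0", "B1", "B2"
  m
}))

# Put the id column back at the end.
res <- data.frame(onehot, id = dt$id)
res
```

With more than 1000 columns this builds one small matrix per column and binds them once at the end, so it avoids repeated copying, though I have not benchmarked it against the data.table version.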