I have a dummy matrix like:
df1 = data.frame(a=c(1,1,2,2,3,3))
df1$a = as.factor(df1$a)
library(caret)
d <- dummyVars(~ ., data = df1, levelsOnly = TRUE, na.action=na.omit)
predict(d, df1)
  1 2 3
1 1 0 0
2 1 0 0
3 0 1 0
4 0 1 0
5 0 0 1
6 0 0 1
Now I need to map new data, which may have additional or missing factor levels, to the original dummy matrix (i.e. the columns need to be the same).
When I try predict() on new data with an additional level:
df2 = data.frame(a=c(1,1,3,3,4,4))
df2$a = as.factor(df2$a)
predict(d, df2)
I get an error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$lvls) : factor a has new levels 4
How can I populate the original dummy matrix with new data?
The desired outcome would be:
  1 2 3
1 1 0 0
2 1 0 0
3 0 0 1
4 0 0 1
5 0 0 0
6 0 0 0
Use df2$a <- factor(df2$a, levels(df1$a)) instead of df2$a = as.factor(df2$a). – GKi
Replace NA with 0, like: tt <- predict(d, df2); tt[is.na(tt)] <- 0 and you have your dummy matrix in tt. – GKi
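Putting the two comments above together, here is a minimal sketch of the whole workflow. It assumes caret is installed and relies on predict() for a dummyVars object passing NA values through by default, so treat it as an illustration rather than a definitive answer. Rows whose level was not seen in df1 (here, level 4) become NA after re-leveling and are then set to 0.

```r
library(caret)

# Training data: the factor levels that define the dummy columns
df1 <- data.frame(a = c(1, 1, 2, 2, 3, 3))
df1$a <- as.factor(df1$a)
d <- dummyVars(~ ., data = df1, levelsOnly = TRUE)

# New data containing an unseen level (4) and missing a known one (2)
df2 <- data.frame(a = c(1, 1, 3, 3, 4, 4))

# Re-level against the original levels; unseen values become NA
df2$a <- factor(df2$a, levels = levels(df1$a))

# Rows with NA in the factor come back as all-NA rows; replace them with 0
tt <- predict(d, df2)
tt[is.na(tt)] <- 0
tt
```

The result should have the same columns as the original dummy matrix, with all-zero rows for observations whose level was not present in df1.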