When I create a string column with data.table() and pass the data.frame parameter stringsAsFactors = F, the strings stay as characters, as intended, but the resulting data.table also gets an extra column named "stringsAsFactors". It is easy enough to get rid of the extra column, but is there a way to tell data.table not to add a column based on the data.frame parameter? I.e., is this a bug or a feature? See the toy example below:

library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = T)
summary(factorTest)
   Length     Class      Mode 
       50 character character 
summary(as.factor(factorTest))
 A AB  B  O 
10 18  7 15 
test1 <- data.frame(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = F)
test2 <- data.table(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = F)
summary(test1)
 dabo      dabostr         
 O :15   Length:50         
 A :10   Class :character  
 B : 7   Mode  :character  
 AB:18                     
summary(test2)
 dabo      dabostr          stringsAsFactors
 O :15   Length:50          Mode :logical   
 A :10   Class :character   FALSE:50        
 B : 7   Mode  :character   NA's :0         
 AB:18                    
data.table simply doesn't have the stringsAsFactors argument; see ?data.table. So you are basically just creating a new column. The reason the strings aren't converted to factors, as they would be by default in data.frame, is that keeping them as characters is the default data.table behavior. - David Arenburg
I've filed a feature request to handle that or raise a warning: data.table#1446 - jangorecki

1 Answer

This was fixed in commit 3dbc493, and data.table() now has a fully functional stringsAsFactors argument.
When it is TRUE, data.table uses a fast internal as.factor function, because the base factor() is slow.
Below is your code, run on the latest data.table, 1.9.7.

library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = T)
test1 <- data.frame(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = F)
test2 <- data.table(dabo = factor(factorTest, 
     levels = c('O','A','B','AB')), dabostr = factorTest, 
     stringsAsFactors = F)
summary(test1)
# dabo      dabostr         
# O : 8   Length:50         
# A :10   Class :character  
# B :16   Mode  :character  
# AB:16                                   
summary(test2)
# dabo      dabostr         
# O : 8   Length:50         
# A :10   Class :character  
# B :16   Mode  :character  
# AB:16
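
To see the argument working in the other direction, here is a minimal sketch that assumes a data.table version that includes the fix above. The vector `blood` and the column name `type` are made up for illustration and are not part of the original example.

```r
library(data.table)

# Hypothetical sample data, not taken from the question
blood <- c("O", "A", "B", "AB", "A")

# Default behaviour: character columns stay character
dt_chr <- data.table(type = blood)

# Opting in: character columns are converted to factors
dt_fct <- data.table(type = blood, stringsAsFactors = TRUE)

class(dt_chr$type)  # character vector is left as-is
class(dt_fct$type)  # converted to a factor
names(dt_fct)       # no stray "stringsAsFactors" column should appear
```

On a version from before the fix, the second call would presumably add a logical column named stringsAsFactors instead, as in the question; it can be dropped with dt_fct[, stringsAsFactors := NULL].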