
Hopefully I'm not duplicating an existing question. I'm working on a 32-bit Win7 machine with R 3.2.0, dplyr 0.4.1, and RStudio 0.98.1103.

The files in question are two CSV files read into the variables x and y (sep = "|", header = TRUE, stringsAsFactors = FALSE) that originated from the same Oracle table. The query used to produce both files pulled the exact same 29 variables.

> identical(names(x), names(y))
[1] TRUE

However, when I load the dplyr package and attempt to use 'bind_rows' as dat <- bind_rows(x, y), I get the following error:

> bind_rows(x,y)
Error: incompatible type (data index: 2, column: 'rmnumber', was collecting: integer (dplyr::Collecter_Impl<13>), incompatible with data of type: factor
In addition: Warning messages:
1: In rbind_all(list(x, ...)) :
  Unequal factor levels: coercing to character
2: In rbind_all(list(x, ...)) :
  Unequal factor levels: coercing to character
3: In rbind_all(list(x, ...)) :
  Unequal factor levels: coercing to character

I looked at the column 'rmnumber' and verified that everything in it is either numeric, as expected, or "NA", which is also expected for NULL values in the table. I also tried bind_rows(list(x,y)) and it returned the same error.

Base R's "rbind" works just fine on these data frames with no noticeable loss of precision.

Has anyone seen this error? Do you have any potential solutions outside of using rbind?

Thanks!

#

I don't think this is helpful but I constructed my own dfs and of course 'bind_rows' worked just perfectly:

> x.df <- data.frame(first_name = c("abc"), last_name = c("def"), rmnum = (1:15), addy = ("some_address"))
> y.df <- data.frame(first_name = c("abc"), last_name = c("def"), rmnum = (1:15), addy = ("some_address"))
> bind_rows(x.df, y.df)
Source: local data frame [30 x 4]

   first_name last_name rmnum         addy
1         abc       def     1 some_address
2         abc       def     2 some_address
3         abc       def     3 some_address
4         abc       def     4 some_address
5         abc       def     5 some_address
6         abc       def     6 some_address
7         abc       def     7 some_address
8         abc       def     8 some_address
9         abc       def     9 some_address
10        abc       def    10 some_address
..        ...       ...   ...          ...

Verifying class of cols

> identical(sapply(x, class), sapply(y, class))
[1] FALSE

> class(x$rmnumber);class(y$rmnumber)
[1] "integer"
[1] "character"

What I cannot figure out is why they are different. The information came out of the exact same table and they were read into variables using the exact same code.
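
One plausible explanation, offered as a sketch rather than a diagnosis of these particular files: read.csv guesses each column's type separately for every file, so a single non-numeric token in one export is enough to make that column character (or factor, when stringsAsFactors = TRUE) while the same column in the other file stays integer. The inline data and the "N/A" token below are invented for illustration.

```r
# Hypothetical pipe-delimited exports; the values are invented for illustration.
txt_x <- "id|rmnumber\n1|101\n2|NA\n3|103"
txt_y <- "id|rmnumber\n4|201\n5|N/A\n6|203"  # "N/A" is not a default NA string

x_demo <- read.csv(text = txt_x, sep = "|", header = TRUE, stringsAsFactors = FALSE)
y_demo <- read.csv(text = txt_y, sep = "|", header = TRUE, stringsAsFactors = FALSE)

class(x_demo$rmnumber)  # "integer"
class(y_demo$rmnumber)  # "character", because "N/A" cannot be parsed as a number

# Declaring the extra missing-value token makes both reads agree.
y_fixed <- read.csv(text = txt_y, sep = "|", header = TRUE,
                    stringsAsFactors = FALSE, na.strings = c("NA", "N/A"))
class(y_fixed$rmnumber)  # "integer"
```

If this is the cause, looking at unique(y$rmnumber) should reveal the offending value.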

Locking in the solution

Big thanks to @Pascal for helping me solve this. A simple data type conversion solved my issue:

> y$rmnumber <- as.integer(y$rmnumber)
> dat2 <- bind_rows(x,y)
> dat2
Source: local data frame [99,884 x 24]
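
With many columns, the same check and conversion can be done in a loop instead of column by column. The following is a minimal base R sketch on toy data; x_demo, y_demo, and the helper first_class are hypothetical names, and the conversion step assumes an as.<class> function exists for the target class (true for integer, numeric, character, and logical).

```r
# Toy stand-ins for x and y; column names and values are invented.
x_demo <- data.frame(id = 1:3, rmnumber = c(101L, NA, 103L))
y_demo <- data.frame(id = 4:6, rmnumber = c("201", NA, "203"),
                     stringsAsFactors = FALSE)

# First class of every column, as a named character vector.
first_class <- function(df) vapply(df, function(col) class(col)[1], character(1))

cls_x <- first_class(x_demo)
cls_y <- first_class(y_demo)

# Columns whose class differs between the two data frames.
mismatched <- names(cls_x)[cls_x != cls_y[names(cls_x)]]
mismatched  # "rmnumber"

# Coerce each mismatched column of y_demo to the class used in x_demo.
# Going through as.character() first converts factors by label, not level code.
for (col in mismatched) {
  convert <- match.fun(paste0("as.", cls_x[[col]]))
  y_demo[[col]] <- convert(as.character(y_demo[[col]]))
}

# The classes now agree, so the frames can be stacked; base rbind() is used
# here so the sketch runs without dplyr.
combined <- rbind(x_demo, y_demo)
```

Values that cannot be parsed become NA with a warning, so it is worth counting the NAs before and after the conversion.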
Would you be able to provide your data? Otherwise it will be hard for others to see what is going on. :) - jazzurro
Let me see if I can come up with a similar reproducible example. Sorry for not including one. - Zach
rbindlist from data.table handles factor cols with unequal levels and character cols automatically... Might be worth checking it out. - Arun
@arun, excellent comment. You should add it as an answer. It's a much better solution for some people (like me) who are dealing with a tonne of variables that are mismatched. - Brandon Bertelsen

1 Answer


The error message says: "in one data.frame, 'rmnumber' is of class integer and in the other data.frame, 'rmnumber' is of class factor. I cannot bind different classes together".

Let's use your example

x.df <- data.frame(first_name = c("abc"), last_name = c("def"), rmnum = (1:15), addy = ("some_address"))
y.df <- data.frame(first_name = c("abc"), last_name = c("def"), rmnum = (1:15), addy = ("some_address"))

We check the class for each column of "x.df" and "y.df":

sapply(x.df, class)
# first_name  last_name      rmnum       addy 
#  "factor"   "factor"  "integer"   "factor" 


sapply(y.df, class)
# first_name  last_name      rmnum       addy 
#  "factor"   "factor"  "integer"   "factor" 

All is fine: the classes are consistent between the two data.frames. Now, let's turn "y.df$rmnum" into a factor:

y.df$rmnum <- factor(y.df$rmnum)
class(y.df$rmnum)
# [1] "factor"

Let's try to bind now:

bind_rows(x.df, y.df)

Error: incompatible type (data index: 2, column: 'rmnum', was collecting: integer (dplyr::Collecter_Impl<13>), incompatible with data of type: factor

Same error message. So, in one of your data.frames, 'rmnumber' is an integer and in the other one, 'rmnumber' is a factor. You have to convert the factor 'rmnumber' to integer, or the other way around.
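
One caveat when converting a factor to integer: calling as.integer() directly on a factor returns the internal level codes, not the numbers shown as labels, so it is safer to go through as.character() first. A small sketch with made-up values:

```r
# Made-up room numbers stored as a factor.
f <- factor(c("101", "205", "17"))

# Direct conversion returns the level codes (levels sort as "101", "17", "205").
codes <- as.integer(f)
codes   # 1 3 2

# Converting the labels first recovers the original numbers.
values <- as.integer(as.character(f))
values  # 101 205 17
```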