1
votes

The behavior of data.frame seems contrary to what is written in the help file:

"If a list or data frame or matrix is passed to data.frame it is as if each component or column had been passed as a separate argument"

So what am I doing wrong?

Example code:

d <- c("bla", "bla", "blou", "blou", "bli")
dtest <- data.frame(d, stringsAsFactors=FALSE)
dtest2 <- data.frame(dtest, stringsAsFactors=TRUE)
dtest3 <- data.frame(dtest[[1]], stringsAsFactors=TRUE)

str(c(dtest2, dtest3))

dtest2 still holds a character vector, while dtest3 has been converted to a factor (following the stringsAsFactors=TRUE behavior). Both "should" be factors.

What I actually want is to use data.frame(df) to convert an existing data frame with some character columns into a data frame with the corresponding factors.

1
I know that I could do the latter with something like: i <- sapply(df, is.character); df[i] <- lapply(df[i], as.factor). But the question of how data.frame parses data frames still stands! - Antoine Lizée

1 Answer

2
votes

Creating dtest with data.frame(d, stringsAsFactors=FALSE) does not set an attribute that prevents subsequent calls to data.frame from applying the default stringsAsFactors behavior. You could get that effect for the whole session by setting:

  options(stringsAsFactors=FALSE)

If, on the other hand, you were hoping for the two calls to behave the same way, pass the column as a one-column data frame (dtest[1]) rather than as a bare vector (dtest[[1]]):

> dtest2 <- data.frame(dtest)
> dtest3 <- data.frame(dtest[1])
> 
> str(c(dtest2, dtest3))
List of 2
 $ d: chr [1:5] "bla" "bla" "blou" "blou" ...
 $ d: chr [1:5] "bla" "bla" "blou" "blou" ...
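
To see why the two forms differ, it can help to compare what each subsetting operator hands to data.frame. This is only a minimal sketch reusing dtest from the question; the names from_df and from_vec are made up for illustration, and stringsAsFactors=TRUE is passed explicitly so the result does not depend on the session default (which has been FALSE since R 4.0.0):

```r
d <- c("bla", "bla", "blou", "blou", "bli")
dtest <- data.frame(d, stringsAsFactors = FALSE)

# dtest[1] is still a data frame; dtest[[1]] is the bare character vector.
class(dtest[1])
class(dtest[[1]])

# A data frame argument keeps its existing columns as they are,
# whereas a character vector is converted according to stringsAsFactors.
from_df  <- data.frame(dtest[1],   stringsAsFactors = TRUE)
from_vec <- data.frame(dtest[[1]], stringsAsFactors = TRUE)

class(from_df[[1]])   # "character"
class(from_vec[[1]])  # "factor"
```

So stringsAsFactors only acts on character vectors that data.frame itself turns into columns, not on columns that already live inside a data frame.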

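If you want all the columns of a data frame re-evaluated, then I suppose you could pass them as a plain list of vectors (stringsAsFactors=TRUE is spelled out here because the default has been FALSE since R 4.0.0):

data.frame(lapply(dtest, as.vector), stringsAsFactors=TRUE)

> str(data.frame(lapply(dtest, as.vector), stringsAsFactors=TRUE))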
'data.frame':   5 obs. of  1 variable:
 $ d: Factor w/ 3 levels "bla","bli","blou": 1 1 3 3 2
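
If only the character columns should become factors and everything else should stay untouched, a small helper along the lines of the comment under the question may be easier to reason about. This is a sketch, not a base R function: chars_to_factors and the example data frame mixed are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (not part of base R): convert every character
# column of a data frame to a factor and leave other columns alone.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Example data frame with one character and one integer column.
mixed <- data.frame(
  d = c("bla", "bla", "blou", "blou", "bli"),
  n = 1:5,
  stringsAsFactors = FALSE
)

converted <- chars_to_factors(mixed)
str(converted)
```

Unlike rebuilding the whole data frame, this only touches the columns selected by is.character, so numeric or logical columns pass through unchanged.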