117
votes

In an answer to another question, @Marek posted the following solution: https://stackoverflow.com/a/10432263/636656

dat <- structure(list(product = c(11L, 11L, 9L, 9L, 6L, 1L, 11L, 5L, 
                                  7L, 11L, 5L, 11L, 4L, 3L, 10L, 7L, 10L, 5L, 9L, 8L)), .Names = "product", row.names = c(NA, -20L), class = "data.frame")

`levels<-`(
  factor(dat$product),
  list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
  )

Which produces as output:

 [1] Generic Generic Bayer   Bayer   Advil   Tylenol Generic Advil   Bayer   Generic Advil   Generic Advil   Tylenol
[15] Generic Bayer   Generic Advil   Bayer   Bayer  

This is just the printout of a vector, so to store it you can do the even more confusing:

res <- `levels<-`(
  factor(dat$product),
  list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
  )

Clearly this is some kind of call to the levels function, but I have no idea what's being done here. What is the term for this kind of sorcery, and how do I increase my magical ability in this domain?

4
There is also names<- and [<-. – huon
Also, I wondered about this on the other question but didn't ask: is there any reason for the structure(...) construct instead of just data.frame(product = c(11L, 11L, ..., 8L))? (If there's some magic happening there, I'd like to wield it too!) – huon
It's a call to the "levels<-" function: function (x, value) .Primitive("levels<-"), sort of like X %in% Y is an abbreviation for "%in%"(X, Y). – BenBarnes
@dbaupp Very handy for reproducible examples: stackoverflow.com/questions/5963269/… – Ari B. Friedman
I have no idea why someone voted to close this as not constructive. The Q has a very clear answer: what is the meaning of the syntax used in the example and how does this work in R? – Gavin Simpson

4 Answers

108
votes

The answers here are good, but they are missing an important point. Let me try and describe it.

R is a functional language and does not like to mutate its objects. But it does allow assignment statements, using replacement functions:

levels(x) <- y

is equivalent to

x <- `levels<-`(x, y)

The trick is, this rewriting is done by <-; it is not done by levels<-. levels<- is just a regular function that takes an input and gives an output; it does not mutate anything.
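
To make that concrete, here is a minimal sketch using a small made-up factor (not the question's data). It calls the replacement function directly, checks that the original object is untouched, and then compares the result with the ordinary assignment form:

```r
# A small factor to experiment with (made-up data, for illustration only)
x <- factor(c("a", "b", "a"))

# Calling the replacement function directly returns a modified copy ...
y <- `levels<-`(x, c("A", "B"))

levels(x)  # still "a" "b": x itself was not changed
levels(y)  # "A" "B"

# ... whereas the assignment form rebinds x to the returned value
levels(x) <- c("A", "B")
identical(x, y)  # TRUE
```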

One consequence of that is that, according to the above rule, <- must be recursive:

levels(factor(x)) <- y

is

factor(x) <- `levels<-`(factor(x), y)

is

x <- `factor<-`(x, `levels<-`(factor(x), y))

It's kind of beautiful that this pure-functional transformation (up until the very end, where the assignment happens) is equivalent to what an assignment would be in an imperative language. If I remember correctly this construct in functional languages is called a lens.
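
The same rewriting applies to replacement functions you define yourself. As a rough sketch, `second<-` below is a hypothetical function made up for this example; the only requirements are the `<-` suffix in the name and a final argument called value:

```r
# `second<-` is a hypothetical replacement function, defined only for illustration
`second<-` <- function(x, value) {
  x[2] <- value
  x  # a replacement function returns the whole modified object
}

v <- c(10, 20, 30)
second(v) <- 99  # rewritten by R as: v <- `second<-`(v, 99)
v                # 10 99 30

# The nested form works too:
names(v) <- c("x", "y", "z")
second(names(v)) <- "b"
# conceptually: v <- `names<-`(v, `second<-`(names(v), "b"))
names(v)         # "x" "b" "z"
```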

But then, once you have defined replacement functions like levels<-, you get another, unexpected windfall: you don't just have the ability to make assignments, you have a handy function that takes in a factor, and gives out another factor with different levels. There's really nothing "assignment" about it!

So, the code you're describing is just making use of this other interpretation of levels<-. I admit that the name levels<- is a little confusing because it suggests an assignment, but this is not what is going on. The code is simply setting up a sort of pipeline:

  • Start with dat$product

  • Convert it to a factor

  • Change the levels

  • Store that in res

Personally, I think that line of code is beautiful ;)
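
The pipeline can be spelled out one step at a time. This is only a sketch: the step1 to step3 names are made up, and dat is a shortened stand-in for the question's data frame:

```r
# Shortened stand-in for the question's data
dat <- data.frame(product = c(11L, 9L, 6L, 1L))

step1 <- dat$product                      # start with dat$product
step2 <- factor(step1)                    # convert it to a factor
step3 <- `levels<-`(                      # change the levels
  step2,
  list(Tylenol = 1:3, Advil = 4:6, Bayer = 7:9, Generic = 10:12)
)
res <- step3                              # store that in res

as.character(res)  # "Generic" "Bayer" "Advil" "Tylenol"
```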

34
votes

No sorcery, that's just how (sub)assignment functions are defined. levels<- is a little different because it is a primitive to (sub)assign the attributes of a factor, not the elements themselves. There are plenty of examples of this type of function:

`<-`              # assignment
`[<-`             # sub-assignment
`[<-.data.frame`  # sub-assignment data.frame method
`dimnames<-`      # change dimname attribute
`attributes<-`    # change any attributes

Other binary operators can be called like that too:

`+`(1,2)  # 3
`-`(1,2)  # -1
`*`(1,2)  # 2
`/`(1,2)  # 0.5
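
Because operators are ordinary functions, the backtick form also lets you pass them to other functions. A small sketch with made-up inputs:

```r
# Fold a vector with the addition operator
total <- Reduce(`+`, 1:4)
total    # 10

# Use the subsetting operator to pull the 2nd element out of each list entry
seconds <- sapply(list(1:3, 4:6), `[`, 2)
seconds  # 2 5
```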

Now that you know that, something like this should really blow your mind:

Data <- data.frame(x=1:10, y=10:1)
names(Data)[1] <- "HI"              # How does that work?!? Magic! ;-)
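
For the curious, here is one way to unpack that line by hand. This is a sketch of what the assignment is equivalent to, not of how R implements it internally; Data2 is a name made up for the comparison:

```r
Data <- data.frame(x = 1:10, y = 10:1)

# Inner step: `[<-` replaces the first element of the names vector.
# Outer step: `names<-` returns a data frame carrying those new names.
Data2 <- `names<-`(Data, `[<-`(names(Data), 1, "HI"))
names(Data2)            # "HI" "y"

# The ordinary assignment form gives the same result
names(Data)[1] <- "HI"
identical(Data, Data2)  # TRUE
```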

31
votes

The reason for that "magic" is that the "assignment" form must have a real variable to work on, and factor(dat$product) wasn't assigned to anything.

# This works since it's done in several steps
x <- factor(dat$product)
levels(x) <- list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
x

# This doesn't work although it's the "same" thing:
levels(factor(dat$product)) <- list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
# Error: could not find function "factor<-"

# and this is the magic work-around that does work
`levels<-`(
  factor(dat$product),
  list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
  )

17
votes

For user code, I do wonder why such language manipulations are used at all. You ask what magic this is, and others have pointed out that you are calling the replacement function named levels<-. For most people this is magic, and the intended use really is levels(foo) <- bar.

The use case you show is different because product doesn't exist in the global environment; it only ever exists in the local environment of the call to levels<-. The change you want to make therefore does not persist: there was no reassignment of dat.

In these circumstances, within() is the ideal function to use. You would naturally wish to write

levels(product) <- bar

in R, but of course product doesn't exist as an object. within() gets around this because it sets up the environment you wish to run your R code against and evaluates your expression within that environment. Assigning the object returned by within() therefore gives you the properly modified data frame.

Here is an example (you don't need to create new datX objects; I just do that so the intermediate steps are still available at the end):

## one or t'other
#dat2 <- transform(dat, product = factor(product))
dat2 <- within(dat, product <- factor(product))

## then
dat3 <- within(dat2, 
               levels(product) <- list(Tylenol=1:3, Advil=4:6, 
                                       Bayer=7:9, Generic=10:12))

Which gives:

> head(dat3)
  product
1 Generic
2 Generic
3   Bayer
4   Bayer
5   Advil
6 Tylenol
> str(dat3)
'data.frame':   20 obs. of  1 variable:
 $ product: Factor w/ 4 levels "Tylenol","Advil",..: 4 4 3 3 2 1 4 2 3 4 ...

I struggle to see how constructs like the one you show are useful in the majority of cases. If you want to change the data, change the data; don't create another copy and change that (which is all the levels<- call is doing, after all).
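
If overwriting the column is acceptable, "changing the data" can also be done without within(), using only the ordinary assignment form. A minimal sketch, with dat as a shortened stand-in for the question's data frame:

```r
# Shortened stand-in for the question's data frame
dat <- data.frame(product = c(11L, 9L, 6L, 1L))

# Overwrite the column in place: first make it a factor, then relabel the levels
dat$product <- factor(dat$product)
levels(dat$product) <- list(Tylenol = 1:3, Advil = 4:6,
                            Bayer = 7:9, Generic = 10:12)

str(dat)
```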