I frequently have factor variables with long text as levels (e.g. responses in a survey: "I strongly agree with the statement" could be one level). If I want to subset based on the factor variable it is annoying to always type the complete level. I would rather have an overview of the mapping from underlying integer to level and access based on this integer.

set.seed(1896)
df <- data.frame(y = runif(100, 0, 10000), x = factor(rep(c("I strongly agree", "I agree", "I disagree", "I strongly disagree"), 25)))
mean(df$y[df$x == "I strongly agree"])

It is possible to get the same result by accessing the underlying integer via:

mean(df$y[as.integer(df$x) == 3])

I have two related questions:

1) Is there a better/safer way of doing this? In Stata, with values and value labels, it is possible to access either the label or the value directly. Does something similar exist in R?
2) Is there a way to quickly see a table with the mapping from integer to factor level in R? I am looking for a command that would give me a table like: 1 - "I agree"; 2 - "I disagree"; 3 - "I strongly agree"; 4 - "I strongly disagree".

Thanks in advance!

For (2) what about just: levels(df$x)? - Dominic van Essen
...or as a more 'table'-like output: data.frame(levels(df$x)) - Dominic van Essen

1 Answer

I have never used Stata, so I am not sure I fully understand the question. The key step is how the factor is created: by default, factor() sorts the levels alphabetically, and the underlying integers are positions in that sorted order.

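# factor() is needed here: since R 4.0.0, data.frame() no longer converts strings to factors
df = data.frame(y = runif(100, 0, 10000), x = factor(rep(c("I strongly agree", "I agree", "I disagree", "I strongly disagree"), 25)))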

levels(df$x)
[1] "I agree"             "I disagree"          "I strongly agree"   
[4] "I strongly disagree"

To be safe, I assume you want the same levels, in the same order, for every dataset. You can set them explicitly:

lvl = c("I strongly agree", "I agree", "I disagree", "I strongly disagree")
df$x = factor(df$x, levels = lvl)

levels(df$x)
[1] "I strongly agree"    "I agree"             "I disagree"         
[4] "I strongly disagree"
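
As a small sketch of why the explicit levels argument matters (the names resp, f_default and f_explicit are made up for illustration): the integers returned by as.integer() are positions in levels(), so they change whenever the level order changes.

```r
resp <- c("I strongly agree", "I agree", "I disagree", "I strongly disagree")

# Default: levels are sorted alphabetically
f_default <- factor(resp)
# Explicit: levels keep the order they are given in
f_explicit <- factor(resp, levels = resp)

as.integer(f_default)   # 3 1 2 4: positions in the sorted levels
as.integer(f_explicit)  # 1 2 3 4: positions in the order supplied
```

This is the reason hard-coding something like as.integer(x) == 3 is fragile: the same number can point at a different label once the levels are reordered.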

For the table, something like this should work:

data.frame(num = seq_along(lvl), lvl)
  num                 lvl
1   1    I strongly agree
2   2             I agree
3   3          I disagree
4   4 I strongly disagree
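
If you would rather read the mapping off the factor itself, so the table cannot drift from the actual coding, a minimal sketch could look like this. The helper name level_map() and the vector x are hypothetical, not something base R provides.

```r
# Hypothetical helper: one row per level, with its integer code and frequency
level_map <- function(f) {
  stopifnot(is.factor(f))
  data.frame(
    code  = seq_along(levels(f)),   # the underlying integer
    level = levels(f),              # the label it maps to
    n     = as.vector(table(f)),    # table() counts in level order
    stringsAsFactors = FALSE
  )
}

x <- factor(rep(c("I strongly agree", "I agree", "I disagree", "I strongly disagree"), 25))
level_map(x)
```

The count column is optional, but it makes unused levels visible, since they show up with n equal to 0.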

And you can subset using:

df[df$x==lvl[1],]

Or:

df[df$x==levels(df$x)[1],]
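
To check that picking a level by position selects the same rows as typing out the label, here is a quick sketch. It rebuilds the data frame from the question with the default (alphabetical) levels, so position 3 is "I strongly agree"; the variable names by_label, by_code, by_pos and agree are only illustrative.

```r
set.seed(1896)
df <- data.frame(
  y = runif(100, 0, 10000),
  x = factor(rep(c("I strongly agree", "I agree", "I disagree", "I strongly disagree"), 25))
)

by_label <- df$x == "I strongly agree"
by_code  <- as.integer(df$x) == 3       # 3 only because of the alphabetical default
by_pos   <- df$x == levels(df$x)[3]     # same position, but compared on the label

identical(by_label, by_code)
identical(by_label, by_pos)

# Several levels at once: pick positions, compare on labels
agree <- df$x %in% levels(df$x)[c(1, 3)]   # "I agree" and "I strongly agree"
mean(df$y[agree])
```

Going through levels(df$x)[i] keeps the short typing of the integer approach, while the comparison itself is still done on the label.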