Elaborating on the answer from timcdlucas (and the comments from r2evans), the issue here is the behavior of various forms of the extract operator, not the behavior of tibble. Why? A tibble is actually a kind of data.frame, as illustrated when we use the str() function on a tibble.
> library(dplyr)
> aTibble <- tibble(f1 = factor(rep(letters[1:3],5)),
+ c1 = rnorm(15))
>
> # illustrate that aTibble is actually a type of data frame
> str(aTibble)
tibble [15 × 2] (S3: tbl_df/tbl/data.frame)
$ f1: Factor w/ 3 levels "a","b","c": 1 2 3 1 2 3 1 2 3 1 ...
$ c1: num [1:15] -0.5829 0.3682 1.1854 -0.6309 -0.0268 ...
There are four forms of the extract operator in R: [, [[, $, and @, as noted in "What is the meaning of the dollar sign $ in R function?".
The first form, [, can be used to extract content from vectors, lists, matrices, or data frames. When used with a tibble, it returns a tibble unless the drop = TRUE argument is included, as noted in the question comments by r2evans. (A base data.frame behaves differently: selecting a single column with df[, j] drops the result to a vector by default.)
Since the default setting of drop = in the [ method for tibbles is FALSE, it follows that df[,"f1"] returns a one-column tibble rather than a factor, which is why nlevels() produces an unexpected or "wrong" result for the code posted with the original question.
library(dplyr)
aTibble <- tibble(f1 = factor(rep(letters[1:3],5)),
c1 = rnorm(15))
# produces unexpected answer
nlevels(aTibble[,"f1"])
> nlevels(aTibble[,"f1"])
[1] 0
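To make the contrast concrete, here is a minimal sketch comparing a base data frame with a tibble built from the same data. It assumes the tibble package is installed (library(dplyr) would also work, since dplyr re-exports these functions); the object names aDataFrame and sameAsTibble are illustrative only and do not come from the original question.

```r
library(tibble)

# the same data as a base data frame and as a tibble
aDataFrame <- data.frame(f1 = factor(rep(letters[1:3], 5)),
                         c1 = rnorm(15))
sameAsTibble <- as_tibble(aDataFrame)

# base data frame: a single column selected with [ , j] drops to a vector
is.factor(aDataFrame[, "f1"])                  # TRUE
nlevels(aDataFrame[, "f1"])                    # 3

# tibble: the same call keeps a one-column tibble, which has no levels
is.factor(sameAsTibble[, "f1"])                # FALSE
inherits(sameAsTibble[, "f1"], "data.frame")   # TRUE
nlevels(sameAsTibble[, "f1"])                  # 0
```

The zero comes from nlevels() looking for a levels attribute on the one-column tibble itself and finding none.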
The drop = argument is used when extracting from matrices or arrays (i.e. any object that has a dim attribute), as explained in the help for the drop() function.
> dim(aTibble)
[1] 15 2
>
When we set drop = TRUE, the extract function returns an object of the lowest dimension available; that is, all extents of length 1 are removed. In the case of the original question, drop = TRUE with the extract operator returns a factor, which is the right type of input for nlevels().
> nlevels(aTibble[,"f1",drop=TRUE])
[1] 3
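The same "extents of length 1 are removed" rule is easiest to see on a plain matrix. The following base R sketch uses a small illustrative matrix m that is not part of the original question.

```r
# a 3 x 2 matrix has a dim attribute, so the drop argument applies
m <- matrix(1:6, nrow = 3)

# for a matrix the default is drop = TRUE: one column collapses to a vector
dim(m[, 1])                  # NULL

# drop = FALSE keeps the result as a 3 x 1 matrix
dim(m[, 1, drop = FALSE])    # 3 1
```

Note that the matrix default is the opposite of the tibble default, which is why the argument has to be supplied explicitly in the nlevels() call above.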
The [[ and $ forms of the extract operator extract a single object, so they return objects of type factor, the required input to nlevels().
> str(aTibble$f1)
Factor w/ 3 levels "a","b","c": 1 2 3 1 2 3 1 2 3 1 ...
> nlevels(aTibble$f1)
[1] 3
>
> # produces expected answer
> str(aTibble[["f1"]])
Factor w/ 3 levels "a","b","c": 1 2 3 1 2 3 1 2 3 1 ...
> nlevels(aTibble[["f1"]])
[1] 3
>
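The difference between keeping the container and extracting a single object also shows up on an ordinary list. A short base R sketch, using a made-up list lst purely for illustration:

```r
# a small list with two differently typed elements
lst <- list(f1 = factor(c("a", "b", "c")), c1 = c(1.5, 2.5, 3.5))

class(lst["f1"])      # "list":   [ keeps the container
class(lst[["f1"]])    # "factor": [[ returns the element itself
class(lst$f1)         # "factor": $ does the same, by name
```

Since data frames and tibbles are lists of columns, the same reasoning applies to aTibble[["f1"]] and aTibble$f1.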
The fourth form of the extract operator, @ (known as the slot operator), is used with formally defined objects built with the S4 object system, and is not relevant for this question.
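For completeness, a minimal sketch of the slot operator. The class Point and its slots x and y are hypothetical, defined only for this illustration.

```r
library(methods)

# hypothetical S4 class with two numeric slots
setClass("Point", representation(x = "numeric", y = "numeric"))
p <- new("Point", x = 1, y = 2)

# @ extracts a slot from an S4 object
p@x    # 1
p@y    # 2
```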
Conclusion: Base R is still relevant when using the Tidyverse
Per tidyverse.org, the tidyverse is a collection of R packages that share an underlying philosophy, grammar, and data structures. When one becomes familiar with the tidyverse family of packages, it's possible to do many things in R without understanding the fundamentals of how Base R works.
That said, when one incorporates Base R functions or functions from packages outside the tidyverse into tidyverse-style code, it's important to know key Base R concepts.
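As one possible way to combine the two styles, the sketch below passes a column to the Base R function nlevels() from inside a pipeline. It assumes dplyr is installed and uses dplyr's pull(), which extracts a single column as a vector; this is offered as an alternative, not as the approach used in the original question.

```r
library(dplyr)

aTibble <- tibble(f1 = factor(rep(letters[1:3], 5)),
                  c1 = rnorm(15))

# pull() returns the column itself (a factor here), much like [[ does,
# so nlevels() receives the type of input it expects
aTibble %>% pull(f1) %>% nlevels()    # 3
```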
iris[,5] is a vector but as_tibble(iris)[,5] still inherits from a data.frame. This is why nlevels is failing. Alternatives include: nlevels(df$f1), nlevels(df[,"f1",drop=TRUE]), and nlevels(df[["f1"]]). – r2evans