When I need to filter a data.frame, i.e., extract rows that meet certain conditions, I prefer to use the subset function:
subset(airquality, Month == 8 & Temp > 90)
Rather than the [ function:
airquality[airquality$Month == 8 & airquality$Temp > 90, ]
There are two main reasons for my preference:
I find the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.
Because columns can be referred to as variables in the select expression, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.
So I was living happily, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world broke apart. While reading the subset documentation, I noticed this section:
Warning
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
Could someone help clarify what the authors mean?
First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.
Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, maybe provide an example?
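To make the second question concrete, here is a minimal sketch of the kind of surprise the warning may be pointing at. The helper names keep_month_subset and keep_month_bracket are hypothetical and made up for illustration; the sketch assumes the built-in airquality data and a function argument that happens to share its name with a column.

```r
# Hypothetical helpers for illustration only; they are not from the
# subset documentation or from any package.

keep_month_subset <- function(df, Month) {
  # subset() evaluates the condition with df's columns looked up first,
  # so both sides of `Month == Month` resolve to the column and the
  # function argument is never consulted.
  subset(df, Month == Month)
}

keep_month_bracket <- function(df, Month) {
  # Standard evaluation: df$Month is the column, Month is the argument.
  df[df$Month == Month, ]
}

n_subset  <- nrow(keep_month_subset(airquality, 8))
n_bracket <- nrow(keep_month_bracket(airquality, 8))

print(c(subset = n_subset, bracket = n_bracket))
```

Under these assumptions the subset version silently returns every row of airquality, because the comparison is always TRUE, while the [ version returns only the August rows. Typed at the console, the name clash would be easy to spot; buried inside a function, it is not, which seems to be the point of "for use interactively".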
with(airquality, airquality[Month == 8 & Temp > 90, ]) – Tyler Rinker
dplyr::filter has the same problem. I.e., if the environment happens to have a variable with that name, it will use it instead of the variable in the data frame. Makes for confusing debugging! – CoderGuy123