When working with data frames, it is common to need a subset. However, use of the subset() function is discouraged. The trouble with the following code is that the data frame name is mentioned twice. If you copy, paste, and munge code, it is easy to accidentally leave the second mention of adf unchanged, which can be a disaster.
adf <- data.frame(a = 1:10, b = 11:20)
print(adf[which(adf$a > 5), ])    ## alas, adf mentioned twice
print(with(adf, adf[a > 5, ]))    ## alas, adf mentioned twice
print(subset(adf, a > 5))         ## alas, not supposed to use subset
Is there a way to write the above without mentioning adf twice? Unfortunately, inside with() or within() I cannot seem to access adf as a whole.
The subset() function would make this easy, but its documentation warns against using it in programs:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
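To make the warning concrete, here is a minimal sketch of one way the non-standard evaluation can surprise you. The threshold variable b is an assumption made up for this illustration; it was chosen only because it shares its name with a column of adf.

```r
adf <- data.frame(a = 1:10, b = 11:20)

# Hypothetical threshold that happens to share its name with column b.
b <- 5

# subset() evaluates the condition inside the data frame first, so `b`
# here refers to the column adf$b, not to the variable defined above.
via_subset <- subset(adf, a > b)

# Standard indexing evaluates `b` in the calling environment, so the
# variable b (5) is used.
via_index <- adf[adf$a > b, ]

nrow(via_subset)  # 0: a is never greater than column b
nrow(via_index)   # 5: rows where a > 5
```

Both calls run without any error or warning, which is part of what makes this kind of name clash easy to miss in non-interactive code.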
filter from dplyr, i.e. filter(adf, a > 5), is similar to subset. If you are using data.table, setDT(adf)[a > 5]. - akrun
[...] data.frames long ago. Once you convert your data set to a data.table, all your syntax will become much shorter. Though I just want to mention that you are using way too much code here. You need neither print nor which; just adf[adf$a > 5, ] will do, which in turn doesn't look too confusing to me. - David Arenburg
subset() is not encouraged; please have a look at this SO question. - MERose
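The shorter form from the comments, and one possible base R way to name the data frame only once, can be sketched as follows. The anonymous function and its argument name d are assumptions made for this illustration, not something proposed in the thread. The last lines show a caveat about dropping which() when the column contains NA.

```r
adf <- data.frame(a = 1:10, b = 11:20)

# Shorter form from the comments: no print(), no which().
plain <- adf[adf$a > 5, ]

# Sketch: wrap the indexing in an anonymous function so that `adf`
# appears once; `d` is just a placeholder name for the argument.
once <- (function(d) d[d$a > 5, ])(adf)

# Caveat: with an NA in the column, the two indexing styles differ.
adf_na <- adf
adf_na$a[1] <- NA
with_logical <- adf_na[adf_na$a > 5, ]         # keeps an all-NA row
with_which   <- adf_na[which(adf_na$a > 5), ]  # drops the NA case

nrow(with_logical)  # 6
nrow(with_which)    # 5
```

When the column has no missing values, which() is redundant, as the comment says. When it may contain NA, which() (or an explicit !is.na() check) changes the result, so it may be worth keeping.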