Your choices
order
from base
arrange
from dplyr
setorder
and setorderv
from data.table
arrange
from plyr
sort
from taRifx
orderBy
from doBy
sortData
from Deducer
Most of the time you should use the dplyr
or data.table
solutions, unless having no-dependencies is important, in which case use base::order
.
I recently added sort.data.frame to a CRAN package, making it class compatible as discussed here:
Best way to create generic/method consistency for sort.data.frame?
Therefore, given the data.frame dd, you can sort as follows:
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
levels = c("Low", "Med", "Hi"), ordered = TRUE),
x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
z = c(1, 1, 1, 2))
library(taRifx)
sort(dd, f= ~ -z + b )
If you are one of the original authors of this function, please contact me. Discussion as to public domaininess is here: https://chat.stackoverflow.com/transcript/message/1094290#1094290
You can also use the arrange()
function from plyr
as Hadley pointed out in the above thread:
library(plyr)
arrange(dd,desc(z),b)
Benchmarks: Note that I loaded each package in a new R session since there were a lot of conflicts. In particular loading the doBy package causes sort
to return "The following object(s) are masked from 'x (position 17)': b, x, y, z", and loading the Deducer package overwrites sort.data.frame
from Kevin Wright or the taRifx package.
#Load each time
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
levels = c("Low", "Med", "Hi"), ordered = TRUE),
x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
z = c(1, 1, 1, 2))
library(microbenchmark)
# Reload R between benchmarks
microbenchmark(dd[with(dd, order(-z, b)), ] ,
dd[order(-dd$z, dd$b),],
times=1000
)
Median times:
dd[with(dd, order(-z, b)), ]
778
dd[order(-dd$z, dd$b),]
788
library(taRifx)
microbenchmark(sort(dd, f= ~-z+b ),times=1000)
Median time: 1,567
library(plyr)
microbenchmark(arrange(dd,desc(z),b),times=1000)
Median time: 862
library(doBy)
microbenchmark(orderBy(~-z+b, data=dd),times=1000)
Median time: 1,694
Note that doBy takes a good bit of time to load the package.
library(Deducer)
microbenchmark(sortData(dd,c("z","b"),increasing= c(FALSE,TRUE)),times=1000)
Couldn't make Deducer load. Needs JGR console.
esort <- function(x, sortvar, ...) {
attach(x)
x <- x[with(x,order(sortvar,...)),]
return(x)
detach(x)
}
microbenchmark(esort(dd, -z, b),times=1000)
Doesn't appear to be compatible with microbenchmark due to the attach/detach.
m <- microbenchmark(
arrange(dd,desc(z),b),
sort(dd, f= ~-z+b ),
dd[with(dd, order(-z, b)), ] ,
dd[order(-dd$z, dd$b),],
times=1000
)
uq <- function(x) { fivenum(x)[4]}
lq <- function(x) { fivenum(x)[2]}
y_min <- 0 # min(by(m$time,m$expr,lq))
y_max <- max(by(m$time,m$expr,uq)) * 1.05
p <- ggplot(m,aes(x=expr,y=time)) + coord_cartesian(ylim = c( y_min , y_max ))
p + stat_summary(fun.y=median,fun.ymin = lq, fun.ymax = uq, aes(fill=expr))
(lines extend from lower quartile to upper quartile, dot is the median)
Given these results and weighing simplicity vs. speed, I'd have to give the nod to arrange
in the plyr
package. It has a simple syntax and yet is almost as speedy as the base R commands with their convoluted machinations. Typically brilliant Hadley Wickham work. My only gripe with it is that it breaks the standard R nomenclature where sorting objects get called by sort(object)
, but I understand why Hadley did it that way due to issues discussed in the question linked above.