when is plyr better than data.table? [closed]

Question

"Better" here can mean faster, easier to read, or shorter in syntax; it can also mean that the operation is not possible in data.table at all.

I don't use plyr much and would like to know whether there are cases where I should. Because of that, the only example I can come up with is rbind.fill, which to my knowledge has no data.table analog. In every other example I've seen of something being done in both plyr and data.table, the latter was faster and easier to read or more compact.

plyr will not (in general) be faster than data.table. Some people (like myself) find the former's syntax far more intuitive and readable than the latter. But that is merely a subjective choice. — joran
@Arun thanks, I'll take a look at those functions. Does plyr do anything for data.frames better? — eddi
@Arun, cool thanks. The parallel stuff sounds interesting, I'll take a look at it. — eddi
Just my 2ct: for multidimensional arrays, plain array is much faster than aaply. — Paul Hiemstra

Brian Diggs · Accepted Answer · 2013-04-22T18:30:02

They are different packages with different purposes. One is not a substitute for the other, despite there being a small subset of functionality for which they overlap.

Here is a brief summary of each package, taken from the packages themselves:

The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.

and

data.table ... offers fast subset, fast grouping, fast update, fast ordered joins and list columns in a short and flexible syntax, for faster development. It is inspired by A[B] syntax in R where A is a matrix and B is a 2-column matrix.

Where they overlap is in the "fast grouping", which plyr also does by splitting data.frames, operating on the pieces, and recombining them into a single data.frame. data.table has many other features that make operations on data.frame-like structures fast; plyr has features that apply the split-apply-combine paradigm to other data structures, such as lists and arrays (both as inputs and outputs).
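To make the overlap concrete, here is a minimal sketch, assuming both packages are installed. The toy data and the names `df`, `g`, `x`, and `pieces` are hypothetical, invented only for illustration. The first two calls compute the same grouped mean in each package; the last call shows plyr applying split-apply-combine to a list rather than a data.frame.

```r
library(data.table)

# Hypothetical toy data: one grouping column and one numeric column.
df <- data.frame(g = c("a", "a", "b", "b", "b"), x = c(1, 3, 2, 4, 6))

# plyr: split the data.frame by g, summarise each piece, recombine.
plyr_means <- plyr::ddply(df, "g", plyr::summarise, mean_x = mean(x))

# data.table: the same grouped mean, expressed with by=.
dt <- as.data.table(df)
dt_means <- dt[, list(mean_x = mean(x)), by = g]

# plyr on a non-data.frame input: a list goes in, a data.frame comes out.
pieces <- list(first = 1:3, second = 4:10)
sizes <- plyr::ldply(pieces, function(v) data.frame(n = length(v), total = sum(v)))
```

The grouped mean is the part either package can do; the list-to-data.frame step is the kind of thing plyr's family of functions is built around.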

So, really, they are two different tools that happen to overlap in a small part of the same problem domain. Each does much more than that, and if you want or need that additional functionality, that is the package to use.
