
I have the following packages loaded:

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3 dplyr_0.3       

loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1      magrittr_1.0.1 parallel_3.1.1 plyr_1.8.1     Rcpp_0.11.2   
[7] reshape2_1.4   stringr_0.6.2  tools_3.1.1 

I am trying out the interesting new data_frame() function in dplyr 0.3. It seems that data_frame() does not recycle shorter vectors when building a data frame. Is this intentional?

data_frame(x=letters[1:10], y=1:5, z=runif(10))  ## pay attention to "y" column
Error in data_frame_(lazyeval::lazy_dots(...)) : 
  arguments imply differing number of rows: 10, 5, 10

By contrast, base R's data.frame() recycles "y" without complaint:

data.frame(x=letters[1:10], y=1:5, z =runif(10))
   x y          z
1  a 1 0.54345855
2  b 2 0.98478537
3  c 3 0.51510861
4  d 4 0.03766893
5  e 5 0.32097472
6  f 1 0.77391366
7  g 2 0.61993720
8  h 3 0.87983035
9  i 4 0.63159025
10 j 5 0.53198094

However, data.frame() gives an error if the number of rows of the intended data frame is not a whole multiple of the length of the "y" column:

data.frame(x = letters[1:10], y = 1:4, z = runif(10))  ## Note the change on "y"
Error in data.frame(x = letters[1:10], y = 1:4, z = runif(10)) : 
  arguments imply differing number of rows: 10, 4
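A quick base R check makes the divisibility rule concrete. It uses only base R. data.frame() recycles a shorter column only when the longest column's length is a whole multiple of it. The names `ok` and `msg` are just illustrative.

```r
# data.frame() recycles a shorter column only when longest %% shorter == 0
10 %% 5  # 0: y = 1:5 fits exactly twice, so it is recycled
10 %% 4  # 2: y = 1:4 does not fit, so data.frame() stops with an error

ok <- data.frame(x = letters[1:10], y = 1:5)
print(ok$y)  # 1 2 3 4 5 1 2 3 4 5

# Capture the error message instead of letting it stop the script
msg <- tryCatch(
  data.frame(x = letters[1:10], y = 1:4),
  error = function(e) conditionMessage(e)
)
print(msg)
```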

Only data.table attempts to complete the task, recycling "y" and issuing a warning:

data.table(x = letters[1:10], y = 1:4, z = runif(10))
    x y          z
 1: a 1 0.17149580
 2: b 2 0.56452774
 3: c 3 0.01237395
 4: d 4 0.47183540
 5: e 1 0.52561037
 6: f 2 0.27053798
 7: g 3 0.82603959
 8: h 4 0.73871563
 9: i 1 0.03931619
10: j 2 0.34125535
Warning message:
In data.table(x = letters[1:10], y = 1:4, z = runif(10)) :
  Item 2 is of size 4 but maximum size is 10 (recycled leaving remainder of 2 items)
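Base R's rep_len() yields the same "y" values shown above. This does not claim that data.table uses rep_len() internally. It only shows what the warning's "remainder" refers to: the partial final cycle of 1:4.

```r
# Recycle 1:4 to length 10; the values match the y column printed above
y_recycled <- rep_len(1:4, 10)
print(y_recycled)  # 1 2 3 4 1 2 3 4 1 2

# Two full cycles of 4 cover 8 rows; a partial third cycle fills the rest
full_cycles <- 10 %/% 4  # 2
remainder   <- 10 %% 4   # 2, the "remainder of 2 items" in the warning
```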

Why do data_frame() and data.frame() behave differently? I am working with data frames and data tables simultaneously, using the dplyr and data.table packages, so understanding these behaviours would help me avoid costly errors. Thank you.

1 Answer

Q: "Is this intentional?" Well, the help page for data_frame says: "Only recycles length 1 inputs." So yes, I assume it is intentional.

Q: "Why?" Probably because data_frame is a "trimmed down version of data.frame" (from ?data_frame).
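One way to sidestep the differing rules is to recycle explicitly with rep_len() before building the table. This is a base R sketch, not something the help page prescribes. Every column then has the same length, so data.frame(), dplyr's data_frame() and data.table() all receive equal-length inputs and none of their recycling rules come into play. The variable names and `n` below are illustrative.

```r
n <- 10

# Recycle explicitly: rep_len() has no divisibility requirement, so every
# column already has n elements before any constructor sees it
x <- letters[seq_len(n)]
y <- rep_len(1:4, n)
z <- runif(n)

df <- data.frame(x = x, y = y, z = z)
str(df)

# Length-1 inputs are the one case data_frame() recycles on its own;
# data.frame() recycles them too
flagged <- data.frame(x = x, flag = TRUE)
```

Because the recycling is written out, a length mismatch you did not intend is visible in your own code rather than hidden inside a constructor.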