I have a data.frame that consists of two columns, a Sample_ID variable and a value variable. Each sample (of which there are 1971) has 132 individual points. The entire object is only ~3,000,000 bytes, or about 0.003 gigabytes (according to object.size()). For some reason, when I try to dcast the object into wide format, it throws an error saying it can't allocate a vector of size 3.3 GB, which is more than 3 orders of magnitude larger than the original object.

The output I'm hoping for is 1 column for each sample, with 132 rows of data for each column.

The dcast code I am using is the following:

df_dcast = dcast(df, value.var = "Vals", Vals~Sample_ID)

I would provide the dataset for reproducibility but because this problem has to do with object size, I don't think a subset of it would help and I'm not sure how to easily post the full dataset. If you know how to post the full dataset or think that a subset would be helpful, let me know.

Thanks

Might be worth looking at sparse matrices. - Richard Telford
@RichardTelford From my first look, sparse matrices seem to be less efficient when the data has few zeroes, and my data has no zeroes at all, so I don't think they would help here. But I feel like there is something else going on, because I can't think of a reason why a wide object with the exact same data should be so much larger than a long object. My assumption is that the dcast code I am using is doing something other than what I want it to do, and I just can't tell because it's throwing an error before it completes. I'm going to try it on a subset now and see what it does. - C. Denney

1 Answer


OK, I figured out what was going wrong. With Vals on the left-hand side of the formula, dcast was using each unique value in the Vals column as its own row, producing far more rows than the 132 I wanted. I needed to add a new column that is a value index running from 1 to 132 within each sample, so the data frame now has three columns: Sample_ID, Vals, and ValsNumber.
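To make the size blow-up concrete, here is a rough base R sketch. The toy data and the helper est_bytes are made up for illustration, and the estimate assumes a dense wide table of 8-byte doubles with every value unique, so it is an upper bound rather than the exact figure dcast reported.

```r
# Toy long-format data: 5 samples with 4 points each (hypothetical values).
set.seed(1)
n_samples <- 5
n_points  <- 4
df_toy <- data.frame(
  Sample_ID = rep(paste0("S", seq_len(n_samples)), each = n_points),
  Vals      = rnorm(n_samples * n_points)
)

# With Vals on the left of the formula, each distinct value becomes a row.
n_rows_wrong  <- length(unique(df_toy$Vals))  # one row per unique value
n_rows_wanted <- n_points                     # one row per position in a sample

# Rough size of a dense numeric table (8 bytes per double), in gigabytes.
est_bytes <- function(n_rows, n_cols) n_rows * n_cols * 8
worst_case_gb <- est_bytes(1971 * 132, 1971) / 1e9  # all values unique: about 4.1
intended_gb   <- est_bytes(132, 1971) / 1e9         # 132 rows: about 0.002
```

If most of the 260,172 values in the real data are distinct, a table of that shape lands in the same range as the 3.3 GB in the error message, while the intended 132-row table stays around 2 MB.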

The dcast code then looks like the following:

df_wide = dcast(df, value.var = "Vals", ValsNumber ~ Sample_ID)
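For anyone wanting to try this on a small example, here is a minimal sketch of building the index column and casting. It assumes dcast comes from the reshape2 package and that the rows within each sample are already in the order you want; the toy data is made up.

```r
library(reshape2)  # assumption: dcast() from reshape2

# Toy long-format data: 3 samples with 4 points each (hypothetical values).
df <- data.frame(
  Sample_ID = rep(c("A", "B", "C"), each = 4),
  Vals      = c(1.1, 2.2, 3.3, 4.4,  5.5, 6.6, 7.7, 8.8,  9.9, 10.1, 11.2, 12.3)
)

# Number the points within each sample: 1, 2, 3, 4 for A, then again for B, C.
df$ValsNumber <- ave(seq_along(df$Sample_ID), df$Sample_ID, FUN = seq_along)

# One row per ValsNumber, one column per Sample_ID, cells filled from Vals.
df_wide <- dcast(df, ValsNumber ~ Sample_ID, value.var = "Vals")

dim(df_wide)  # 4 rows; 1 index column plus 3 sample columns
```

When every sample has the same number of points, base R's unstack(df, Vals ~ Sample_ID) may also give one column per sample without needing an index column.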