1
votes

I have a very large dataset, so I want to avoid loops.

I have three columns of data:

col1 = time presented as 10000, 10001, 10002, 10100, 10101, 10102, 10200, 10201, 10202, 10300, ... (total 18000 times)

col2 = id number 1 2 3 4 ... (total 500 ids)

col3 = reading associated with particular id at particular time. 0.1 0.5 0.6 0.7... Say this is called Data3

10000 1 0.1

10001 1 0.5

10002 1 0.6

10100 1 0.7

10200 1 0.6 (NOTE - some random entries missing)

I want to present this as a matrix (called DataMatrix), but there is missing data, so a simple reshape will not do. I want to have the missing data as NA entries.

DataMatrix is currently an NA matrix of 500 columns and 18000 rows, where the row names and column names are the times and ids respectively.

1 2 3 4 ....

10000 NA NA NA NA ....

10001 NA NA NA NA ....

Is there a way I can get R to go through each row of Data3, completing DataMatrix with the reading Data3[,3] by placing it in the row and column of the matrix whose names relate to the Data3[,1] and Data3[,2]. But without loops.

Thanks to all you smart people out there.

2
Why do you say "there is missing data, so a simple reshape will not do"? Did you try it? What did not work about the solution? If you provide a (small) reproducible example that demonstrates the conditions/problems you expect to run into, you're more likely to get meaningful help. - A5C1D2H2I1M1N2O1R2T1
What about library(reshape2); DataMatrix <- dcast(Data3, col1~col2, value.var="col3")? - lukeA
Thanks Ananda - you're right - playing around is definitely the way forward. I thought there might be some matching or look up function which I didn't know about. The missing data means that Data3 is NOT 18000*500 long, but less. Hence, reshaping disaster. - user3173922

2 Answers

1
votes

Here is a solution with possible id values in 1:10 and times values in 1:20. First, create data:

mx <- matrix(c(sample(1:20, 5), sample(1:10, 5), sample(1:50, 5)), ncol=3, dimnames=list(NULL, c("time", "id", "reading")))
times <- 1:20
ids <- 1:10
mx
#      time id reading
# [1,]    4  3      25
# [2,]    5  4       9
# [3,]    9  7      45
# [4,]   18  1      40
# [5,]   11  8      28

Now, use outer to pass every possible combination of time/id to a look up function that returns the corresponding reading value:

outer(times, ids, 
  function(x, y) {
    mapply(function(x.sub, y.sub) {
      val <- mx[mx[, 1] == x.sub & mx[, 2] == y.sub, 3]
      if(length(val) == 0L) NA_integer_ else val
    },
    x, y)
} )

This produces the (hopefully) desired answer:

#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#  [1,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
#  [2,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
#  [3,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
#  [4,] NA   NA   25   NA   NA   NA   NA   NA   NA   NA   
#  [5,] NA   NA   NA   9    NA   NA   NA   NA   NA   NA   
#  [6,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
#  [7,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
#  [8,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
#  [9,] NA   NA   NA   NA   NA   NA   45   NA   NA   NA   
# [10,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [11,] NA   NA   NA   NA   NA   NA   NA   28   NA   NA   
# [12,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [13,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [14,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [15,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [16,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [17,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [18,] 40   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [19,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   
# [20,] NA   NA   NA   NA   NA   NA   NA   NA   NA   NA  
0
votes

If I understood you correctly:

Data3 <- data.frame(col1=10000:10499,  
                    col2=1:500,
                    col3=round(runif(500),1))

library(reshape2) 
DataMatrix <- dcast(Data3, col1~col2, value.var="col3")
DataMatrix[1:5, 1:5]
#    col1   1   2   3   4
# 1 10000 0.4  NA  NA  NA
# 2 10001  NA 0.6  NA  NA
# 3 10002  NA  NA 0.9  NA
# 4 10003  NA  NA  NA 0.5
# 5 10004  NA  NA  NA  NA