How to efficiently specify a large predictor matrix for stan data block

Question

I would appreciate any help creating a large predictor matrix for a Stan data block.

I want to use variables w_1 to w_K from the data below as the predictor "matrix" real<lower=0> weights[N, W]; in my model. K = W is the number of weight variables (columns of weights) and N is the number of observations (rows of weights), so both K and N are integers.

My current approach below works for a few columns (e.g., K = 10), but I have K > 100 columns. Given the data below, I therefore need an efficient and scalable way to do this:

#for the desired data block 
    dat1 <- list (N = N, 
    ncases = ncases, A = A, B = B, id = id, P = imput, 
    nn = nn, W = 10, 
    weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10))

I explored compose_data from tidybayes, but I fail to see how I could use it to build the desired data block. Any help would be much appreciated.

#sample data

dat <- data.frame(
id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4),
imput = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
A = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
B = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
Pass = c(278, 278, 278, 278, 278, 100, 100, 100, 100, 100, 153, 153, 153, 153, 153, 79, 79, 79, 79, 79), 
Fail = c(740, 743, 742, 743, 740, 7581, 7581, 7581, 7581, 7581, 1231, 1232, 1235, 1235, 1232, 1731, 1732, 1731, 1731, 1731), 
W_1= c(4, 3, 4, 3, 3, 1, 2, 1, 2, 1, 12, 12, 11, 12, 12, 3, 5, 3, 3, 3),
W_2= c(3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 12, 12, 12, 12, 12, 3, 3, 3, 3, 3),
W_3= c(4, 3, 3, 3, 3, 1, 2, 1, 1, 1, 12, 12, 11, 12, 12, 3, 3, 3, 3, 3),
W_4= c(3, 3, 4, 3, 3, 1, 1, 1, 2, 1, 12, 12, 13, 12, 12, 3, 2, 3, 3, 3),
W_5= c(3, 3, 3, 3, 3, 1, 0, 1, 1, 1, 12, 12, 12, 12, 12, 3, 3, 3, 3, 3),
W_6= c(4, 3, 3, 3, 3, 1, 1, 1, 1, 1, 12, 12, 12, 12, 12, 3, 3, 3, 3, 3),
W_7= c(3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 12, 12, 12, 12, 12, 3, 3, 3, 3, 3),
W_8= c(3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 15, 12, 12, 12, 12, 3, 3, 3, 3, 3),
W_9= c(3, 3, 3, 4, 3, 1, 1, 1, 1, 1, 12, 12, 12, 12, 12, 2, 3, 3, 3, 3),
W_10= c(3, 3, 4, 3, 3, 1, 1, 1, 1, 1, 12, 10, 12, 12, 12, 3, 3, 3, 3, 3)
      )

#my current approach

N <- nrow(dat)
ncases <- dat$Pass
nn <- dat$Fail + dat$Pass
A <- dat$A
B <- dat$B
id <- dat$id
imput <- dat$imput
w_1 <- dat$W_1
w_2 <- dat$W_2
w_3 <- dat$W_3
w_4 <- dat$W_4
w_5 <- dat$W_5
w_6 <- dat$W_6
w_7 <- dat$W_7
w_8 <- dat$W_8
w_9 <- dat$W_9
w_10 <- dat$W_10

#for current data block
    dat_list <- dat %>% compose_data(.n_name = n_prefix("N"))

#for desired data block
    dat1 <- list (N = N, 
              ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn, W = 10,
              weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10))

#current data block

    data{
    int N;                                    // number of observations
    int ncases[N];                        
    int A[N];                                 
    int B[N];                                
    int nn[N];                               
    int id[N];                                
    real<lower=0> w_1[N];                     // variable w_1
    real<lower=0> w_2[N];                     // variable w_2       
    real<lower=0> w_3[N];                     // variable w_3      
    real<lower=0> w_4[N];                     // variable w_4       
    real<lower=0> w_5[N];                     // variable w_5       
    real<lower=0> w_6[N];                     // variable w_6       
    real<lower=0> w_7[N];                     // variable w_7       
    real<lower=0> w_8[N];                     // variable w_8       
    real<lower=0> w_9[N];                     // variable w_9       
    real<lower=0> w_10[N];                    // variable w_10
    }

#desired data block

data{
int N;                                        // number of observations
int W;                                        // number of weight columns
int ncases[N];                        
int A[N];                                  
int B[N];                                
int nn[N];                              
int id[N];                                
real<lower=0> weights[N, W];                  // N by W block of weights 
}

This question has also been posted here. Thanks in advance for any help.

What's wrong with your desired data block? You can just use matrix<lower = 0>[W, N] w; and then w10 is just w[10]. Or it can be an array of vectors. It depends on how you need to access it. — Bob Carpenter
Thanks for this, @BobCarpenter. My desired data block works with my small sample data, but my actual dataset has over 100 weights (w_1:w_W, where W > 100). So I am looking for an efficient, scalable way to build the list for the desired data block: dat1 <- list (N = N, ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn, W = 10, weights = cbind(w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_10)) — Krantz
That's an R question, not a Stan question. The Stan matrix data structure works fine for any size W up to the memory constraints of your computer. — Bob Carpenter
Thanks, @BobCarpenter. It is good news for me that the Stan matrix data structure works fine for any size W up to the memory constraints of the computer. I have also edited the question to reflect your view. — Krantz

A. S. K. · Accepted Answer · 2019-02-09T16:39:32

If all the predictor columns in dat start with W_, then I think this should do the trick:

w.matrix = as.matrix(dat[,grepl("^W_", colnames(dat))])
dat1 <- list (N = N, ncases = ncases, A = A, B = B, id = id, P = imput, nn = nn,
    W = ncol(w.matrix), weights = w.matrix)
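
As a self-contained illustration of the same idea, here is a minimal sketch that runs on a trimmed-down stand-in for the question's data frame (the four rows and three W_ columns are made up for the example, and the names w_cols, w_matrix and stan_data are my own). It also adds a few sanity checks that I would assume are useful before passing the list to Stan, since the weights are declared with a lower bound of 0:

```r
# Trimmed stand-in for the question's data frame (illustrative values only).
dat <- data.frame(
  id   = c(1, 1, 2, 2),
  Pass = c(278, 278, 100, 100),
  Fail = c(740, 743, 7581, 7581),
  W_1  = c(4, 3, 1, 2),
  W_2  = c(3, 3, 1, 1),
  W_3  = c(4, 3, 1, 2)
)

# Pick every column whose name starts with "W_", in data-frame order.
# drop = FALSE keeps a two-dimensional object even if only one column matches.
w_cols   <- grep("^W_", names(dat), value = TRUE)
w_matrix <- as.matrix(dat[, w_cols, drop = FALSE])

# Fail early on problems that would otherwise only surface inside Stan.
stopifnot(
  length(w_cols) > 0,
  is.numeric(w_matrix),
  !anyNA(w_matrix),
  all(w_matrix >= 0)  # mirrors the <lower=0> constraint on weights
)

# W is derived from the matrix, so nothing is hard-coded.
stan_data <- list(
  N       = nrow(dat),
  ncases  = dat$Pass,
  nn      = dat$Fail + dat$Pass,
  id      = dat$id,
  W       = ncol(w_matrix),
  weights = w_matrix
)

str(stan_data)
```

Because the columns are selected by pattern and W comes from ncol(), the same lines should work unchanged whether there are 10 weight columns or several hundred, as long as they all share the W_ prefix.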
