0
votes

I am fitting a model with many random effects using the bam() function within the mgcv package for R. My basic model structure looks like:

fit <- bam(y ~ s(x1) + s(x2) + s(xn) + s(plot, bs = 're'), data = dat)

This function works for 4 subsets of my data, but not the fifth, which is surprising. Instead, it throws this error:

Error in qr.qty(qrx, f) : 
  right-hand side should have 14195 not 14196 rows

This error goes away if I switch to using the gam() rather than bam() function. It also goes away if I drop the random effect from the model. I am really unsure what's causing this error, or what to do about it. Unfortunately, generating a reproducible example would require passing along a very large dataset, as it's not clear why this error is thrown on this particular dataset and not on the 4 other datasets fitted with the exact same model.

Any idea why this error is being thrown, and how to overcome it, would be greatly appreciated.

Suggest adding the data argument, and not using variables floating around in your workspace, especially given that you're making subsets. - Edward
@Edward apologies, I was just posting the general structure of the model. In reality I use very specific data arguments, and make sure to mind my environmental variables. - colin
This is an error thrown from deep within the linear algebra code underlying some step in the model: github.com/wch/r-source/blob/…. This suggests something weird is happening, likely something odd with that particular data; rank deficient or something perhaps? - Gavin Simpson
@GavinSimpson i.e. duplicated rows? Rows with identical predictor values? Not sure why this would throw an error for bam() and not gam(). - colin
@GavinSimpson this may be the case. I am modeling tree mortality (binary, 0-1 binomial outcome), and many trees are observed within the same plot. Across thousands of plots, occasionally we do have identical observations (trees with the same diameter inside the same plot, and therefore identical site factors, that both lived or both died). However, this isn't unique to this 5th data subset... - colin

2 Answers

2
votes

I had the same question and I found this r-help mail which tries to solve the same problem:

[R] bam (mgcv) not using the specified number of cores

After reading that thread, I removed all of the cluster-related code, including the cluster argument to the bam() function. The error message then went away.

I don't know the details but I hope this trick will help you.
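For illustration, here is a minimal sketch of what dropping the cluster looks like. The original data are not available, so `dat`, its columns, and the `family = binomial` choice (taken from the binary outcome mentioned in the comments) are all simulated or assumed here; only `bam()` and its `cluster` argument are from mgcv.

```r
library(mgcv)

# Simulated stand-in for the real data (hypothetical columns and sizes)
set.seed(1)
n <- 2000
dat <- data.frame(
  x1   = runif(n),
  x2   = runif(n),
  plot = factor(sample(1:50, n, replace = TRUE))  # bs = "re" expects a factor
)
plot_eff <- rnorm(50, sd = 0.5)
eta <- sin(2 * pi * dat$x1) + dat$x2 + plot_eff[as.integer(dat$plot)]
dat$y <- rbinom(n, 1, plogis(eta))

# Previously the call would have included something like `cluster = cl`,
# with `cl` created by parallel::makeCluster(). Here that argument is
# simply left out, so bam() runs without a parallel cluster.
fit <- bam(
  y ~ s(x1) + s(x2) + s(plot, bs = "re"),
  family = binomial,
  data   = dat
)

print(class(fit))
```

This only shows the shape of the change; it does not reproduce the original error.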

1
votes

One possible cause of

Error in qr.qty(qrx, f) : right-hand side should have 14195 not 14196 rows

is running out of RAM. This may explain why you have seen the error for some datasets but not others. This is especially common when using a large cluster size.
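If memory is the suspect, one thing that may be worth trying is fitting without a cluster and using the `discrete = TRUE` option of `bam()`, which the mgcv documentation describes as discretizing covariates for storage and efficiency reasons. The sketch below uses simulated data; `dat`, its columns, `family = binomial`, and `nthreads = 2` are assumptions for illustration, and this has not been checked against the dataset from the question.

```r
library(mgcv)

# Simulated stand-in for the real data (hypothetical columns and sizes)
set.seed(2)
n <- 5000
dat <- data.frame(
  x1   = runif(n),
  x2   = runif(n),
  plot = factor(sample(1:100, n, replace = TRUE))
)
plot_eff <- rnorm(100, sd = 0.5)
eta <- sin(2 * pi * dat$x1) + dat$x2 + plot_eff[as.integer(dat$plot)]
dat$y <- rbinom(n, 1, plogis(eta))

# Rough size of the data frame in memory, as a first sanity check
print(format(object.size(dat), units = "MB"))

# No `cluster` argument; discretized covariates with a small thread count.
# discrete = TRUE requires method = "fREML" (the bam() default).
fit_small <- bam(
  y ~ s(x1) + s(x2) + s(plot, bs = "re"),
  family   = binomial,
  data     = dat,
  method   = "fREML",
  discrete = TRUE,
  nthreads = 2
)

print(class(fit_small))
```

If the error disappears with this setup but returns with a large cluster, that would be consistent with the memory explanation, though it would not prove it.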