Gist
The error: Error in predmat[which, seq(nlami)] = preds : replacement has length zero
The context: data is simulated with a binary y, but there are n coders of true y. the data is stacked n times and a model is fitted, trying to get true y.
The error is received for
L2penalty, but notL1penalty.- when Y is the coder Y, but not when it is the true Y.
- the error is not deterministic, but depends on seed.
UPDATE: the error is for versions after 1.9-8. 1.9-8 does not fail.
Reproduction
base data:
library(glmnet)
rm(list=ls())
set.seed(123)
num_obs=4000
n_coders=2
precision=.8
X <- matrix(rnorm(num_obs*20, sd=1), nrow=num_obs)
prob1 <- plogis(X %*% c(2, -2, 1, -1, rep(0, 16))) # yes many zeros, ignore
y_true <- rbinom(num_obs, 1, prob1)
dat <- data.frame(y_true = y_true, X = X)
create coders
classify <- function(true_y,precision){
n=length(true_y)
y_coder <- numeric(n)
y_coder[which(true_y==1)] <- rbinom(n=length(which(true_y==1)),
size=1,prob=precision)
y_coder[which(true_y==0)] <- rbinom(n=length(which(true_y==0)),
size=1,prob=(1-precision))
return(y_coder)
}
y_codings <- sapply(rep(precision,n_coders),classify,true_y = dat$y_true)
stack it all
expanded_data <- do.call(rbind,rep(list(dat),n_coders))
expanded_data$y_codings <- matrix(y_codings, ncol = 1)
reproduce error
Since the error depends on seed, a loop is necessary. only the first loop will fail, the other two will finish.
X <- as.matrix(expanded_data[,grep("X",names(expanded_data))])
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
family="binomial", alpha=0) # will fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
family="binomial", alpha=1) # will not fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_true,
family="binomial", alpha=0) # will not fail
Any thoughts where this is coming from in glmnet and how to avoid it? from my reading of cv.glmnet, this is after the cv routine and is inside cvstuff = do.call(fun, list(outlist, lambda, x, y, weights, offset, foldid, type.measure, grouped, keep)), which I do not understand its role, hence the failure, and how to avoid it.
sessions (Ubuntu and PC)
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 knitr_1.14
[11] digest_0.6.10 lattice_0.20-34
and
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 digest_0.6.10
[11] lattice_0.20-34
glmnet_2.0-5in a similar situation using ridge logistic regression. As the comment mentions in github.com/lmweber/glmnet-error-example/blob/master/…, after stepping through the code it is to do withmlamibeing larger than alllambdavalues. Has this bug been made clear to theglmnetdevelopers? - rwolst