6
votes

I get this error when trying to fit glmnet() with family="binomial", for Logistic Regression fit:

> data <- read.csv("DAFMM_HE16_matrix.csv", header=F)
> x <- as.data.frame(data[,1:3])
> x <- model.matrix(~.,data=x)
> y <- data[,4]

> train=sample(1:dim(x)[1],287,replace=FALSE)

> xTrain=x[train,]
> xTest=x[-train,]
> yTrain=y[train]
> yTest=y[-train]

> fit = glmnet(xTrain,yTrain,family="binomial")

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
one multinomial or binomial class has 1 or 0 observations; not allowed

Any help would be greatly appreciated - I've searched the internet and haven't been able to find anything that helps

EDIT:

Here's what data looks like:

> data
          V1       V2    V3      V4
1   34927.00   156.60 20321  -12.60
2   34800.00   156.60 19811  -18.68
3   29255.00   156.60 19068    7.50
4   25787.00   156.60 19608    6.16
5   27809.00   156.60 24863   -0.87
...
356 26495.00 12973.43 11802    6.35
357 26595.00 12973.43 11802   14.28
358 26574.00 12973.43 11802    3.98
359 25343.00 14116.18 11802   -2.05
2
Are you sure your yTrain contains at least 2 distinct values? - Hong Ooi
@HongOoi Absolutely. There are 287 distinct values and I checked to make sure it wasn't a matrix and is a vector. - groutgauss
@HongOoi I also tried just running glmnet(x,y,family="binomial") which yielded the same error. - groutgauss
Well, hang on; your V4 variable appears to be continuous, not binary. You can't fit a logistic model with that. - Hong Ooi
This error also can occur legitimately (when the target variable is a factor), e.g. in cv.glmnet, for some choices of random seed, esp. with severe class imbalances, when one of the CV folds does in fact end up with only have 0 or 1 observation. Since that occurs randomly, you have to gracefully handle it. - smci

2 Answers

3
votes

I think it is because of the levels of your factor variable. Suppose there are 10 levels and your 1 level has only one record, try to remove this level. You can use drop levels from gdata package.

1
votes

This is generally because of data structure and their response variable, sometimes the response has more than binary output. or the data response variable has binary out come, but they have much more one class from the other and we may called them most probably class imbalance problem. Therefore the problem then occur during training and testing the data. So, you must convert the response variable into binary if there are more than two outcomes, 2nd you may apply multinomial as respect to binomial. Hope this can help you.