Logistic regression detection probability

Question

I'm attempting to assess the key covariates in detection probability.

I'm currently using this code:

    model1 <- glm(P ~ Width +
                MBL +
                DFT +
                SGP +
                SGC +
                Depth,
              family = binomial("logit"), 
              data = dframe2, na.action = na.exclude)
    summary.lm(model1)

My data is structured like this:

    Site Transect  Q ID P Width DFT Depth    Substrate SGP SGC      MBL
       1      Vr1 Q1  1 0    NA  NA   0.5         Sand   0   0  0.00000
       2      Vr1 Q2  2 0    NA  NA   1.4 Sand&Searass   1  30 19.14286
       3      Vr1 Q3  3 0    NA  NA   1.7 Sand&Searass   1  15 16.00000
       4      Vr1 Q4  4 1    17   0   2.0 Sand&Searass   1  95 35.00000
       5      Vr1 Q5  5 0    NA  NA   2.4         Sand   0   0  0.00000
       6      Vr1 Q6  6 0    NA  NA   2.9 Sand&Searass   1  50 24.85714

My sample size is really small (n=12) and I only have ~70 rows of data.

When I run the code, it returns:

                              Estimate   Std. Error  t value Pr(>|t|)
    (Intercept)            2.457e+01  4.519e+00   5.437  0.00555 **
    Width                  1.810e-08  1.641e-01   0.000  1.00000
    MBL                   -2.827e-08  9.906e-02   0.000  1.00000
    DFT                    2.905e-07  1.268e+00   0.000  1.00000
    SGP                    1.064e-06  2.691e+00   0.000  1.00000
    SGC                   -2.703e-09  3.289e-02   0.000  1.00000
    Depth                  1.480e-07  9.619e-01   0.000  1.00000
    SubstrateSand&Searass -8.516e-08  1.626e+00   0.000  1.00000

Does this mean my data set is just too small to assess detection probability, or am I doing something wrong?

The t value is 0.000, which is Estimate / Std. Error. For every term except the intercept, the estimate is close to zero. Your data contain a lot of NA values; try replacing them with something meaningful, such as the mean of that feature. Since n is about 70, use a resampling technique such as bootstrap sampling. — Justice_Lords
Yeah, this is almost certainly about your sample size. You say you have 70 rows, and it looks like lots of them have NA somewhere, and you've got na.action = na.exclude, so your actual sample size is probably tiny. With so little info to use, your model is, unsurprisingly, failing to find patterns and returning a bunch of coefficients of approximately zero. — ulfelder
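
One way to check the effective sample size mentioned in the comments is to count the rows with no NA in any model variable, since those are the only rows the fit can use. Below is a minimal sketch using just the six rows shown in the question. The data frame name `d` is an assumption; on the real data the same calls would be applied to `dframe2`, restricted to the model columns.

```r
# The six rows shown in the question (model columns only).
d <- data.frame(
  P     = c(0, 0, 0, 1, 0, 0),
  Width = c(NA, NA, NA, 17, NA, NA),
  DFT   = c(NA, NA, NA, 0, NA, NA),
  Depth = c(0.5, 1.4, 1.7, 2.0, 2.4, 2.9),
  SGP   = c(0, 1, 1, 1, 0, 1),
  SGC   = c(0, 30, 15, 95, 0, 50),
  MBL   = c(0, 19.14286, 16, 35, 0, 24.85714)
)

# Rows with no NA in any model variable: the only rows glm() can fit on.
used <- complete.cases(d)
n_used <- sum(used)                 # effective sample size
p_used <- table(d$P[used])          # response values that survive NA removal
na_per_column <- colSums(is.na(d))  # which covariates cause the loss

print(n_used)
print(p_used)
print(na_per_column)
```

In these six rows only one row is complete, and it has P = 1. If the full data set follows the same pattern (Width and DFT recorded only where P = 1), the rows left after NA removal would contain a single response value, and no covariate effect could be estimated from them. Whether that holds for all ~70 rows needs checking on the real data.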

razimbres razimbres · Accepted Answer · 2019-03-09T12:10:29

According to Hair (author of the book Multivariate Data Analysis), you need at least 15 examples for each feature (column) of your data. With 12, you could select at most one feature.

So, run a t-test for each feature, comparing its mean between the two classes (0 and 1 in the target, i.e. the dependent variable), and choose the feature (independent variable) whose mean difference between classes is the largest. A large difference suggests that the variable can form a boundary that separates the two classes.
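
The screening step described above could be sketched roughly as follows. The helper `rank_covariates` and the `toy` data are hypothetical: the data are simulated purely for illustration and are not the asker's measurements. The sketch ranks by the absolute Welch t statistic rather than the raw mean difference, which is a choice made here so that covariates on different scales stay comparable.

```r
# Hypothetical helper: for each numeric covariate, run a Welch t-test
# between the response == 1 and response == 0 groups, then rank by |t|.
rank_covariates <- function(data, response, covariates) {
  rows <- lapply(covariates, function(v) {
    x0 <- data[[v]][data[[response]] == 0]
    x1 <- data[[v]][data[[response]] == 1]
    tt <- t.test(x1, x0)  # Welch two-sample test; t.test drops NA values
    data.frame(
      covariate = v,
      mean_diff = mean(x1, na.rm = TRUE) - mean(x0, na.rm = TRUE),
      t_stat    = unname(tt$statistic),
      p_value   = tt$p.value
    )
  })
  out <- do.call(rbind, rows)
  out[order(-abs(out$t_stat)), ]
}

# Simulated illustration data (NOT the data from the question).
set.seed(1)
n <- 40
toy <- data.frame(P = rep(c(0, 1), each = n / 2))
toy$Depth <- rnorm(n, mean = 2 + 1.5 * toy$P, sd = 0.5)  # built to differ by class
toy$SGC   <- rnorm(n, mean = 40, sd = 20)                # built to be unrelated to P

ranked <- rank_covariates(toy, "P", c("Depth", "SGC"))
print(ranked)
```

Each group needs at least two non-missing values per covariate for `t.test` to run, so a covariate that is NA for every row of one class cannot be screened this way.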
