
I have created a questionnaire composed of four subscales, each measuring a different component of my variable of interest. Each subscale is composed of 3 items, and each item is rated on a 6-point scale (so responses for each item range from 1 to 6).

Here is a sample of my data; each row is a subject:

> dput(DF[1:10, 7:18 ]) 
structure(list(I1 = c(3, 6, 6, 4, 5, 5, 3, 3, 5, 4), I2 = c(3, 
5, 5, 6, 4, 5, 2, 5, 5, 4), I3 = c(1, 4, 2, 3, 3, 4, 4, 1, 5, 
2), I4 = c(5, 6, 6, 6, 5, 6, 6, 6, 6, 6), I5 = c(5, 6, 5, 5, 
6, 6, 5, 6, 5, 5), I6 = c(4, 6, 6, 6, 5, 5, 6, 4, 5, 4), I7 = c(3, 
6, 5, 6, 4, 4, 3, 5, 3, 4), I8 = c(4, 6, 5, 5, 4, 4, 3, 5, 3, 
5), I9 = c(4, 6, 4, 4, 5, 5, 5, 4, 4, 3), I10 = c(2, 4, 5, 6, 
3, 2, 4, 1, 2, 4), I11 = c(3, 3, 4, 6, 4, 6, 5, 5, 2, 3), I12 = c(3, 
6, 6, 6, 5, 4, 4, 4, 5, 5)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

217 participants completed this questionnaire (no missing values), and I want to test whether my data support my model with a CFA.

Here is my code :

library(lavaan)

model <- "
Factor1 =~ I1 + I2 + I3
Factor2 =~ I4 + I5 + I6
Factor3 =~ I7 + I8 + I9
Factor4 =~ I10 + I11 + I12
"

fit <- cfa(model, data = DF)
summary(fit, fit.measures = TRUE, standardized = TRUE)

But when I run it, I get the following warnings and I can't understand why:

lavaan WARNING: the optimizer warns that a solution has NOT been found!
lavaan WARNING: the optimizer warns that a solution has NOT been found!
lavaan WARNING: Could not compute standard errors! The information matrix could not be inverted. This may be a symptom that the model is not identified.
lavaan WARNING: some estimated ov variances are negative
lavaan WARNING: covariance matrix of latent variables
is not positive definite; use lavInspect(fit, "cov.lv") to investigate.

Here is what I get with lavInspect:

> lavInspect(fit, "cov.lv")
        Factr1   Factr2   Factr3   Factr4  
Factor1 7797.062                           
Factor2    0.248    0.451                  
Factor3    0.215    0.182    0.289         
Factor4   -0.254   -0.159    0.280 9883.238

The huge variances for Factor1 and Factor4 seem to go together with the very large negative variance estimates that lavaan displays for I1 (-7795.413) and I10 (-9881.204). However, if I ask R directly for var(DF$I1) and var(DF$I10), the results are very different.

Variances:
                   Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
   .I1             -7795.413       NA                   -7795.413 -4729.827
   .I2                 1.684       NA                       1.684     1.000
   .I3                 1.535       NA                       1.535     1.000
   .I4                 0.807       NA                       0.807     0.641
   .I6                 1.859       NA                       1.859     0.884
   .I7                 1.370       NA                       1.370     0.826
   .I8                 1.201       NA                       1.201     0.832
   .I9                 1.681       NA                       1.681     0.950
   .I10            -9881.204       NA                   -9881.204 -4859.350
   .I11                2.215       NA                       2.215     1.000
   .I12                0.784       NA                       0.784     1.000

> var(DF$I1)
[1] 1.683052
> var(DF$I10)
[1] 1.966163

Does anyone know why it is not working? Is it because my model doesn't fit my data well enough?

Thank you in advance!

Did you use lavInspect(fit, "cov.lv")? What is the output? Can you show the data? - Tom
Thank you for your answer Tom, I will add those data to my post. - Lea_c
Maybe the four-factor structure does not actually underlie the data. Did you inspect cor(DF)? (You could have provided us with the covariance matrix rather than the raw data.) For example, in the sample data I11 and I12 show a negative correlation. Also, I think it is an artefact of estimation order that I1 and I10 have large negative values; if you put another item in the first position for the respective factors, I suspect that item's estimate will turn out largely negative instead. However, this question might be more appropriately addressed at stats.stackexchange - Tom
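
The correlation check suggested in the comment above can be sketched on the 10 sample rows from the question. This is only an illustration: the data frame name sample_f4 is made up here, and 10 rows say little about the full sample of 217, so cor(DF) on the complete data is the check that matters.

```r
# First 10 rows of the Factor4 items, copied from the dput() output in the question.
# `sample_f4` is a hypothetical name used only for this sketch.
sample_f4 <- data.frame(
  I10 = c(2, 4, 5, 6, 3, 2, 4, 1, 2, 4),
  I11 = c(3, 3, 4, 6, 4, 6, 5, 5, 2, 3),
  I12 = c(3, 6, 6, 6, 5, 4, 4, 4, 5, 5)
)

# Pearson correlations among the three indicators of Factor4.
# Items that load on the same factor would normally correlate positively.
r_f4 <- cor(sample_f4)
print(round(r_f4, 2))

# List the item pairs whose correlation is negative
negative_pairs <- which(r_f4 < 0 & upper.tri(r_f4), arr.ind = TRUE)
print(negative_pairs)
```

Running the same two steps on each factor's items in the full data would show whether the items within a factor hang together at all.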

1 Answer


Have a look at this lavaan discussion. Having factor variances in the thousands and others lower than 1 tends to be problematic for the estimation process.

I assume that some variables (especially those of Factors 1 and 4) range from 1 to, say, 50, whereas others might range from 1 to 5. If this is the case, I suggest that you rescale your variables to a common range prior to the CFA estimation, e.g.,

vars <- c("I1", "I2", "I3", "I10", "I11", "I12")
DF[, vars] <- DF[, vars] / 10
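
Before rescaling, it may be worth checking whether the items really are on different scales, since the question states that every item is a 6-point scale. A minimal sketch of that check, using four items from the 10 sample rows in the question (the name sample_df is made up for this illustration; on the real data the same calls would be applied to DF):

```r
# First 10 rows of four items, copied from the dput() output in the question.
# `sample_df` is a hypothetical name used only for this sketch.
sample_df <- data.frame(
  I1  = c(3, 6, 6, 4, 5, 5, 3, 3, 5, 4),
  I4  = c(5, 6, 6, 6, 5, 6, 6, 6, 6, 6),
  I10 = c(2, 4, 5, 6, 3, 2, 4, 1, 2, 4),
  I12 = c(3, 6, 6, 6, 5, 4, 4, 4, 5, 5)
)

# Minimum (row 1) and maximum (row 2) of each item
item_ranges <- sapply(sample_df, range)
print(item_ranges)

# Variance of each item; variances that differ by orders of magnitude
# would support rescaling, similar variances would not
item_vars <- sapply(sample_df, var)
print(round(item_vars, 3))
```

If all items turn out to share the same 1 to 6 range, dividing some of them by 10 would create a scale difference rather than remove one, and the cause of the non-convergence would have to be sought elsewhere.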