I am running an exploratory factor analysis on a set of survey questions with the factor_analyzer package in Python. The result shows 8 factors, each with a clear set of variables that load most highly on it.
To name the factors correctly and validate them, I wanted to analyse the correlation between the answers to the questions that load highly on each factor and the factor scores across all respondents.
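The validation step described above can be sketched as follows. This is a minimal example, not part of the factor_analyzer API: it assumes the item responses are a pandas DataFrame (one row per respondent, one column per question) and that the factor scores returned by `fa.transform` are a DataFrame with the rows in the same order. The helper name `item_score_correlations` is hypothetical.

```python
import numpy as np
import pandas as pd


def item_score_correlations(items: pd.DataFrame, scores: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of every item (rows) with every factor score (columns).

    Hypothetical helper: assumes `items` and `scores` have one row per
    respondent, in the same order.
    """
    # Standardize each column (population std) so the cross product / n is Pearson r.
    z_items = (items - items.mean()) / items.std(ddof=0)
    z_scores = (scores - scores.mean()) / scores.std(ddof=0)
    n = len(items)
    corr = z_items.to_numpy().T @ z_scores.to_numpy() / n
    return pd.DataFrame(corr, index=items.columns, columns=scores.columns)
```

For example, `item_score_correlations(test_data, data_transformed)` would give a questions x factors table laid out like the loadings table, so the two can be compared column by column.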
However, when I analyse these results, the factors seem to be switched. For example, the first factor, whose high-loading variables are the 'achievement' questions, shows up in the scoring results as the second factor: that is the factor whose scores correlate most strongly with the respondents' answers to the 'achievement' questions. Moreover, the variables with the highest loadings on the first factor have the lowest correlations with that factor's scores. Here is the code:
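```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=8, rotation='oblimin')
fa.fit(test_data)
# loadings_ is an attribute (array of shape n_items x n_factors), not a method
data_loadings = pd.DataFrame(fa.loadings_, index=test_data.columns)
data_transformed = pd.DataFrame(fa.transform(test_data), index=test_data.index)
```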
Here is a plot of the factor loadings, and here a plot of the correlation matrix. In both, the variables are sorted, and you can see that the variables with the highest loadings on factor [0] differ from the variables with the highest correlations with factor [0].
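One way to check whether the factors are merely relabelled, as described above, is to list the top items per factor in each matrix and match every loading column to the factor-score column whose item profile it most resembles. A minimal sketch, assuming both matrices are pandas DataFrames with questions as rows and factors as columns (`corr_matrix` below stands for the question-by-factor-score correlation matrix shown in the second plot). The helpers `top_items` and `match_factors` are hypothetical, not part of factor_analyzer.

```python
import numpy as np
import pandas as pd


def top_items(matrix: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """For each factor column, the n items with the largest absolute value (hypothetical helper)."""
    return pd.DataFrame(
        {col: matrix[col].abs().nlargest(n).index.to_list() for col in matrix.columns}
    )


def match_factors(loadings: pd.DataFrame, correlations: pd.DataFrame) -> pd.DataFrame:
    """Match each loading factor to the factor-score column with the most similar item profile.

    Hypothetical helper. Both inputs are items x factors; similarity is the Pearson
    correlation, across items, between a loading column and a correlation column.
    """
    L = loadings.to_numpy(dtype=float)
    # Align the rows of the correlation matrix with the rows of the loadings.
    C = correlations.loc[loadings.index].to_numpy(dtype=float)
    # Standardize each column; the cross product then gives sim[i, j] = corr(L[:, i], C[:, j]).
    Lz = (L - L.mean(axis=0)) / L.std(axis=0)
    Cz = (C - C.mean(axis=0)) / C.std(axis=0)
    sim = Lz.T @ Cz / L.shape[0]
    best = sim.argmax(axis=1)
    return pd.DataFrame({
        "loading_factor": list(loadings.columns),
        "best_score_factor": list(correlations.columns[best]),
        "similarity": sim[np.arange(L.shape[1]), best],
    })
```

Usage would be along the lines of `top_items(data_loadings)`, `top_items(corr_matrix)` and `match_factors(data_loadings, corr_matrix)`. With an oblique rotation such as oblimin the two matrices are not expected to be identical, so the similarity column is only a rough guide to which columns correspond.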
Does anyone know how this is possible? Is it related to the rotation, or to how the factors are indexed?