I am using scikit-learn PCA to find the principal components of a dataset with about 20000 features and 400+ samples.
However, when I compare it with Orange3's PCA, which should be using scikit-learn's PCA internally, I get different results. In Orange3 I also unchecked the normalization option of the PCA widget.
With scikit-learn, the first principal component accounts for ~14% of the total variance, the second for ~13%, and so on.
With Orange3 I get very different values: the first principal component alone accounts for ~65% of the variance, and so on.
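For reference, when all components are kept, scikit-learn's `explained_variance_ratio_` equals the eigenvalues of the sample covariance matrix (features as variables) divided by their sum, i.e. by the total variance. The sketch below checks this on small synthetic data; the shape and random seed are arbitrary and are not the dataset from the question.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in data: 50 samples (rows) x 8 correlated features (columns)
X = rng.normal(size=(50, 8)) @ rng.normal(size=(8, 8))

pca = PCA(n_components=None).fit(X)

# Eigenvalues of the sample covariance matrix, sorted largest first
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
manual_ratio = eigvals / eigvals.sum()

print(np.allclose(pca.explained_variance_ratio_, manual_ratio))  # True
```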
My code using scikit-learn is the following:
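```python
import pandas as pd
from sklearn.decomposition import PCA

# Tab-separated file whose first column holds the row labels
matrix = pd.read_table("matrix.csv", sep='\t', index_col=0)

sk_pca = PCA(n_components=None)  # keep all components
# Transpose so that rows are samples and columns are features,
# which is the layout scikit-learn expects
result = sk_pca.fit(matrix.T.values)
print(result.explained_variance_ratio_)
```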
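The code above fits on `matrix.T`, so the columns of the file are treated as samples. PCA is not symmetric with respect to this choice: each feature (column of the fitted array) is centered separately, so fitting a table and its transpose generally gives different explained-variance ratios. When comparing against another tool it is worth checking that both see the same orientation; Orange3 tables, for example, hold one data instance per row. A minimal sketch on synthetic data (shape and column scales chosen arbitrarily):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic table: 30 rows x 200 columns, each column with its own scale
table = rng.normal(size=(30, 200)) * rng.uniform(0.1, 5.0, size=200)

# Rows as samples, columns as features
ratio_rows_as_samples = PCA().fit(table).explained_variance_ratio_
# Transposed: columns as samples, rows as features
ratio_cols_as_samples = PCA().fit(table.T).explained_variance_ratio_

print(ratio_rows_as_samples[:3])
print(ratio_cols_as_samples[:3])
```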
In Orange3, I loaded the CSV file with the File widget and connected it to the PCA widget, in which I unchecked the normalization option.
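Assuming Orange3's normalization option standardizes each feature to zero mean and unit variance (an assumption here, not confirmed by the question), the scikit-learn counterpart of leaving it checked would be a `StandardScaler` before `PCA`. The sketch below compares both settings on synthetic features with very different scales, to show how much this preprocessing choice alone can move the first component's share of variance:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: 100 samples x 20 features, scales ranging from 1 to 100
X = rng.normal(size=(100, 20)) * np.logspace(0, 2, 20)

# No normalization: PCA only centers the features
raw = PCA().fit(X)
# Standardize each feature first (assumed equivalent of the "normalize" option)
standardized = make_pipeline(StandardScaler(), PCA()).fit(X)

print(raw.explained_variance_ratio_[0])
print(standardized.named_steps["pca"].explained_variance_ratio_[0])
```

Without scaling, the highest-variance features dominate the first component; after standardization every feature contributes equally to the total variance.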
Where does the difference between the two come from?