
Please Help!

Hi, I am very new to Python and am trying to find the correlations in a large dataframe.

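import pandas as pd

# correlation matrix of the original dataframe df0
df1 = df0.corr()

# flatten the matrix into a Series indexed by (variable, variable) pairs
uns_df = df1.unstack()

# drop coefficients equal to 1 (e.g. each variable with itself) and sort
pd.DataFrame(uns_df[uns_df < 1].sort_values(ascending=True), columns=['coef'])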

The code above gives me a list of variable pairs sorted by correlation coefficient, from lowest to highest. Changing to ascending=False reverses the order.

I also created a heatmap of the correlation matrix.

However, the dataframe has more than 200 variables, so it is very hard for me to interpret the correlation matrix, the sorted list, and the heatmap.

Here is what I want to do:

First,

I want to reorder the variables in the dataframe so that the highly correlated ones come first in the heatmap, making the upper-left portion of the heatmap darker than the lower-right portion.
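
One simple heuristic for this kind of reordering, offered as a sketch rather than a definitive method, is to score each variable by the sum of its absolute correlations and sort both rows and columns by that score. The function name `reorder_by_strength` and the small example matrix are made up for illustration. Whether the upper left actually looks darker depends on the colormap used, and hierarchical clustering would be another reasonable option.

```python
import pandas as pd

def reorder_by_strength(corr):
    """Sort rows and columns by the sum of absolute correlations (hypothetical helper)."""
    # NaN entries are skipped by sum(), so they do not affect the score
    strength = corr.abs().sum(axis=1)
    order = strength.sort_values(ascending=False).index
    return corr.loc[order, order]

# Tiny made-up correlation matrix, for illustration only
labels = ["x", "y", "z"]
example_corr = pd.DataFrame(
    [[1.0, 0.1, 0.2],
     [0.1, 1.0, 0.9],
     [0.2, 0.9, 1.0]],
    index=labels, columns=labels,
)

reordered = reorder_by_strength(example_corr)
print(reordered)
```

The reordered matrix can then be passed to whatever heatmap function is already in use.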

Second,

I want to keep only the variables that appear in at least one pair with a correlation coefficient of, say, above 0.7 or below -0.7, and then draw the heatmap again. For instance, I currently have more than 200 variables, but the new heatmap might contain only 50.
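
A minimal sketch of this kind of filtering, under the assumption that "strongly correlated" means an absolute coefficient of at least 0.7 with some other variable. The helper name `strong_corr_subset` and the synthetic data are hypothetical and only there to make the example runnable.

```python
import numpy as np
import pandas as pd

def strong_corr_subset(df, threshold=0.7):
    """Return the correlation matrix restricted to strongly correlated variables."""
    corr = df.corr()
    # Hide the diagonal so a variable's correlation with itself is not counted
    off_diag = corr.mask(np.eye(len(corr), dtype=bool))
    # Keep a variable if any of its off-diagonal coefficients is strong enough;
    # comparisons involving NaN evaluate to False, so NaN never qualifies
    keep = off_diag.abs().ge(threshold).any(axis=1)
    cols = corr.index[keep]
    return corr.loc[cols, cols]

# Synthetic data for illustration: b and d depend on a, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=200),
    "c": rng.normal(size=200),
    "d": -a + 0.1 * rng.normal(size=200),
})

subset = strong_corr_subset(demo, threshold=0.7)
print(subset.columns.tolist())
```

The smaller matrix returned here is what would be plotted in the second heatmap.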

In addition, I want the code to ignore NaN values. I do not want to replace NaN with 0; I want the code to skip them when calculating the correlations.
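
For reference, pandas `DataFrame.corr` already excludes NaN values pair by pair, so no replacement with 0 is needed. The small sketch below uses made-up data (`df_demo`) to show this, along with the `min_periods` argument, which requires a minimum number of non-NaN observations per pair.

```python
import numpy as np
import pandas as pd

# Made-up data: y is exactly 2 * x, with one NaN in each column
df_demo = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, np.nan],
    "y": [2.0, 4.0, 6.0, np.nan, 10.0],
})

# Only the three rows where both x and y are present are used for the x-y pair
corr_demo = df_demo.corr()
print(corr_demo)

# Requiring at least 4 shared observations leaves the x-y entry as NaN
corr_strict = df_demo.corr(min_periods=4)
print(corr_strict)
```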

Thank you!