I have two DataFrames structured like this:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(8, 6), columns=['T', 'U', 'V', 'X', 'Y', 'Z'])

I would like to compute the Pearson correlation between every column of df1 and every column of df2, then combine all the results into a single correlation matrix.

A similar question has been asked before, but in my case df1 has several columns:

Correlation between two dataframes

Any help on how to do this would be appreciated.

1 Answer

Compute it directly:

# center and standardize
df1vals = (df1.values - df1.values.mean(axis=0)) / df1.values.std(axis=0)
df2vals = (df2.values - df2.values.mean(axis=0)) / df2.values.std(axis=0)

# compute correlation
pearsons = df1vals.T.dot(df2vals) / len(df1)

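This has shape (df1.shape[1], df2.shape[1]), i.e. (4, 6) for the frames in the question: one row per column of df1 and one column per column of df2. Note that len(df1) in the code above is the number of rows, not columns.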
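As a rough sketch, the direct computation can be wrapped in a small helper that keeps the column labels. The name cross_corr is hypothetical (it is not a pandas function), and the code assumes both frames have the same number of rows, no missing values, and no constant columns (a constant column has zero standard deviation and would produce NaN).

```python
import numpy as np
import pandas as pd


def cross_corr(df1, df2):
    """Pearson correlation of every column of df1 with every column of df2.

    Hypothetical helper; assumes equal row counts and no NaNs.
    """
    a = df1.to_numpy(dtype=float)
    b = df2.to_numpy(dtype=float)
    # center and standardize each column (population std, ddof=0)
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    # the mean product of standardized values is Pearson's r
    r = a.T @ b / len(df1)
    return pd.DataFrame(r, index=df1.columns, columns=df2.columns)


rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((8, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(rng.standard_normal((8, 6)), columns=list("TUVXYZ"))

result = cross_corr(df1, df2)
print(result.shape)  # (4, 6)
```

Dividing by len(df1) matches the ddof=0 default of NumPy's std, so the entries should agree with np.corrcoef up to floating-point error.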

If you really need to use corrwith, then:

pd.concat([
    df1.corrwith(df2[c]) for c in df2
], axis=1, keys=df2.columns)
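Another option, not part of the answer above but a commonly used alternative, is to join the two frames, call corr() once, and keep only the cross block. This sketch assumes the two frames share no column names and have aligned indexes; it also computes the within-frame correlations and discards them, so it does more work than strictly needed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((8, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(rng.standard_normal((8, 6)), columns=list("TUVXYZ"))

# corrwith version: one Series per df2 column, placed side by side
via_corrwith = pd.concat(
    [df1.corrwith(df2[c]) for c in df2], axis=1, keys=df2.columns
)

# alternative: full correlation matrix of the joined frame,
# then select rows from df1's columns and columns from df2's columns
via_corr = pd.concat([df1, df2], axis=1).corr().loc[df1.columns, df2.columns]

print(via_corr.shape)  # (4, 6)
```

Both results are indexed by the columns of df1 and have the columns of df2 as columns, so they can be compared directly.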