
I have two datasets and I want to find how strongly they are correlated.

The datasets represent the match results of two teams, where 1 represents a win, 0 a draw, and -1 a loss.

E.g. for 5 games:

team1 = [1,1,0,-1,0]
team2 = [0,1,0,1,0]

Calculating the Pearson correlation coefficient works fine until one team wins all of its last 5 games, which gives a constant array, e.g.

team1 = [1,1,1,1,1]

In this case the Pearson correlation coefficient is undefined regardless of what team2 did, because a constant array has zero variance.
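To make the undefined case concrete, here is a minimal sketch of the Pearson formula in plain Python. The function name `pearson` and the choice to return `None` for the undefined case are my own assumptions for illustration; libraries usually report this case as NaN instead.

```python
import math

def pearson(xs, ys):
    """Return Pearson's r, or None when it is undefined (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its own mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    # A constant array has zero variance, so the denominator would be zero
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

print(pearson([1, 1, 0, -1, 0], [0, 1, 0, 1, 0]))  # roughly -0.218
print(pearson([1, 1, 1, 1, 1], [0, 1, 0, 1, 0]))   # None
```

The second call hits the zero-variance branch: every deviation of the constant array is 0, so the formula would divide by zero no matter what the other team did.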

I find this weird, because if team2 also won most of the 5 games, the correlation should be close to 1, not undefined.

And vice versa: if team2 lost most of their matches, the correlation should be close to -1, based on my understanding.

Am I doing something wrong here? Or does my data need another method to measure how strong the relation between the datasets is?

Thanks in advance.


1 Answer


So, I found this good resource: http://www.ashukumar27.io/similarity_functions/

I think I will go with Euclidean distance, which is more suitable for my use case.
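For anyone landing here, a rough sketch of what that looks like on the arrays from the question. The helper name `euclidean_distance` is just something I made up for the example. Note that this is a distance, not a correlation: 0 means the two result arrays are identical, and larger values mean they differ more, so it does not range from -1 to 1.

```python
import math

def euclidean_distance(xs, ys):
    """Straight-line distance between two equal-length result arrays."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))

team1 = [1, 1, 1, 1, 1]   # constant array, where Pearson is undefined
team2 = [0, 1, 0, 1, 0]

print(euclidean_distance(team1, team2))  # sqrt(3), roughly 1.732
print(euclidean_distance(team1, team1))  # 0.0 for identical results
```

Unlike Pearson, this stays well defined when one team has a constant record, since nothing is divided by the variance.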