I am hoping someone can help me clarify correlation matrices. Specifically, I am wondering about the output and why it comes out the way it does.
My intent is to understand the correlation between two categorical (unordered nominal) variables. The data (below) are cleaned by converting the nominal variables to factors before other methods are used to get counts.
For example, I created a correlation matrix in R utilizing dummy data:
set.seed(1234)
randomCities<-c("Washington","Boston","Seattle","Portland","Oakland","Dallas","Miami")
randomYachts<-c("BigOl Yacht","Notsobig Yacht","Fancy Yacht","SuperFancy Yacht")
randomYears<-c(2019,2017,2016,2015,2018)
randomQuarters<-c(1,2,3,4)
dat1<-data.frame(city=sample(randomCities,400,replace = TRUE),
yachts=sample(randomYachts,400,replace = TRUE),
year = sample(randomYears,400,replace = TRUE),
qtr = sample(randomQuarters,400,replace = TRUE),
stringsAsFactors = FALSE)
I then subset the data, converting the variables I want to examine to factors:
#store the vars as factors
fac.Yachts<-as.factor(dat1$yachts)
fac.City<-as.factor(dat1$city)
Using the gmodels package, I created a contingency table:
#Create contingency table
library(gmodels)
joint <- CrossTable(fac.City, fac.Yachts, prop.chisq = FALSE)
joint_counts <- joint$t
joint_counts
y
x BigOl Yacht Fancy Yacht Notsobig Yacht SuperFancy Yacht
Boston 19 12 10 17
Dallas 12 18 15 11
Miami 16 16 11 12
Oakland 6 12 11 20
Portland 14 16 14 20
Seattle 12 19 9 24
Washington 13 15 16 10
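For reference, the `t` element returned by CrossTable() is just the City-by-Yacht count table. A minimal base-R sketch (no gmodels needed) that should reproduce the same counts with table(), assuming the same seed and R version; the names `city`, `yachts` and `counts` are illustrative, and year/qtr are left out because they are sampled after the two columns used here.

```r
# Rebuild the question's dummy City and Yacht columns (base R only)
set.seed(1234)
randomCities <- c("Washington", "Boston", "Seattle", "Portland", "Oakland", "Dallas", "Miami")
randomYachts <- c("BigOl Yacht", "Notsobig Yacht", "Fancy Yacht", "SuperFancy Yacht")
city <- sample(randomCities, 400, replace = TRUE)
yachts <- sample(randomYachts, 400, replace = TRUE)

# Cross-tabulate City (rows) against Yacht (columns);
# unclass() turns the "table" object into a plain integer matrix
counts <- unclass(table(city, yachts))
print(counts)
print(rowSums(counts))  # number of observations per city
```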
Lastly, I created a correlation matrix using the cor() function and the Hmisc package:
#cor() function
cor1 <- cor(joint_counts)
cor1
BigOl Yacht Fancy Yacht Notsobig Yacht SuperFancy Yacht
BigOl Yacht 1.000000000 -0.006586363 -0.09691724 -0.25682171
Fancy Yacht -0.006586363 1.000000000 0.14098436 0.01312562
Notsobig Yacht -0.096917240 0.140984364 1.00000000 -0.66337471
SuperFancy Yacht -0.256821708 0.013125623 -0.66337471 1.00000000
#Output from Hmisc
library(Hmisc)
res2 <- rcorr(as.matrix(joint_counts))
res2$r
BigOl Yacht Fancy Yacht Notsobig Yacht SuperFancy Yacht
BigOl Yacht 1.000000000 -0.006586363 -0.09691724 -0.25682171
Fancy Yacht -0.006586363 1.000000000 0.14098436 0.01312562
Notsobig Yacht -0.096917240 0.140984364 1.00000000 -0.66337471
SuperFancy Yacht -0.256821708 0.013125623 -0.66337471 1.00000000
Now, my question is: why does the correlation matrix come out this way? My intent is to see how Yacht is related to City, but the matrix seems to tell me only how the levels of Yacht are correlated with one another.
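A small sketch of what cor() does with a matrix argument, using base R only and regenerating the same dummy data (variable names are illustrative): each column (a yacht type) is treated as a variable and each row (a city) as one observation, so the result is necessarily Yacht-by-Yacht. Each entry says whether two yacht types tend to have high counts in the same cities. Transposing only swaps the roles to City-by-City; neither version is a single City-versus-Yacht measure.

```r
set.seed(1234)
randomCities <- c("Washington", "Boston", "Seattle", "Portland", "Oakland", "Dallas", "Miami")
randomYachts <- c("BigOl Yacht", "Notsobig Yacht", "Fancy Yacht", "SuperFancy Yacht")
city <- sample(randomCities, 400, replace = TRUE)
yachts <- sample(randomYachts, 400, replace = TRUE)
counts <- unclass(table(city, yachts))

# cor() on a matrix treats each COLUMN (a yacht type) as a variable
# and each ROW (a city) as one observation of those variables
yacht_cor <- cor(counts)
print(round(yacht_cor, 4))

# Entry [1, 2] is just the Pearson correlation between the first two count columns
manual <- cor(counts[, 1], counts[, 2])
print(all.equal(as.numeric(yacht_cor[1, 2]), as.numeric(manual)))

# Transposing swaps the roles: cities become the variables (7 x 7 result)
city_cor <- cor(t(counts))
print(dim(city_cor))
```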
Note: Using the joint object created by CrossTable(), I get some of this information (see the row proportions below); however, when I build a correlation matrix from it, I seem to get only the relationships between the yacht types. Am I just reading correlation matrices wrong?
joint <- CrossTable(fac.City, fac.Yachts, prop.chisq = FALSE)
joint$prop.row
y
x BigOl Yacht Fancy Yacht Notsobig Yacht SuperFancy Yacht
Boston 0.3275862 0.2068966 0.1724138 0.2931034
Dallas 0.2142857 0.3214286 0.2678571 0.1964286
Miami 0.2909091 0.2909091 0.2000000 0.2181818
Oakland 0.1224490 0.2448980 0.2244898 0.4081633
Portland 0.2187500 0.2500000 0.2187500 0.3125000
Seattle 0.1875000 0.2968750 0.1406250 0.3750000
Washington 0.2407407 0.2777778 0.2962963 0.1851852
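The row proportions above are each count divided by its city's row total. Because the underlying question is whether City and Yacht are associated, here is a hedged sketch of one commonly used alternative that is not part of the original code: a chi-squared test of independence on the same counts with base R's chisq.test(), plus Cramér's V, computed as sqrt(X^2 / (n * (min(r, c) - 1))), as a single 0-to-1 summary of association. It assumes base R only and regenerates the dummy data; CrossTable() can also print the chi-squared test via its chisq = TRUE argument.

```r
set.seed(1234)
randomCities <- c("Washington", "Boston", "Seattle", "Portland", "Oakland", "Dallas", "Miami")
randomYachts <- c("BigOl Yacht", "Notsobig Yacht", "Fancy Yacht", "SuperFancy Yacht")
city <- sample(randomCities, 400, replace = TRUE)
yachts <- sample(randomYachts, 400, replace = TRUE)
counts <- table(city, yachts)

# Row proportions: each city's counts divided by that city's total
# (these should correspond to the prop.row element from CrossTable())
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 7))

# Pearson chi-squared test of independence between City and Yacht
ct <- chisq.test(counts)
print(ct)

# Cramer's V rescales the chi-squared statistic to the range [0, 1]
n <- sum(counts)
k <- min(dim(counts)) - 1
cramers_v <- sqrt(as.numeric(ct$statistic) / (n * k))
print(cramers_v)
```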