0
votes

How do I correlate 8 subsets separately against two different dependent variables? I keep getting the same correlation coefficient for the two different subsets (example below). Here is the input:

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.legit))

with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
                                                     mean.leegauthor))

Output (I get this for both PARTY_Strength = 1 and 2):

Pearson's product-moment correlation

data: PARTYID_Strength and mean.legit t = 3.1005, df = 607, p-value = 0.002022 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval:
0.0458644 0.2023031 sample estimates:
cor
0.1248597

Pearson's product-moment correlation

data: PARTYID_Strength and mean.leegauthor t = 2.8474, df = 607, p-value = 0.004557 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval:
0.03568431 0.19250344 sample estimates:
cor
0.1148091

Sample data:

> dput(head(mydata2, 10))
``structure(list(PARTYID = c(1, 3, 1, 1, 1, 4, 3, 1, 1, 1), PARTYID_Other = 
c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), PARTYID_Strength = 
c(1, 
7, 1, 2, 1, 8, 1, 6, 1, 1), PARTYID_Strength_Other = c("NA", 
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), THERM_Dem = c(80, 
65, 85, 30, 76, 15, 55, 62, 90, 95), THERM_Rep = c(1, 45, 10, 
5, 14, 14, 0, 4, 10, 3), Gender = c("Female", "Male", "Male", 
"Female", "Female", "Male", "Male", "Female", "Female", "Male"
), `MEAN Age` = c(29.5, 49.5, 29.5, 39.5, 29.5, 21, 39.5, 39.5, 
29.5, 65), Age = c("25 - 34", "45 - 54", "25 - 34", "35 - 44", 
"25 - 34", "18 - 24", "35 - 44", "35 - 44", "25 - 34", "65+"), 
Ethnicity = c("White or Caucasian", "Asian or Asian American", 
"White or Caucasian", "White or Caucasian", "Hispanic or Latino", 
"White or Caucasian", "White or Caucasian", "White or Caucasian", 
"White or Caucasian", "White or Caucasian"), Ethnicity_Other = c("NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"), States = c("Texas", 
"Texas", "Ohio", "Texas", "Puerto Rico", "New Hampshire", 
"South Carolina", "Texas", "Texas", "Texas"), Education = c("Master's 
degree", 
"Bachelor's degree in college (4-year)", "Bachelor's degree in college (4- 
 year)", 
"Master's degree", "Master's degree", "Less than high school degree", 
"Some college but no degree", "Master's degree", "Master's degree", 
"Some college but no degree"), `MEAN Income` = c(30000, 140000, 
150000, 60000, 80000, 30000, 30000, 120000, 150000, 60000
), Income = c("Less than $30,000", "$130,001 to $150,000", 
"More than $150,000", "$50,001 to $70,000", "$70,001 to $90,000", 
"Less than $30,000", "Less than $30,000", "$110,001 to $130,000", 
"More than $150,000", "$50,001 to $70,000"), mean.partystrength = c(3.875, 
2.875, 2.375, 3.5, 2.625, 3.125, 3.375, 3.125, 3.25, 3.625
), mean.traitrep = c(2.5, 2.625, 2.25, 2.625, 2.75, 1.875, 
2.75, 2.875, 2.75, 3), mean.traitdem = c(2.25, 2.625, 2.375, 
2.75, 2.625, 2.125, 1.875, 3, 2, 2.5), mean.leegauthor = c(1, 
2, 2, 4, 1, 4, 1, 1, 1, 1), mean.legit = c(1.71428571428571, 
3.28571428571429, 2.42857142857143, 2.42857142857143, 2.14285714285714, 
1.28571428571429, 1.42857142857143, 1.14285714285714, 2.14285714285714, 
1.28571428571429)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))``

Thank you!

1
Logical statements require == not = so PARTYID_Strength == 1dcarlson
@dcarlson Thank you! although I get this result: Pearson's product-moment correlation data: PARTYID_Strength and mean.legit t = NA, df = 67, p-value = NA alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: NA NA sample estimates: cor NALisa Byers
You selected only the rows with PARTYID_Strength==1 so that variable is a constant. The correlation of that variable with any other variable is zero. If you are subsetting the data, do not use the subsetting variable in the correlation.dcarlson
@dcarlson ah I see that makes sense. So perhaps I shouldn't be measuring political parties separately but grouped together? Also, what was the original formula measuring then if I used = instead of ==?Lisa Byers
It did not do anything. R did not complain, but it just returned the original data.dcarlson

1 Answers

1
votes

To run the tests, create a vector of the columns of interest and then sapply an anonymous function to each of them.

fixed <- "PARTYID_Strength"
cols <- c("mean.leegauthor", "mean.legit")

cor_test_result <- sapply(cols, function(x){
  fmla <- paste(fixed, x, sep = "+")
  fmla <- as.formula(paste("~", fmla))
  cor.test(fmla, mydata2)
}, simplify = FALSE)

cor_test_result$mean.leegauthor
#
#        Pearson's product-moment correlation
#
#data:  PARTYID_Strength and mean.leegauthor
#t = 1.4804, df = 8, p-value = 0.177
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# -0.2343269  0.8462610
#sample estimates:
#      cor 
#0.4637152