How do I correlate 8 subsets separately against two different dependent variables? I keep getting the same correlation coefficient for the two different subsets (example below). Here is the input:
with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
mean.legit))
with(subset(mydata2, PARTYID_Strength = 1), cor.test(PARTYID_Strength,
mean.leegauthor))
with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
mean.legit))
with(subset(mydata2, PARTYID_Strength = 2), cor.test(PARTYID_Strength,
mean.leegauthor))
Output (I get this for both PARTYID_Strength = 1 and 2):
	Pearson's product-moment correlation

data:  PARTYID_Strength and mean.legit
t = 3.1005, df = 607, p-value = 0.002022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.0458644 0.2023031
sample estimates:
      cor
0.1248597

	Pearson's product-moment correlation

data:  PARTYID_Strength and mean.leegauthor
t = 2.8474, df = 607, p-value = 0.004557
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03568431 0.19250344
sample estimates:
      cor
0.1148091
Sample data:
> dput(head(mydata2, 10))
``structure(list(PARTYID = c(1, 3, 1, 1, 1, 4, 3, 1, 1, 1), PARTYID_Other =
c("NA",
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), PARTYID_Strength =
c(1,
7, 1, 2, 1, 8, 1, 6, 1, 1), PARTYID_Strength_Other = c("NA",
"NA", "NA", "NA", "NA", "Green", "NA", "NA", "NA", "NA"), THERM_Dem = c(80,
65, 85, 30, 76, 15, 55, 62, 90, 95), THERM_Rep = c(1, 45, 10,
5, 14, 14, 0, 4, 10, 3), Gender = c("Female", "Male", "Male",
"Female", "Female", "Male", "Male", "Female", "Female", "Male"
), `MEAN Age` = c(29.5, 49.5, 29.5, 39.5, 29.5, 21, 39.5, 39.5,
29.5, 65), Age = c("25 - 34", "45 - 54", "25 - 34", "35 - 44",
"25 - 34", "18 - 24", "35 - 44", "35 - 44", "25 - 34", "65+"),
Ethnicity = c("White or Caucasian", "Asian or Asian American",
"White or Caucasian", "White or Caucasian", "Hispanic or Latino",
"White or Caucasian", "White or Caucasian", "White or Caucasian",
"White or Caucasian", "White or Caucasian"), Ethnicity_Other = c("NA",
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"), States = c("Texas",
"Texas", "Ohio", "Texas", "Puerto Rico", "New Hampshire",
"South Carolina", "Texas", "Texas", "Texas"), Education = c("Master's degree",
"Bachelor's degree in college (4-year)", "Bachelor's degree in college (4-year)",
"Master's degree", "Master's degree", "Less than high school degree",
"Some college but no degree", "Master's degree", "Master's degree",
"Some college but no degree"), `MEAN Income` = c(30000, 140000,
150000, 60000, 80000, 30000, 30000, 120000, 150000, 60000
), Income = c("Less than $30,000", "$130,001 to $150,000",
"More than $150,000", "$50,001 to $70,000", "$70,001 to $90,000",
"Less than $30,000", "Less than $30,000", "$110,001 to $130,000",
"More than $150,000", "$50,001 to $70,000"), mean.partystrength = c(3.875,
2.875, 2.375, 3.5, 2.625, 3.125, 3.375, 3.125, 3.25, 3.625
), mean.traitrep = c(2.5, 2.625, 2.25, 2.625, 2.75, 1.875,
2.75, 2.875, 2.75, 3), mean.traitdem = c(2.25, 2.625, 2.375,
2.75, 2.625, 2.125, 1.875, 3, 2, 2.5), mean.leegauthor = c(1,
2, 2, 4, 1, 4, 1, 1, 1, 1), mean.legit = c(1.71428571428571,
3.28571428571429, 2.42857142857143, 2.42857142857143, 2.14285714285714,
1.28571428571429, 1.42857142857143, 1.14285714285714, 2.14285714285714,
1.28571428571429)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))``
Thank you!
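Use == rather than =, so the condition is PARTYID_Strength == 1. – dcarlson

Once you subset on PARTYID_Strength == 1, that variable is a constant within the subset. The correlation of a constant with any other variable is undefined (in R, cor() returns NA with a warning that the standard deviation is zero). If you are subsetting the data, do not use the subsetting variable in the correlation. – dcarlson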
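To illustrate both points, here is a minimal sketch on made-up data. The `toy` data frame and its values are invented for illustration, and `mean.partystrength` is only an assumed stand-in for whichever non-constant variable you actually want to correlate with the two dependent variables. It shows that `subset(..., PARTYID_Strength = 1)` silently keeps every row (the `=` form is taken as a named argument, not a condition), which is why both "subsets" gave identical results. It then runs the correlations per group with `split()` and `lapply()`.

```r
# Toy data standing in for mydata2 (values are invented for illustration only).
set.seed(1)
toy <- data.frame(
  PARTYID_Strength   = rep(1:2, each = 20),
  mean.partystrength = rnorm(40),  # assumed predictor; swap in your own
  mean.legit         = rnorm(40),
  mean.leegauthor    = rnorm(40)
)

# With `=`, subset() receives a named argument instead of a logical condition,
# so no filtering happens and all rows come back.
n_wrong <- nrow(subset(toy, PARTYID_Strength = 1))   # 40
n_right <- nrow(subset(toy, PARTYID_Strength == 1))  # 20

# One pass over every level of the grouping variable: within each group,
# correlate the (non-constant) predictor with each dependent variable.
results <- lapply(split(toy, toy$PARTYID_Strength), function(d) {
  list(
    legit      = cor.test(d$mean.partystrength, d$mean.legit),
    leegauthor = cor.test(d$mean.partystrength, d$mean.leegauthor)
  )
})

# Example: the estimate for group 1 against mean.legit
results[["1"]]$legit$estimate
```

With eight levels of `PARTYID_Strength` in the real data, `split()` would produce eight groups, so the same call covers all sixteen tests without repeating the `with(subset(...))` lines by hand.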