3
votes

I am trying to find the correlation between two separate data sets in R. The structure of my first data set (as shown by print(matr1)) is:

        year  month  income  
 [1,]  "2000" "01"  "30000"
 [2,]  "2000" "02"  "12364"
 [3,]  "2000" "03"  "37485"
 [4,]  "2000" "04"  "2000"
 [5,]  "2000" "05"  "7573"

The structure of my second data set (as shown by print(matr2)) is:

     month_year     value     
 [1,] "Jan 2000" "84737476"
 [2,] "Feb 2000" "39450334"
 [3,] "Mar 2000" "48384943"
 [4,] "Apr 2000" "12345678"
 [5,] "May 2000" "49595340"

Now I want to find the correlation between these two data sets, but the month and year are formatted differently in each. Also, when I ran cor(matr1[,"income"],matr2[,"value"]) I got this error:

Error in cor(matr1[,"income"],matr2[,"value"]) : 
  'x' must be numeric

So, my question is:

  1. How do I fix the error?
  2. How do I find the correlation when the month and year formats differ?

Any guidance would be helpful, as I am new to this.

1
This is a programming problem rather than a statistical one, and it has been answered many times before on Stack Overflow. Your variables are stored as character; they should be converted to numeric with as.numeric. - nograpes
ok. So I will post this question on Stack Overflow. - Jason Donnald
That's probably not a good idea. The question will likely be closed. Additionally, I told you how to do it in my comment: cor(as.numeric(matr1[,"income"])....) - nograpes
This question has a reproducible example, so I don't think it will be closed on SO. Please don't cross-post, though, @JasonDonnald. We should be able to migrate your Q to SO for you. - gung - Reinstate Monica
Just a note about matrices: There cannot be both character and numeric columns in the same matrix. They all have to be a single class, which I think is why you're having trouble with this. If there are any character values in the matrix, the whole matrix will be converted to character values. Try converting to data.frames if this is a big issue. - Rich Scriven

1 Answer

2
votes

Working with dates is kind of a pain, IMO. But if you already know that your rows correspond (that is, the income in row i of matr1 is for the same month and year as the value in row i of matr2), you can get the correlation quite simply with:

cor(as.numeric(matr1[,"income"]), as.numeric(matr2[,"value"]))
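
If the rows are not known to line up, one option is to build a common year-month key in both data sets and merge on it before calling cor. The following is only a minimal sketch under some assumptions: the matrices are rebuilt from the printed output in the question, the month names are the English abbreviations in R's built-in month.abb, and the names df1, df2, key, and merged are illustrative rather than anything from the original data.

```r
# Rebuild the two character matrices as printed in the question.
matr1 <- cbind(
  year   = rep("2000", 5),
  month  = c("01", "02", "03", "04", "05"),
  income = c("30000", "12364", "37485", "2000", "7573")
)
matr2 <- cbind(
  month_year = c("Jan 2000", "Feb 2000", "Mar 2000", "Apr 2000", "May 2000"),
  value      = c("84737476", "39450334", "48384943", "12345678", "49595340")
)

# Data frames allow a character key next to a numeric column.
# The key has the form "YYYY-MM" (an assumed, illustrative format).
df1 <- data.frame(
  key    = paste(matr1[, "year"], matr1[, "month"], sep = "-"),
  income = as.numeric(matr1[, "income"]),
  stringsAsFactors = FALSE
)

# Split "Jan 2000" into month name and year, then look the month name up
# in month.abb to get its number (this avoids locale-dependent date parsing).
parts <- strsplit(matr2[, "month_year"], " ", fixed = TRUE)
mon   <- match(vapply(parts, `[`, character(1), 1), month.abb)
yr    <- vapply(parts, `[`, character(1), 2)
df2 <- data.frame(
  key   = sprintf("%s-%02d", yr, mon),
  value = as.numeric(matr2[, "value"]),
  stringsAsFactors = FALSE
)

# Keep only the months present in both data sets, paired by key.
merged <- merge(df1, df2, by = "key")

r <- cor(merged$income, merged$value)
print(r)
```

Because merge pairs the rows by key, the result does not depend on the two data sets being in the same order, and months missing from either one are dropped rather than silently mismatched.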