
I have one data frame with sales values for the period Oct. 2000 to Dec. 2001 (15 months), and a second data frame with profit values for the same period. I want to find the correlation between sales and profit month by month for these 15 months in R. My data frame sales is:

Month       sales
Oct. 2000   24.1
Nov. 2000   23.3    
Dec. 2000   43.9    
Jan. 2001   53.8    
Feb. 2001   74.9    
Mar. 2001   25  
Apr. 2001   48.5    
May. 2001   18  
Jun. 2001   68.1    
Jul. 2001   78  
Aug. 2001   48.8    
Sep. 2001   48.9    
Oct. 2001   34.3    
Nov. 2001   54.1    
Dec. 2001   29.3

My second data frame profit is:

period      profit
Oct 2000    14.1
Nov 2000    3.3 
Dec 2000    13.9    
Jan 2001    23.8    
Feb 2001    44.9    
Mar 2001    15  
Apr 2001    58.5    
May 2001    18  
Jun 2001    58.1    
Jul 2001    38  
Aug 2001    28.8    
Sep 2001    18.9    
Oct 2001    24.3    
Nov 2001    24.1    
Dec 2001    19.3
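To make the example reproducible, the two tables can be entered in R as data frames. This is a minimal sketch: the object names sales and profit follow the question, and the month labels are rebuilt from R's built-in month.abb constant instead of being typed out. That assumes the labels are exactly as shown above, with a dot after the month in the sales table and none in the profit table.

```r
# Month labels Oct 2000 .. Dec 2001, built from R's English month abbreviations
mon  <- month.abb[c(10:12, 1:12)]
year <- rep(c(2000, 2001), times = c(3, 12))

# Sales table: labels carry a dot after the month, e.g. "Oct. 2000"
sales <- data.frame(
  Month = paste0(mon, ". ", year),
  sales = c(24.1, 23.3, 43.9, 53.8, 74.9, 25, 48.5, 18, 68.1, 78,
            48.8, 48.9, 34.3, 54.1, 29.3),
  stringsAsFactors = FALSE
)

# Profit table: labels without the dot, e.g. "Oct 2000"
profit <- data.frame(
  period = paste(mon, year),
  profit = c(14.1, 3.3, 13.9, 23.8, 44.9, 15, 58.5, 18, 58.1, 38,
             28.8, 18.9, 24.3, 24.1, 19.3),
  stringsAsFactors = FALSE
)
```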

I know that for the first two months I cannot get a correlation, as there are not enough values, but from Dec. 2000 onwards I want to calculate the correlation taking the previous months' values into account. So for Dec. 2000 I will use the values of Oct. 2000, Nov. 2000 and Dec. 2000, which gives me 3 sales values and 3 profit values. Similarly, for Jan. 2001 I will use the values of Oct. 2000, Nov. 2000, Dec. 2000 and Jan. 2001, giving 4 sales values and 4 profit values. So for every month I also include all previous months' values when calculating the correlation, and my output should look something like this:
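In formula terms, numbering the months t = 1 (Oct. 2000) to t = 15 (Dec. 2001) and writing x_i for sales and y_i for profit, the value wanted for month t is the ordinary Pearson correlation of the first t pairs, which is what cor(sales[1:t], profit[1:t]) computes with its default method:

```latex
r_t = \frac{\sum_{i=1}^{t} (x_i - \bar{x}_t)(y_i - \bar{y}_t)}
           {\sqrt{\sum_{i=1}^{t} (x_i - \bar{x}_t)^2}\,\sqrt{\sum_{i=1}^{t} (y_i - \bar{y}_t)^2}},
\qquad
\bar{x}_t = \frac{1}{t}\sum_{i=1}^{t} x_i,\quad
\bar{y}_t = \frac{1}{t}\sum_{i=1}^{t} y_i,\quad
t = 3, \dots, 15
```

For t = 1 the denominator is zero, and for t = 2 the result is always +1 or -1 (when both values differ), so neither carries information; that is why the first two months are left as NA.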

Month        Correlation
Oct. 2000    NA or Empty
Nov. 2000    NA or Empty
Dec. 2000       x
Jan. 2001       y
    .           .
    .           .
Dec. 2001       a

I know that R has the function cor(sales, profit), but how can I compute the correlation for my scenario?

So basically you want to loop cor(sales[1:i], profit[1:i]) for increasing i? - Spacedman
@Spacedman Will that give me the correlation for each month, taking the values of all previous months into account as well? Sorry if this is a trivial question; I am new to this and do not have much knowledge. - user2966197
@user2966197 Please explain your problem in clear statistical terms. Could you possibly want to calculate the cross-correlation (see ?ccf)? - Roland
@Roland I want to calculate the correlation between sales and profit month by month, for Oct 2000, Nov. 2000, Dec 2000 and so on up to Dec 2001. For the first two months (Oct 2000 and Nov. 2000) I cannot get a correlation, as there are only one or two values each of sales and profit. But from Dec 2000 onwards I can get the correlation, because I will also consider the previous months' values, giving three values each for Dec 2000. So for each month I will consider the values of all previous months. - user2966197
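The two suggestions in the comments answer different questions. A minimal sketch, using the question's sales and profit figures as plain numeric vectors (the names x and y are just for illustration): the loop computes the expanding-window correlation the question describes, while ccf() correlates the whole series at different time lags. At lag 0, ccf() gives the correlation over all 15 months, which is only the last value of the running series.

```r
# Sales and profit for Oct 2000 .. Dec 2001, copied from the question
x <- c(24.1, 23.3, 43.9, 53.8, 74.9, 25, 48.5, 18, 68.1, 78,
       48.8, 48.9, 34.3, 54.1, 29.3)
y <- c(14.1, 3.3, 13.9, 23.8, 44.9, 15, 58.5, 18, 58.1, 38,
       28.8, 18.9, 24.3, 24.1, 19.3)

# Expanding-window correlation: month i uses months 1..i
runcor <- rep(NA_real_, length(x))
for (i in 3:length(x)) {
  runcor[i] <- cor(x[1:i], y[1:i])
}

# Cross-correlation of the full series at lags -3..3, without plotting
cc <- ccf(x, y, lag.max = 3, plot = FALSE)
lag0 <- cc$acf[cc$lag == 0]   # lag-0 value: cor(x, y) over all 15 months
```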

2 Answers


Make some sample data:

> sales = c(1,4,3,2,3,4,5,6,7,6,7,5)
> profit = c(4,3,2,3,4,5,6,7,7,7,6,5)
> data = data.frame(sales=sales,profit=profit)
> head(data)
  sales profit
1     1      4
2     4      3
3     3      2
4     2      3
5     3      4
6     4      5

Here's the beef:

> data$runcor = c(NA,NA, 
    sapply(3:nrow(data), 
       function(i){
          cor(data$sales[1:i],data$profit[1:i])
        }))
> data
   sales profit      runcor
1      1      4          NA
2      4      3          NA
3      3      2 -0.65465367
4      2      3 -0.63245553
5      3      4 -0.41931393
6      4      5  0.08155909
7      5      6  0.47368421
8      6      7  0.69388867
9      7      7  0.78317543
10     6      7  0.81256816
11     7      6  0.80386072
12     5      5  0.80155885

So now data$runcor[3] is the correlation of the first 3 sales and profit numbers.

Note: I call this runcor because it's a "running correlation", like a "running sum", which is the sum of all elements so far. Here it is the correlation of all pairs so far.
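Because each step of the sapply() version recomputes cor() on the whole prefix, it does O(n^2) work in total. For long series, the same running correlation can be obtained in one pass from cumulative sums, using r = (n Sxy - Sx Sy) / sqrt((n Sxx - Sx^2)(n Syy - Sy^2)). This is a sketch under stated assumptions: the function name running_cor() and its min_n argument are hypothetical, not from any package, and this one-pass formula can lose floating-point precision when the values are large relative to their spread, so cor() in a loop remains the safer choice for short series like this one.

```r
# Hypothetical helper: expanding-window Pearson correlation from cumulative sums.
# Positions with fewer than min_n pairs are returned as NA.
running_cor <- function(x, y, min_n = 3) {
  stopifnot(length(x) == length(y))
  n   <- seq_along(x)          # number of pairs seen so far
  sx  <- cumsum(x)
  sy  <- cumsum(y)
  sxx <- cumsum(x^2)
  syy <- cumsum(y^2)
  sxy <- cumsum(x * y)
  num <- n * sxy - sx * sy
  den <- sqrt((n * sxx - sx^2) * (n * syy - sy^2))
  r <- num / den
  r[n < min_n] <- NA           # too few pairs for a meaningful correlation
  r
}

# Same sample data as above
sales  <- c(1, 4, 3, 2, 3, 4, 5, 6, 7, 6, 7, 5)
profit <- c(4, 3, 2, 3, 4, 5, 6, 7, 7, 7, 6, 5)
runcor <- running_cor(sales, profit)
```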


Another possibility, if dat1 and dat2 are the original sales and profit data frames:

Update

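# Strip the dots ("Oct. 2000" -> "Oct 2000") so the labels match dat2$period
dat1$Month <- gsub("\\.", "", dat1$Month)
# Join on the month label; sort=FALSE avoids the default alphabetical sort of the labels
datN <- merge(dat1, dat2, sort=FALSE, by.x="Month", by.y="period")

indx <- sequence(3:nrow(datN)) #create index to replicate the rows: 1:3, 1:4, ..., 1:nrow(datN)
indx1 <- cumsum(c(TRUE,diff(indx) <0)) #create another index to group the rows: a new group starts wherever indx drops back to 1

#calculate the correlation grouped by `indx1` (datN[indx,-1] drops the Month column); the first two months get NA
datN$runcor <- setNames(c(NA, NA,by(datN[indx,-1],
      list(indx1), FUN=function(x) cor(x$sales, x$profit) )), NULL)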

datN
#      Month sales profit    runcor
#1  Oct 2000  24.1   14.1        NA
#2  Nov 2000  23.3    3.3        NA
#3  Dec 2000  43.9   13.9 0.5155911
#4  Jan 2001  53.8   23.8 0.8148546
#5  Feb 2001  74.9   44.9 0.9345166
#6  Mar 2001  25.0   15.0 0.9119941
#7  Apr 2001  48.5   58.5 0.7056301
#8  May 2001  18.0   18.0 0.6879528
#9  Jun 2001  68.1   58.1 0.7647177
#10 Jul 2001  78.0   38.0 0.7357748
#11 Aug 2001  48.8   28.8 0.7351366
#12 Sep 2001  48.9   18.9 0.7190413
#13 Oct 2001  34.3   24.3 0.7175138
#14 Nov 2001  54.1   24.1 0.7041889
#15 Dec 2001  29.3   19.3 0.7094334
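One caveat: ?merge documents that with sort = FALSE the rows come back in an unspecified order, while the default sort = TRUE would order the labels alphabetically ("Apr 2001" before "Aug 2001"). A running correlation depends on chronological order, so a defensive sketch is to sort the merged rows by year and month before computing runcor. It assumes the labels look like "Oct 2000" after the gsub() step, rebuilds dat1 and dat2 from the question's numbers so it runs on its own, and maps month names with the built-in month.abb (always English), so it does not depend on the locale the way parsing with %b would. The vapply() loop here replaces the by() grouping as a stylistic choice; it is not part of the original answer.

```r
# Rebuild the question's data (labels as in the question)
mon  <- month.abb[c(10:12, 1:12)]
year <- rep(c(2000, 2001), times = c(3, 12))
dat1 <- data.frame(Month = paste0(mon, ". ", year),
                   sales = c(24.1, 23.3, 43.9, 53.8, 74.9, 25, 48.5, 18, 68.1, 78,
                             48.8, 48.9, 34.3, 54.1, 29.3),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(period = paste(mon, year),
                   profit = c(14.1, 3.3, 13.9, 23.8, 44.9, 15, 58.5, 18, 58.1, 38,
                              28.8, 18.9, 24.3, 24.1, 19.3),
                   stringsAsFactors = FALSE)

# Same key cleanup and merge as in the answer
dat1$Month <- gsub("\\.", "", dat1$Month)
datN <- merge(dat1, dat2, sort = FALSE, by.x = "Month", by.y = "period")

# Put rows in calendar order: month number via month.abb, year from the last 4 characters
m <- match(substr(datN$Month, 1, 3), month.abb)
y <- as.integer(substr(datN$Month, nchar(datN$Month) - 3, nchar(datN$Month)))
datN <- datN[order(y, m), ]
rownames(datN) <- NULL

# Running correlation on the ordered rows; first two months stay NA
datN$runcor <- c(NA, NA, vapply(3:nrow(datN),
                                function(i) cor(datN$sales[1:i], datN$profit[1:i]),
                                numeric(1)))
```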