My dataset contains 200 companies over 8 years, and CO2 emissions is one of the variables. I want to see whether CO2 levels are decreasing over time. I ran something like cor(CO2, years), but the correlation is very weak because it ignores the panel structure (the observations come from different companies). I also tried a panel regression with only CO2 and years, but that did not work either. Do you have any idea how to compute this in R? For example, by calculating a correlation within each company and then combining those values into a single correlation coefficient?

So what is the goal: a correlation, or finding out whether emissions are decreasing over time? It would also help to post a sample of your data; right now it is unclear where the issue is. - user2974951

1 Answer

I'm not sure I fully understand your data, but here is my best guess at an answer. I assume the data is in long format, with columns like "year", "company" and "co2", and that you want the correlation between CO2 and year for each company.

Let's generate some example data:

n_years <- 10
n_companies <- 200

# Generate some CO2 data: one random walk per company
# (rows are years, columns are companies)
co2 <- vapply(seq_len(n_companies), function(i) {
  cumsum(rnorm(n_years))
}, numeric(n_years))
colnames(co2) <- paste0("company", seq.int(ncol(co2)))

# Shaping into long format
starting_data <- reshape2::melt(co2)
colnames(starting_data) <- c("year", "company", "co2")

head(starting_data)
  year  company      co2
1    1 company1 2.076313
2    2 company1 3.481235
3    3 company1 5.089682
4    4 company1 5.237323
5    5 company1 3.199387
6    6 company1 1.600289

To calculate the correlations easily, we go back to a wide format, where the columns are companies and the rows are years.

# One row per year, one column per company; [,-1] drops the year column
wide <- reshape2::dcast(starting_data, year ~ company, value.var = "co2")[,-1]
wide[1:5, 1:5]
  company1  company2 company3   company4  company5
1 2.076313 0.5128075 1.203343 -0.6344231 -3.458794
2 3.481235 2.0916749 1.764760 -1.6445168 -3.967761
3 5.089682 1.2900221 1.498875 -2.8475682 -4.185798
4 5.237323 2.2348157 1.104034 -2.9786654 -5.780707
5 3.199387 2.7052902 2.711285 -4.2117059 -6.623060

Then we could easily calculate the correlation per company by looping over the columns:

cors <- apply(wide, 2, function(x) {
  cor(x, seq.int(nrow(wide)))
})
head(cors)
  company1   company2   company3   company4   company5   company6 
-0.1482829 -0.5765154  0.8813647 -0.9065915 -0.7263349 -0.7794206
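
If you want a single number instead of 200 separate ones, one option is to remove each company's own average level before correlating (a "within-company" correlation). Another is to fit a regression with one intercept per company and a common slope for year. Below is a minimal sketch, assuming a balanced panel and a roughly linear trend. The simulated data (with a decrease of 0.5 per year that I built in on purpose) and the names panel, co2_within, year_within and fit are made up for illustration; they are not from your dataset.

```r
set.seed(1)  # illustrative seed so the sketch is reproducible

# Hypothetical data: 200 companies, 8 years, a company-specific CO2 level
# and a common downward trend of 0.5 per year (an assumption of this sketch).
n_years <- 8
n_companies <- 200
panel <- expand.grid(year = seq_len(n_years),
                     company = paste0("company", seq_len(n_companies)))
company_level <- rnorm(n_companies, mean = 100, sd = 30)
panel$co2 <- company_level[as.integer(panel$company)] -
  0.5 * panel$year + rnorm(nrow(panel))

# Naive pooled correlation: swamped by differences in level between companies.
pooled_cor <- cor(panel$co2, panel$year)

# Within-company correlation: subtract each company's mean first.
# ave() returns the group mean for every row.
co2_within  <- panel$co2  - ave(panel$co2,  panel$company)
year_within <- panel$year - ave(panel$year, panel$company)
within_cor <- cor(co2_within, year_within)

# Fixed-effects regression: one intercept per company, one common slope.
fit <- lm(co2 ~ year + factor(company), data = panel)
slope <- coef(fit)[["year"]]      # average change in CO2 per year
slope_ci <- confint(fit, "year")  # 95% confidence interval for that slope

pooled_cor
within_cor
slope
slope_ci
```

A negative slope whose confidence interval excludes zero would suggest that emissions decrease over time on average. The pooled correlation stays close to zero even though the trend is there, which is probably what you ran into with cor(CO2, years). As far as I know, the within model in the plm package should give the same slope, if you prefer a dedicated panel package.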