My dataset contains 200 companies over 8 years, and CO2 emissions is one of the variables. I want to see whether CO2 levels are decreasing over time. I ran something like cor(CO2, years), but the correlation is very weak because the panel structure (the fact that I have different companies) is not taken into account.
I tried a panel regression with only CO2 and years, but it's not working either. Does anyone have an idea how to compute this kind of thing in R?
For example, could I calculate a correlation within each company and then combine all the values into one correlation coefficient at the end?
So what is the goal: the correlation, or whether emissions are decreasing over time? It may also be useful to post a sample of your data; right now it is unclear where the issue is.
– user2974951
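If the goal is whether emissions decrease over time (rather than the correlation itself), a within-company (fixed-effects) regression addresses that directly: company dummies absorb each company's own level, so the `year` coefficient is the average change per year within companies. The sketch below is illustrative only. It assumes long-format data with hypothetical columns `company`, `year` and `co2`, and the values are simulated, not the asker's data. With large level differences between companies, the pooled `cor()` comes out close to zero even though every company shares the same downward trend.

```r
# Hypothetical panel: 200 companies x 8 years in long format (simulated values).
set.seed(1)
n_companies <- 200
n_years <- 8
panel <- data.frame(
  company = factor(rep(paste0("company", seq_len(n_companies)), each = n_years)),
  year    = rep(2011:(2011 + n_years - 1), times = n_companies)
)

# Assumed structure: companies differ a lot in level but share a trend of -3 per year.
company_level <- rnorm(n_companies, mean = 500, sd = 200)
panel$co2 <- company_level[as.integer(panel$company)] - 3 * (panel$year - 2011) +
  rnorm(nrow(panel), sd = 5)

# Pooled correlation: the between-company level differences swamp the trend.
pooled_cor <- cor(panel$co2, panel$year)

# Within (fixed-effects) regression: the company dummies soak up each level,
# so the year coefficient estimates the average within-company yearly change.
fe_model <- lm(co2 ~ year + company, data = panel)
trend <- coef(fe_model)[["year"]]

# Estimate, standard error, t value and p-value for the year effect.
summary(fe_model)$coefficients["year", ]
```

A negative, clearly significant `year` estimate is evidence of a decrease. The `plm` package's `"within"` model should give the same slope on a balanced panel like this one.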
1 Answer
I didn't understand your data perfectly, but here is my best guess at an answer.
I think you have data in long format, with columns like "year", "company" and "co2", and you want to know the correlation per company.
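With long-format data you can also get the per-company correlations without reshaping, by splitting the rows by company and correlating inside each group. A minimal sketch, assuming the column names above and a tiny made-up data set (three hypothetical companies "A", "B", "C"):

```r
# Hypothetical long-format data with columns year, company, co2 (simulated values).
set.seed(123)
long <- data.frame(
  year    = rep(1:8, times = 3),
  company = rep(c("A", "B", "C"), each = 8),
  co2     = c(cumsum(rnorm(8)), cumsum(rnorm(8)), cumsum(rnorm(8)))
)

# Split the rows by company and correlate co2 with year inside each group;
# vapply() guarantees exactly one numeric value per company.
cors_long <- vapply(
  split(long, long$company),
  function(d) cor(d$year, d$co2),
  numeric(1)
)
cors_long
```

The result is a named vector with one correlation per company, which is the same quantity the wide-format approach computes.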
Let's generate some example data:
n_years <- 10
n_companies <- 200
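# Generate some CO2 data: one random walk per company
# (result is a matrix with one row per year and one column per company)
co2 <- vapply(seq_len(n_companies), function(i) {
  cumsum(rnorm(n_years))
}, numeric(n_years))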
colnames(co2) <- paste0("company", seq.int(ncol(co2)))
# Reshape into long format
starting_data <- reshape2::melt(co2)
colnames(starting_data) <- c("year", "company", "co2")
head(starting_data)
year company co2
1 1 company1 2.076313
2 2 company1 3.481235
3 3 company1 5.089682
4 4 company1 5.237323
5 5 company1 3.199387
6 6 company1 1.600289
To calculate the correlations easily, we can go back to wide format, with one column per company and one row per year.
# One row per year, one column per company; [, -1] drops the year column
wide <- reshape2::dcast(starting_data, year ~ company, value.var = "co2")[, -1]
wide[1:5, 1:5]
company1 company2 company3 company4 company5
1 2.076313 0.5128075 1.203343 -0.6344231 -3.458794
2 3.481235 2.0916749 1.764760 -1.6445168 -3.967761
3 5.089682 1.2900221 1.498875 -2.8475682 -4.185798
4 5.237323 2.2348157 1.104034 -2.9786654 -5.780707
5 3.199387 2.7052902 2.711285 -4.2117059 -6.623060
Then we can calculate the correlation per company by looping over the columns with apply():
# Correlate each company's CO2 series with the year index 1, 2, ..., nrow(wide)
cors <- apply(wide, 2, function(x) {
  cor(x, seq.int(nrow(wide)))
})
head(cors)
company1 company2 company3 company4 company5 company6
-0.1482829 -0.5765154 0.8813647 -0.9065915 -0.7263349 -0.7794206
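The question also asks how to combine the per-company values into one coefficient. A common approach (one option among several) is to Fisher z-transform each correlation with `atanh()`, average on the z scale, and transform back with `tanh()`. A minimal sketch, assuming per-company correlations simulated the same way as in this answer; because the data are random walks, the pooled value is illustrative only:

```r
# Per-company correlations from simulated random walks, as in the answer above.
set.seed(2024)
n_years <- 10
n_companies <- 200
co2 <- vapply(seq_len(n_companies), function(i) cumsum(rnorm(n_years)), numeric(n_years))
cors <- apply(co2, 2, function(x) cor(x, seq_len(n_years)))

# Fisher z-transform each r, average on the z scale, then transform back.
# Every company has the same number of years here, so a plain mean is used;
# with an unbalanced panel, weight each company by (n_i - 3) instead.
z <- atanh(cors)
pooled_r <- tanh(mean(z))
pooled_r
```

Averaging on the z scale avoids the bias of averaging raw correlations, which are bounded at -1 and 1 and have skewed sampling distributions.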