11
votes

I have a data frame of a monthly data for 100 yrs (1200 data points) with the months in columns and years in the rows. I want to convert it into a monthly time series and I have tried several ways, none of which create the correct "temporal" structure.

The problem lies with R considering the data frame as a 100 observations (years) of 12 variables (the months). Here is a reproducible code for my latest try:

set.seed(12)
dummy.df <- as.data.frame(matrix(round(rnorm(1200),digits=2),nrow=100,ncol=12))
rownames(dummy.df) <- seq(from=1901, to=2000)
colnames(dummy.df) <- c("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec")
dummy.df.ts <- ts(as.vector(as.matrix(dummy.df)), start=c(1901,1), end=c(2000,12), frequency=12)

In the "dummy.df.ts" object, the rows and columns are switched and instead of sequential observations in columns, all the januarys februarys etc are stacked together one after the other. How can I get to the correct temporal structure?

An example of my data: these are monthly temperature values from 1901 - 1905

fr.monthly.temp.sample  

     JAN FEB MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT NOV DEC  
1901 2.7 0.4 4.7 10.0 13.0 16.9 19.2 18.3 15.7 10.6 4.9 3.5  
1902 4.1 3.2 7.5 10.3 10.0 15.1 18.2 17.4 15.0 10.2 6.3 3.5  
1903 3.8 5.9 7.6  7.1 12.9 14.9 17.6 17.3 15.5 12.1 6.9 2.7  
1904 3.0 4.6 5.5 10.3 13.6 16.3 20.2 18.5 13.9 11.2 5.4 4.8  
1905 1.7 4.0 7.4  9.3 11.9 16.5 20.0 17.6 14.7  8.4 5.5 3.8  

Plot of original time series

And by using this ts() call:

fr.monthly.temp.sample.ts <- ts(as.vector(as.matrix(fr.monthly.temp.sample)),                              start=c(1901,1), end=c(1905,12), frequency=12)

This is the output I get for the time series object:

fr.monthly.temp.sample.ts  

      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  
1901  2.7  4.1  3.8  3.0  1.7  0.4  3.2  5.9  4.6  4.0  4.7  7.5  
1902  7.6  5.5  7.4 10.0 10.3  7.1 10.3  9.3 13.0 10.0 12.9 13.6  
1903 11.9 16.9 15.1 14.9 16.3 16.5 19.2 18.2 17.6 20.2 20.0 18.3  
1904 17.4 17.3 18.5 17.6 15.7 15.0 15.5 13.9 14.7 10.6 10.2 12.1  
1905 11.2  8.4  4.9  6.3  6.9  5.4  5.5  3.5  3.5  2.7  4.8  3.8  

Plot of time series from R code

--Note the changed temporal structure (values from the columns are now in the rows..)--

Thanks.

1
I've edited my answer. I don't actually experience this "switching" between cols and rows in the data displayed by plot. Probably you can get better answers using a sample of your data, just a couple of years, and posting the chart you get.Michele
i've added my original data and plots to show the wrong ordering of values. Your EDIT 1 solution gives me a multivariate time series which is not what I want, i want to keep it as a univariate series for further processing..as for EDIT 2, i tried as.vector() for the same reason, but that is the operation that is causing the ordering issue. perhaps solution #1 posted by @Alexander will work, i still have to try that..avg
My edit 1 wasn't a solution... it was just to make aware about how ts() works. My edit 2 started with the data ALREADY in a vector... I didn't realise that you just needed a transpose... round(seq(5,10,length.out=24),1) is a vector.Michele

1 Answers

8
votes

Solution 1

You could transpose (function t()) the matrix before vectorizing it:

set.seed(12)
dummy.df <- as.data.frame(matrix(round(rnorm(1200), digits = 2),
                                 nrow = 100, ncol = 12))
rownames(dummy.df) <- seq(1901, 2000)
colnames(dummy.df) <- month.abb
dummy.df.ts <- ts(as.vector(t(as.matrix(dummy.df))), 
                  start=c(1901,1), end=c(2000,12), frequency=12)

Solution 2

You could melt the data, order by date, then apply the ts() function.

Here's the data setup. If your language setting is English you could save some code by using month.abb, but that is not robust to other language locales.

set.seed(12)
dummy.df <- as.data.frame(matrix(round(rnorm(1200),digits=2),nrow=100,ncol=12))
months <- format(seq.Date(as.Date("2013-01-01"), as.Date("2013-12-01"), 
                          by = "month"), format = "%b")
colnames(dummy.df) <- months
dummy.df$Year <- seq(1901, 2000) # set as variable, not as rownames 

Melt the data so you have a data frame with 1200 rows, each representing an observation:

library("reshape2")
dummy.df <- melt(dummy.df, id.vars = "Year")

Order the observations by date:

dummy.df$Date <- as.Date(paste(dummy.df$Year, dummy.df$variable, "01", sep = "-"),
                         format = ("%Y-%b-%d"))
dummy.df <- dummy.df[order(dummy.df$Date), ]

Then you can apply a similar ts() call, with the ts object showing the desired order:

dummy.df.ts <- ts(dummy.df$value, start=c(1901,1), end=c(2000,12), frequency=12)