2
votes

I have read in a .csv file into R in a data.frame object. This object contains a column with year numbers, one with day numbers and the remainder are daily measurements separated into different columns per month, like this:

> new_stn
   year day JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
1  1970   1   0   4   0   0   0   2   0   0   0   6  10   0
2  1970   2   0   0   0   0   0   2   0   0   6   6   4   0
3  1970   3   0   0   0   0   0  12   0   6   0  14   3   1
4  1970   4   0   4   0   1   2   0   0   0   6   3   2   0
5  1970   5   0   2   0   0   5   0   0   3   0  14   3   0
6  1970   6   0   0  12   0   3   2   0   4   3  NA   0   0
7  1970   7   1  23  13   2   0   5  10   3   0  NA   3   0
8  1970   8   7   0   0  13   3   1   1   2   0   8   2   0
9  1970   9   1   2   0   6   0   2   2   2   0   4  16   0
10 1970  10  13   0   0  36   0   8   0   4   5   0   7   3
11 1970  11   0   0   0   1   6   0   0   0   0   0  13   3
12 1970  12   6   3   0   0   0   5   0   0   0   0   0   0
13 1970  13   0   4   0   0   6   4   0   0   0   0  11   0
14 1970  14  10   0   0  10   0  10   0   0  NA   2   0   0
15 1970  15   1   0   0  18   0   0   0   2  NA   2   9   0
16 1970  16   0   5   0   6  16   1   5   2  NA   0   1   0
17 1970  17   0   1   0   0  10   0   2   0  NA   0   0   0
18 1970  18   0   4   0   0  10   0   0  16  NA   0   7   0
19 1970  19   0   1   0   0   4   0   8   0  NA   2   0   0
20 1970  20   0   0   0   0  22   3   6   0  NA   0   0   0
21 1970  21   0   1   0   0   0  14   2   2   9   0   0   0
22 1970  22   0   0   0   0   2   0   0   0   7   5   1   0
23 1970  23   0   0   0   0   0   0   0   4  24  26   2   0
24 1970  24   2   9   0   0   2   0   0   0   1   9   8   0
25 1970  25   0   0   0   0   2  11   0   0   0  NA  15   0
26 1970  26   0   7   2   2   5   2   0   2   0  NA   0  35
27 1970  27   0   6   0   1   2  NA   1   0   0  NA   0   0
28 1970  28   0   0   0   5   1  NA   7   0  18  16   0   0
29 1970  29   0  NA   0   9   0   0   0   0  32   0   9   0
30 1970  30   4  NA   0   0  16   0   6   0   4   4   0   0
31 1970  31   5  NA   0  NA   1  NA   0   0  NA  12  NA   4

How do I convert this into one continuous time-series?

My main issue is the reshaping while taking into account date properties, such as leap years, calendar days, etc. Something like this:

> ns
           obs
1970-01-01   0
1970-01-02   0
1970-01-03   0
1970-01-04   0
1970-01-05   0
1970-01-06   0
1970-01-07   1
1970-01-08   7
1970-01-09   1
1970-01-10  13

Thanks in advance and sorry for the long example.

2

2 Answers

1
votes

First, melt your data frame form wide format to long format.

library(reshape2)
df.long<-melt(new_stn,id.vars=c("year","day"),
              variable.name="month",value.name="obs")

Then add new column dat that contains dates made from columns year, day and month. Function as.Date() will produce NA if date will be unrealistic.

df.long<-transform(df.long,dat=as.Date(paste(year,day,month,sep="/"),"%Y/%d/%B"))

Remove rows that contain NA in date column.

df.long<-df.long[!is.na(df.long$dat),]
1
votes

Just for the sake of completeness a solution in utils (part of R) may use stack:

tmpStack <- stack(new_stn[3:ncol(new_stn)])
new_stn_ldf <- data.frame(
  date = as.Date(
    paste(new_stn$year, new_stn$day, tmpStack$ind, sep = "/")
    , "%Y/%d/%B"
  )
  , obs = tmpStack$values
)
new_stn_ldf <- new_stn_ldf[!is.na(new_stn_ldf$date),]