1
votes

I have a dataframe as follows:

dat <- read.table(text=
"ID | Year | Month | Variable | Value1 | Value2 | Value3
  1 | 1950 |   1   |   PRCP   |  0     |   1    |   0
  1 | 1950 |   1   |   TMAX   |  52    |   51   |   52
  1 | 1950 |   1   |   TMIN   |  41    |   41   |   39
  1 | 1950 |   2   |   PRCP   |  1     |   0    |   1
  1 | 1950 |   2   |   TMAX   |  55    |   57   |   58",
  header=TRUE, sep="|")

There are 50 Station IDs, the years span 1950-2005, months 1-12, there are 3 weather variables (PRCP, TMAX, and TMIN), and then columns Value1-Value31 for each day of the month with the weather variable measurement.

I would like to create a dataframe that looks like this:

ID | Date       | PRCP
1  | 1950-01-01 |  0
1  | 1950-01-02 |  1
1  | 1950-01-03 |  0

So far I've been able create 3 separate datasets for each weather variable, but I have no idea how to create the new columns and expand the rows accordingly (there will need to be 31 new rows for each day of the month). I'm fairly new to R and would appreciate any help - Thanks!

1
Can you make the data sample something I can copy/paste? Also, could you include some of your code so I can reproduce what you are getting? Great job on your first question!theWanderer4865

1 Answers

0
votes

We can use melt/dcast from the data.table. We convert the 'data.frame' to 'data.table' (setDT(dat)), reshape from 'wide' to 'long' format with melt, create a sequence column ('ind') grouped by 'ID', 'Year', 'Month', and 'Variable'. Create the 'Date' column by pasting 'Year', 'Month', and 'ind', and then reshape to 'wide' format using dcast. Instead of creating three separate datasets, we can have all the information in a single dataset.

library(data.table)#v1.9.6+
dM <- melt(setDT(dat), measure=patterns('^Value'))
dM1 <- dM[, ind:= 1:.N, by = .(ID, Year, Month, Variable)]
dM1[, Date:=as.Date(sprintf('%04d-%02d-%02d', Year, Month, ind))]
dcast(dM1, ID+Date~Variable, value.var='value1')
#   ID       Date PRCP TMAX TMIN
#1:  1 1950-01-01    0   52   41
#2:  1 1950-01-02    1   51   41
#3:  1 1950-01-03    0   52   39
#4:  1 1950-02-01    1   55   NA
#5:  1 1950-02-02    0   57   NA
#6:  1 1950-02-03    1   58   NA

NOTE: In the example data, the OP provided only 3 Value columns. I am guessing that in the original dataset, it would be 31 columns.

data

dat <-  structure(list(ID = c(1, 1, 1, 1, 1), Year = c(1950, 1950, 1950, 
1950, 1950), Month = c(1, 1, 1, 2, 2), Variable = c("PRCP", "TMAX", 
"TMIN", "PRCP", "TMAX"), Value1 = c(0, 52, 41, 1, 55), Value2 = c(1, 
51, 41, 0, 57), Value3 = c(0, 52, 39, 1, 58)), .Names = c("ID", 
"Year", "Month", "Variable", "Value1", "Value2", "Value3"),
row.names = c(NA, -5L), class = "data.frame")