I am in the process of reorganizing a large weather dataset. I am trying to attach a repeated character string to every element of a character vector, so that the string appears after each element.
For example, imagine a table containing monthly temperature and precipitation (nedbor) data over time, in two separate cities (K and S). It is currently structured such that each row represents a year ranging from 2000 to 2015 and there is a column for each weather variable for each month. This makes for a very wide table (which I want).
The problem is that the dataframe was constructed from 12 .csv files, each named after the month of the data it represents, as well as two separate vectors that describe a different variable across years (NAO). The output table from
Weather <- data.frame(Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, NAO, NAOPrevYr)
yields a table with 16 rows (one for each year 2000-2015) and 170 columns structured so that these columns:
(Year, Month, S.HighTemp, S.LowTemp, S.MeanTemp, S.Nedbor, S.Nedbordage, K.Year, K.Month, K.HighTemp, K.LowTemp, K.MeanTemp, K.Nedbor, K.Nedbordage)
are repeated for each month (14*12=168) and two additional columns (NAO and NAOPrevYr) sit at the end. Every entry in a given Month column is the same month, repeated down all the rows. However, because each source file contains the same column names, the column names in the Weather dataframe are followed by ".1" for the February segment of columns, ".2" for March, etc.
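To illustrate the suffixing on a small scale, here is a minimal sketch. The two-column data frames and their values are made up for the example (they are not my real data), and Small is a hypothetical name:

```r
# Hypothetical miniature versions of three monthly tables (made-up values).
Jan <- data.frame(Year = 2000:2001, S.HighTemp = c(3.1, 2.4))
Feb <- data.frame(Year = 2000:2001, S.HighTemp = c(4.0, 3.3))
Mar <- data.frame(Year = 2000:2001, S.HighTemp = c(7.2, 6.8))

# data.frame() makes duplicated column names unique by appending .1, .2, ...
Small <- data.frame(Jan, Feb, Mar)
names(Small)
# "Year" "S.HighTemp" "Year.1" "S.HighTemp.1" "Year.2" "S.HighTemp.2"
```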
I want to rename the columns so that the generic descriptor (e.g., "S.HighTemp") is followed by a period and then the month with which it is associated. The desired output is still a table with 16 rows and 170 columns, except that rather than the August section of columns reading
(Year.7, Month.7, S.HighTemp.7, S.LowTemp.7, S.MeanTemp.7, S.Nedbor.7, S.Nedbordage.7, K.Year.7, K.Month.7, K.HighTemp.7, K.LowTemp.7, K.MeanTemp.7, K.Nedbor.7, K.Nedbordage.7)
I want it to read
(Year.Aug, Month.Aug, S.HighTemp.Aug, S.LowTemp.Aug, S.MeanTemp.Aug, S.Nedbor.Aug, S.Nedbordage.Aug, K.Year.Aug, K.Month.Aug, K.HighTemp.Aug, K.LowTemp.Aug, K.MeanTemp.Aug, K.Nedbor.Aug, K.Nedbordage.Aug)
and likewise for each of the other 14-variable monthly blocks, with January's unsuffixed columns gaining ".Jan".
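One possible route to names of this shape, sketched under assumptions, is to tag each month's columns before the tables are combined, so the numeric suffixes never appear. The tiny data frames, the list name monthly and the result name Combined are hypothetical stand-ins, not my real objects:

```r
# Hypothetical stand-ins for two of the monthly tables (made-up values).
monthly <- list(
  Jan = data.frame(Year = 2000:2001, S.HighTemp = c(3.1, 2.4)),
  Feb = data.frame(Year = 2000:2001, S.HighTemp = c(4.0, 3.3))
)

# Append ".<month>" to every column name of each table.
renamed <- Map(function(df, m) setNames(df, paste0(names(df), ".", m)),
               monthly, names(monthly))

# unname() drops the list names so cbind() does not add a "Jan."/"Feb." prefix.
Combined <- do.call(cbind, unname(renamed))
names(Combined)
# "Year.Jan" "S.HighTemp.Jan" "Year.Feb" "S.HighTemp.Feb"
```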
What I tried:
names(Weather)<-c(c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Jan",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Feb",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Mar",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Apr",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".May",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Jun",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Jul",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Aug",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Sep",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Oct",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Nov",times=14)),
c("Year","Month","S.HighTemp","S.LowTemp","S.MeanTemp",
"S.Nedbor","S.Nedbordage","K.Year","K.Month",
"K.HighTemp","K.LowTemp","K.MeanTemp","K.Nedbor",
"K.Nedbordage")+c(rep(".Dec",times=14)),
NAO, NAOPrevYr)
Unfortunately this gives me the error "non-numeric argument to binary operator". I'm assuming this is because I used "+" on vectors of character strings, and "+" is only defined for numbers.
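For comparison, here is a minimal sketch of element-wise string joining, assuming base R's paste0() and the built-in month.abb constant are suitable here. The names vars and new_names are hypothetical, and the final two names are quoted strings on the assumption that the last two columns should simply be called NAO and NAOPrevYr:

```r
# The 14 generic column names shared by every monthly block.
vars <- c("Year", "Month", "S.HighTemp", "S.LowTemp", "S.MeanTemp",
          "S.Nedbor", "S.Nedbordage", "K.Year", "K.Month",
          "K.HighTemp", "K.LowTemp", "K.MeanTemp", "K.Nedbor",
          "K.Nedbordage")

# month.abb is built in: "Jan", "Feb", ..., "Dec".
# rep(..., each = 14) repeats each month 14 times in a row, and paste0()
# recycles vars across the 168 elements, joining element by element.
new_names <- c(paste0(vars, ".", rep(month.abb, each = length(vars))),
               "NAO", "NAOPrevYr")

length(new_names)     # 170
new_names[99:101]     # "Year.Aug" "Month.Aug" "S.HighTemp.Aug"

# With the real data frame loaded, the assignment would then be:
# names(Weather) <- new_names
```

The assignment is left commented out because it only works once Weather exists with exactly 170 columns in the order described above.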
I searched for information related to merging character strings. The related material I found online is largely too linear in its design for what I'm trying to do.
For example, "R Programming: Automating Merge of Character Strings" adds character strings together into a vector of strings. But I want to merge strings across vectors, almost like taking two adjacent columns of variables and months and removing the cell divider between them (the list would then be in top-to-bottom order).
"Merging vectors of strings in a list in R" is really just a rearrangement of entries in a vector. And "How to merge vectors into a list in R?" claims to be merging vectors but really seems to just be appending them.
Basically I'm pretty new to this and still figuring the whole R thing out. If you have any ideas for what more I can look up please let me know. There has got to be a better way of doing this...