I have a list of datasets, each containing one month of data. The data span many years, so I have 12 datasets per year. The data were originally a bunch of Excel files. I converted them to .csv and imported them all following this advice, namely:
datalist <- list()
files <- list.files(pattern = "\\.csv$")
for (file in files) {
  stem <- gsub("\\.csv$", "", file)
  datalist[[stem]] <- read.csv(file)
}
So I end up with a list named datalist containing all my datasets.
Now, my problem is that only the file names record the month and year in which each part of the data was collected, so I would like to grab the month and year from each dataset name and put them in two new columns of that data frame: "Year" and "Month".
All the file names, which I kept as data frame names, follow this structure: [month]_[year]_[...some other text], for example "August_2012_foo_bar". So I figured I'd use a regular expression to grab first the month, then the year. My code stub is:
for (dataset in names(datalist)) {
  name <- dataset
  # strapply() comes from the gsubfn package
  month <- strapply(name, "^([^_]*).*$")
  ...?
}
The regular expression "^([^_]*).*$" grabs whatever comes before the underscore, namely the month. I get stuck when I need to assign the grabbed month to a new column of the dataset. I have tried both with assign and cbind, without luck.
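To make the goal concrete, here is a minimal sketch of the result I'm after. It uses base R's strsplit() rather than strapply(), and the toy data frames and their "value" column are made up for illustration; only the name pattern [month]_[year]_[...] comes from my real data.

```r
# Toy stand-in for datalist; the contents are hypothetical
datalist <- list(
  August_2012_foo_bar    = data.frame(value = c(1, 2)),
  September_2012_foo_bar = data.frame(value = c(3, 4, 5))
)

for (name in names(datalist)) {
  # Split "[month]_[year]_[...]" on underscores
  parts <- strsplit(name, "_", fixed = TRUE)[[1]]
  # Assigning through datalist[[name]] modifies the list element itself;
  # the single value is recycled down every row of the data frame
  datalist[[name]]$Month <- parts[1]
  datalist[[name]]$Year  <- as.integer(parts[2])
}
```

Indexing the list by name inside the loop is what changes the stored data frame; working on a copy pulled out of the list would leave datalist untouched.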
In the end I would like to vertically merge all these datasets into one.
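As a sketch of that final step, assuming every data frame ends up with the same column names, something like do.call(rbind, ...) is what I have in mind. The toy list below is hypothetical and already has the "Month" and "Year" columns filled in.

```r
# Hypothetical list of data frames that already carry Month and Year
datalist <- list(
  August_2012_foo_bar    = data.frame(value = c(1, 2),
                                      Month = "August", Year = 2012L),
  September_2012_foo_bar = data.frame(value = c(3, 4, 5),
                                      Month = "September", Year = 2012L)
)

# Stack all data frames vertically; rbind() requires matching column names
combined <- do.call(rbind, datalist)

# rbind() on a named list builds row names like "August_2012_foo_bar.1";
# reset them to plain 1..n
rownames(combined) <- NULL
```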
Thanks for any help!