I have a list of datasets. Each dataset contains one month of data. The data span many years, so I have 12 datasets for each year. The data were originally a bunch of Excel files. I converted them all to .csv and imported them following this advice, namely:

datalist <- list()
files <- list.files(pattern="\\.csv$")

for(file in files) {
    stem <- gsub("\\.csv$","",file)
    datalist[[stem]] <- read.csv(file)
}

So I end up with a list named datalist containing all my datasets.

Now, my problem is that only the file names record the month and year in which each part of the data was collected, so I would like to grab the month and year from each dataset name and add them to that dataframe as two new columns: "Year" and "Month".

All the file names, which I kept as dataframe names, follow this structure: [month]_[year]_[...some other text], for example "August_2012_foo_bar". So I figured I'd use a regular expression to grab first the month, then the year. My code stub is:

for(dataset in names(datalist)) {
    name <- dataset
    month <- strapply(name,"^([^_]*).*$")
    ...?
}

The regular expression "^([^_]*).*$" grabs whatever comes before the first underscore, namely the month. I get stuck when I need to assign the grabbed month to a new column of the dataset. I have tried both assign and cbind, without luck.

In the end I would like to vertically merge all these datasets into one.

Thanks for any help!

1 Answer

You can just reference a new column and assign; R will create the column for you.

Try adding:

datalist[[stem]]$Month <- month
...

That will create a new column named "Month" and assign the month variable to it. Note that R recycles the value you assign, repeating it as many times as necessary to fill every row of the data.frame.
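To see the recycling on its own, here is a minimal sketch using a toy data frame. The name df and its values are made up for illustration and are not part of your data.

```r
# Toy data frame standing in for one month's data (hypothetical values)
df <- data.frame(value = c(10, 20, 30))

# Assigning a length-one value creates the column and recycles it across all rows
df$Month <- "August"
df$Year <- 2012L

print(df)
```

Every row ends up with the same Month and Year, which is what you want when the whole file belongs to one month.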

So the whole loop would look like:

for(file in files) {
    stem <- gsub("\\.csv$","",file)
    datalist[[stem]] <- read.csv(file)

    #parse out the month and year here
    ...

    #assign to new columns
    datalist[[stem]]$Month <- month
    datalist[[stem]]$Year <- year
}
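For the parsing step, one possible approach (a sketch, not the only way) is to split the stem on underscores with base R's strsplit rather than using a regular expression. This assumes every file name really follows [month]_[year]_[...] and that all files share the same columns, so that rbind can stack them. The temporary directory, file names and values below are invented so the example runs on its own.

```r
# Write two tiny CSV files to a temporary directory so the sketch is self-contained
# (file names and contents are made up for illustration)
tmp <- tempfile("monthly_")
dir.create(tmp)
write.csv(data.frame(value = 1:2), file.path(tmp, "August_2012_foo_bar.csv"), row.names = FALSE)
write.csv(data.frame(value = 3:5), file.path(tmp, "September_2012_foo_bar.csv"), row.names = FALSE)

datalist <- list()
files <- list.files(tmp, pattern = "\\.csv$")

for (file in files) {
    stem <- gsub("\\.csv$", "", file)
    datalist[[stem]] <- read.csv(file.path(tmp, file))

    # Split "[month]_[year]_[...]" on underscores; the first two pieces are month and year
    parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
    month <- parts[1]
    year <- as.integer(parts[2])

    # Assign to new columns; the single values are recycled to every row
    datalist[[stem]]$Month <- month
    datalist[[stem]]$Year <- year
}

# Stack all data frames vertically (assumes identical column names in every file)
combined <- do.call(rbind, datalist)
rownames(combined) <- NULL
```

The final do.call(rbind, datalist) covers the vertical merge you mention at the end of the question. If the files do not all have the same columns, rbind will fail and you would need to align them first.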