1
votes

I have several data frames in R that have similar names: "datafile_20180801", "datafile_20180802",... --> they all start with 'datafile_' followed by the date of extraction. They have the same column names, how can I stack them into one data frame in R without having to type every single name into rbind? (I'm talking about 30+ data-frames)

Thanks in advance!

1
What did you try? There are a lot of answers about that here in SO. Hint: .GlobalEnv - Sotos
@Sotos I read the individual files like this: 'filenames <- list.files(path="path", pattern="filename+.*csv") names_fic <-substr(filenames,1,32) for(i in names_fic){ filepath <- file.path("path",paste(i,".csv",sep="")) assign(i, read.delim(filepath, colClasses=c("character","factor",rep("numeric",4)), sep = "\t")) }' then I tried this to combine them: 'complete_fic <- do.call(rbind, names_fic)' but this didn't work - Kimberley
Best practices warrant not to save similar structured data frames separately with assign. Use lapply to save dataframes to a list and then call do.call(rbind,mylist). And if ever you need a single df, index your list: by number mylist[[1]] or by name mylist$df1 (if you name your list with setNames). - Parfait

1 Answers

1
votes

The question did not provide reproducible data so we use the reproducible data in the Note at the end. The question states the column names are "similar and we have assumed that means that all data frames have the same column names.

Let env define the environment where the data frames are located. Below we assume that it is the global environment. (Note that the default for env in both ls and mget is the global environment so another possibility in that case is to just omit the env argument in both instances.) List the names in that environment that match the indicated regular expression giving Names. Then get the objects themselves using mget giving the list L and bind the components of L together using rbind. The row names of the result will indicate which data frame each row came from. No packages are used.

env <- .GlobalEnv
Names <- ls(pattern = "^datafile_\\d{8}$", env)
L <- mget(Names, env)
do.call("rbind", L)

giving the following using the data shown reproducibly in the Note at the end:

                    Time demand
datafile_20180801.1    1    8.3
datafile_20180801.2    2   10.3
datafile_20180801.3    3   19.0
datafile_20180801.4    4   16.0
datafile_20180801.5    5   15.6
datafile_20180801.6    7   19.8
datafile_20180802.1   10   83.0
datafile_20180802.2   20  103.0
datafile_20180802.3   30  190.0
datafile_20180802.4   40  160.0
datafile_20180802.5   50  156.0
datafile_20180802.6   70  198.0

An alternative to the last line of code above is to use bind_rows from dplyr. Instead of creating row names that identify the source of each row it will create a new id column to do that.

library(dplyr)
bind_rows(L, .id = "id")

giving the following but with warnings (which could be avoided if we convert the factor columns to character first):

                  id Time demand
1  datafile_20180801    1    8.3
2  datafile_20180801    2   10.3
3  datafile_20180801    3   19.0
4  datafile_20180801    4   16.0
5  datafile_20180801    5   15.6
6  datafile_20180801    7   19.8
7  datafile_20180802   10   83.0
8  datafile_20180802   20  103.0
9  datafile_20180802   30  190.0
10 datafile_20180802   40  160.0
11 datafile_20180802   50  156.0
12 datafile_20180802   70  198.0

We could express the above using magrittr if desired:

library(magrittr)

.GlobalEnv %>%
   mget(ls(pattern = "^datafile_\\d{8}$", .), .) %>%
   do.call("rbind", .)

Note

We have made use of the builtin data frame BOD to create reproducible input.

datafile_20180801 <- transform(BOD, Time = factor(Time))
datafile_20180802 <- transform(BOD, Time = factor(10*Time), demand = 10 * demand)