I have seen this question asked before, but the answers don't quite help with my problem.

I want to combine (rbind) multiple text files from 150 subfolders, but I am only interested in 2 specific files in each folder. Here are the details:

  • Main folder called "Folder" which contains 150 subfolders
  • In each subfolder, I want to pull out the files that start with 1 and 11 (i.e. 1_HDx1.txt and 11_HDx1.txt); there are 2 such files in each folder
  • Create a column in the data frame that has the name of the subfolder the files were pulled from.

I know how to do this when all the files are in one folder (see directly below), but can I do something similar for a folder of folders?

z <- NULL
files <- dir("Folder")
for (file in files) {
  x <- read.csv(file.path("Folder", file), as.is=TRUE)
  x$source <- substring(file, 8, 10) #name of file for the source
  z <- rbind(z, x)
}

File structure for first 3 subfolders out of 150:

Folder
- WSTNUM_001
  - 1_HDx
  - 2_LDx
  - 3_LD
  - 4_LD
  - 5_ld
  .....
  - 11_Urbanx
  - 12_Urbany
- WSTNUM_002
  - 1_HDx
  - 2_LDx
  - 3_LD
  - 4_LD
  - 5_ld
  .....
  - 11_Urbanx
  - 12_Urbany
- WSTNUM_003
  - 1_HDx
  - 2_LDx
  - 3_LD
  - 4_LD
  - 5_ld
  .....
  - 11_Urbanx
  - 12_Urbany

1 Answer

I would suggest doing this with a sequence of apply functions.

Assuming you are starting with the main folder ("Folder") as the working directory:

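Folders <- list.files() # names of all the subfolders

# Looks in each folder and returns the paths of the files whose names start with "1_" or "11_".
Paths <- lapply(Folders, function(x) {
  Files <- list.files(x)
  # The underscore keeps 10_, 12_, etc. from matching.
  Files <- Files[grepl("^(1|11)_", Files)]
  # file.path() returns nothing when no files match, unlike paste0().
  file.path(x, Files)
})
Paths <- unlist(Paths)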

# Reads each of the selected files into a list.
Tables <- lapply(Paths, function(Path){
  read.csv(Path, as.is=TRUE)
})

# Rbinds the list together
Data <- do.call(rbind,Tables)
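
The question also asks for a column holding the name of the subfolder each file came from, which the steps above do not add. Here is a minimal sketch of one way to do it, not a definitive solution. The function name read_selected_files is made up for illustration, and the throwaway files written to tempdir() (including their names and contents) are invented just so the example runs on its own. It assumes every selected file has the same columns, since rbind needs that, and that the files you want sit directly inside the subfolders.

```r
# Hypothetical helper (the name is an assumption, not part of any package).
read_selected_files <- function(root) {
  # Every file below root whose name starts with "1_" or "11_".
  # With recursive = TRUE the pattern is matched against the file name only.
  paths <- list.files(root, pattern = "^(1|11)_", recursive = TRUE,
                      full.names = TRUE)
  tables <- lapply(paths, function(path) {
    x <- read.csv(path, as.is = TRUE)
    # Name of the folder that directly contains the file, e.g. "WSTNUM_001".
    x$source <- rep(basename(dirname(path)), nrow(x))
    x
  })
  do.call(rbind, tables)
}

# Throwaway demo data (invented file names and contents).
root <- file.path(tempdir(), "Folder")
for (sub in c("WSTNUM_001", "WSTNUM_002")) {
  dir.create(file.path(root, sub), recursive = TRUE, showWarnings = FALSE)
  for (f in c("1_HDx.txt", "11_Urbanx.txt", "12_Urbany.txt")) {
    write.csv(data.frame(value = 1:2), file.path(root, sub, f),
              row.names = FALSE)
  }
}

Data <- read_selected_files(root)
table(Data$source)
```

On your real data you would call read_selected_files("Folder") instead. Note that recursive = TRUE also descends into any deeper nested folders, so if those exist and contain matching files you may want to filter the paths first.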