I read a set of CSV files with purrr::map and got a large list.

  csv_files <- list.files(path = data_path, pattern = '\\.csv$', full.names = TRUE)
  all_csv <- purrr::map(csv_files, readr::read_csv2)
  names(all_csv) <- gsub(data_path, "", csv_files)
  return(all_csv)
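A minimal sketch of a variant that names the list while reading it: purrr's map() keeps the names of its input, so naming the paths with basename() first avoids the separate names<- step. The function name read_all_csv and the demo folder are hypothetical; the demo writes two tiny semicolon-delimited files so the snippet runs on its own.

```r
library(purrr)
library(readr)

# Demo data (hypothetical): two small semicolon-delimited files in a temp folder
data_path <- file.path(tempdir(), "named_demo")
dir.create(data_path, showWarnings = FALSE)
writeLines(c("id;value", "1;2,5"), file.path(data_path, "a.csv"))
writeLines(c("id;value", "2;3,0"), file.path(data_path, "b.csv"))

# Hypothetical helper: read every CSV in data_path into a named list
read_all_csv <- function(data_path) {
  csv_files <- list.files(path = data_path, pattern = "\\.csv$", full.names = TRUE)
  # Name each path by its file name; map() carries these names to its output
  csv_files <- set_names(csv_files, basename(csv_files))
  map(csv_files, read_csv2)
}

all_csv <- read_all_csv(data_path)
names(all_csv)
```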

EDITED as suggested by @Spacedman

I then need to process each tibble/data frame separately within the process_csv_data function:

purrr::map(all_csv, process_csv_data)

How can I retrieve the name of each item of the list inside process_csv_data, without using a for loop?

Like names(all_csv)[42], for example? - Spacedman
Also, use basename(csv_files) to get the file name part of the path. gsub fails if data_path is ".", which it was when I tried this. - Spacedman
@Spacedman Is that the reason for the downvote? As I said, I'm avoiding a for loop, so I don't have an index to use with the bracket operator [. - Yann
I think you should say "within the process_csv_data function" for clarity. - Spacedman
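To illustrate the basename() comment: gsub() interprets its first argument as a regular expression, and "." matches any character, so with data_path = "." every character of the path is removed. The sample paths below are assumptions standing in for what list.files(full.names = TRUE) returns for files in the current directory.

```r
# Paths as list.files(path = ".", full.names = TRUE) would return them
csv_files <- c("./a.csv", "./b.csv")
data_path <- "."

# "." is a regex wildcard here, so the whole string is deleted
stripped <- gsub(data_path, "", csv_files)
stripped

# basename() drops the directory part without any pattern matching
file_names <- basename(csv_files)
file_names
```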

2 Answers

Use map2, as in this reproducible example:

> library(purrr)
> L = list(a=1:10, b=1:5, c=1:6)
> map2(L, names(L), function(x,y){message("x is ",x," y is ",y)})
x is 12345678910 y is a
x is 12345 y is b
x is 123456 y is c

The list element passed in as x gets a bit munged by message (its values are pasted together), but it is the corresponding element of L, and y is its name.
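Applied to the question, a minimal sketch: give process_csv_data a second argument for the name and call it through map2(). The two-argument process_csv_data and the in-memory data frames below are hypothetical stand-ins for the question's function and CSV data. purrr's imap(x, f) is documented shorthand for map2(x, names(x), f) when x is named, so it produces the same result.

```r
library(purrr)

# Hypothetical stand-in for process_csv_data(), with a second argument that
# receives the name of the list element
process_csv_data <- function(df, csv_name) {
  df$source <- csv_name
  df
}

# In-memory stand-ins for the tibbles read from the CSV files
all_csv <- list(a.csv = data.frame(id = 1:2), b.csv = data.frame(id = 3L))

# map2() walks over the elements and their names in parallel
processed <- map2(all_csv, names(all_csv), process_csv_data)

# imap() passes each element and its name, equivalent to the map2() call above
processed_i <- imap(all_csv, process_csv_data)
identical(processed, processed_i)
```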

You can take advantage of purrr to keep all the data in a single nested tibble. That way each CSV and its processed version stay directly linked to the corresponding file name:

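library(dplyr)
library(purrr)
library(readr)

csv_files <- list.files(path = data_path, pattern = '\\.csv$', full.names = TRUE)

all_csv <- tibble(csv_files) %>%
    mutate(data = map(csv_files, read_csv2),                # one tibble per file
           processed = map(data, process_csv_data),         # processed version of each
           csv_files = gsub(data_path, "", csv_files)) %>%  # drop the folder from the name
    select(-data)                                           # keep names + processed data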
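If process_csv_data itself needs the file name, a minimal sketch is to keep the name in its own column and pass it alongside each tibble with map2(). The demo files, the column names path and csv_name, and the two-argument process_csv_data are all hypothetical; basename() replaces gsub() here, following the comment on the question.

```r
library(dplyr)
library(purrr)
library(readr)

# Demo data (hypothetical): two small semicolon-delimited files in a temp folder
data_path <- file.path(tempdir(), "csv_demo")
dir.create(data_path, showWarnings = FALSE)
writeLines(c("id;value", "1;2,5"), file.path(data_path, "a.csv"))
writeLines(c("id;value", "2;3,0"), file.path(data_path, "b.csv"))

# Hypothetical processing function that also receives the file name
process_csv_data <- function(df, file_name) {
  mutate(df, source = file_name)
}

csv_files <- list.files(path = data_path, pattern = "\\.csv$", full.names = TRUE)

all_csv <- tibble(path = csv_files) %>%
  mutate(
    csv_name  = basename(path),                          # file name only
    data      = map(path, read_csv2),                    # one tibble per file
    processed = map2(data, csv_name, process_csv_data)   # tibble + its name
  ) %>%
  select(csv_name, processed)

all_csv
```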