I need to create a dataframe summarising information relating to file checking.

I have a list of 126 unique combinations of climate scenarios and years (e.g. 'ssp126_2030', 'ssp126_2050', 'ssp245_2030', 'ssp245_2050'). Each element (stored in scenario_list, below) forms part of a longer file path pointing to a specific file. For each element, I need to create several new columns recording whether the file exists, its size and the date it was created.

I would like to loop through the list of 126 elements and stitch together a table of file checks (file_check_table, below). I start with a table of sub-directories, then split these strings into sections so I can paste0() together a string that points to the file I want to check within each sub-directory. I am aiming to use mutate()/transmute() and purrr::map() to loop through each element in the climate scenario list and add multiple file-checking columns (see the image of the table below).

I am new to functional programming, and this is what I have tried so far. I was thinking of creating a function that adds the new columns and then applying that function to the list of climate scenarios. My end goal is to have one new column for each combination of climate scenario and type of file check:

file_checks <- function(x) {
                       dir_list %>%
                       mutate(file_check_table,!!paste0(new_col_name) := ifelse(file.exists(paste0(file))==TRUE,1,0))}

file_check_table <- map(scenario_list, file_checks(x))

However, this does not work; I don't think I have written the function correctly, or perhaps I have not used purrr correctly. Any thoughts on how to fix this would be much appreciated, thank you. This is what I would like file_check_table to look like:

[image: example file check table]

Your function argument is x, but it is not used inside the function. Also, in the map() call, if you don't want to use an anonymous function, then passing just file_checks will work. Without a small reproducible example, it is not possible to test this. – akrun
Ok, thank you. Yeah, I was thinking about how to create a reproducible example, but as I am dealing with files on my local computer using file.exists(), I couldn't think of a way to replicate that. I will have to think of another way. – CarlaBirdy
I think that is clear in the post, but I have italicized it for clarity. scenario_list is a list of unique values that I want to loop through, dir_list is the list of sub-directories, and file_check_table is the output I want to make. – CarlaBirdy
I am still confused about how the final expected output would look and what exactly you are checking. What would be in the first column, and what would the column names in file_check_table be? Can you show the expected output with some 4-5 rows and 4-5 columns? – Ronak Shah
I have added a picture of what I intend it to look like. The first column is the sub-directory path, and the subsequent columns are the various file checks for the different climate scenarios. – CarlaBirdy

1 Answer

If I understand your question correctly, you have a scenario_list that describes the paths to the files, and you would like the characteristics of those files. The natural way to do that is a single pipe over a table with one row per scenario; there is no need to wrap it in a function.

For example:

library(tidyverse)

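# One scenario name per line; the file name and root directory are placeholders
scenario_list <- read_lines("scenario_list.txt")
root_dir <- "C:/Users/Documents/my_project/data_subdir"

# One row per scenario: build the full path, then query the file system
file_table <- tibble(scenario = scenario_list) %>%
  mutate(path = file.path(root_dir, paste0(scenario, ".csv")),
         exists = file.exists(path),
         full_info = file.info(path),   # data-frame column; NA rows for missing files
         file_size = full_info$size,
         file_date = full_info$mtime)   # last-modified time; use ctime for creation time on Windows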
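
Since the real files only exist on your machine, here is a minimal sketch of the same idea that anyone can run. It writes two throwaway files into tempdir() and leaves a third scenario missing on purpose. The names demo_dir, demo_scenarios and demo_table, as well as the ".csv" extension, are assumptions made up for the illustration; file.size() and file.mtime() are base R shortcuts used here in place of file.info().

```r
library(dplyr)

# Hypothetical stand-ins for the real data: a throwaway directory with
# two scenario files; the third scenario is deliberately left missing.
demo_dir <- file.path(tempdir(), "file_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("a,b\n1,2", file.path(demo_dir, "ssp126_2030.csv"))
writeLines("a,b\n1,2\n3,4", file.path(demo_dir, "ssp126_2050.csv"))

demo_scenarios <- c("ssp126_2030", "ssp126_2050", "ssp245_2030")

# Same pattern as above: one row per scenario, vectorised file checks
demo_table <- tibble(scenario = demo_scenarios) %>%
  mutate(path = file.path(demo_dir, paste0(scenario, ".csv")),
         exists = file.exists(path),
         file_size = file.size(path),    # NA when the file is missing
         file_date = file.mtime(path))   # NA when the file is missing

demo_table
```

All of the file functions are vectorised over path, so no explicit loop is needed, and a missing file simply yields FALSE and NA values rather than an error.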

And then if you want the output on a single row as in your screenshot:

file_table %>%
  select(-path, -full_info) %>%
  pivot_wider(names_from = scenario,
              names_glue = "{scenario}_{.value}",
              values_from = !scenario) %>%
  write_csv("output.csv")
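
If you would still like the function-plus-purrr route from the question, a rough sketch might look like the following. The key points are the ones raised in the comments: the function has to actually use its argument, and map() needs to be given the function itself rather than the result of calling it. The helper check_file(), the demo_dir directory and the ".csv" extension are hypothetical and only there to make the example runnable.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: takes ONE scenario name and returns a one-row tibble.
# The ".csv" extension is an assumption; adjust to your own file layout.
check_file <- function(scenario, root_dir) {
  path <- file.path(root_dir, paste0(scenario, ".csv"))
  tibble(scenario  = scenario,
         exists    = file.exists(path),
         file_size = file.size(path),
         file_date = file.mtime(path))
}

# Throwaway demo data so the sketch runs anywhere
demo_dir <- file.path(tempdir(), "map_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("x", file.path(demo_dir, "ssp126_2030.csv"))

# map() calls the function once per scenario; bind_rows() stacks the results
checks <- map(c("ssp126_2030", "ssp245_2050"),
              function(s) check_file(s, root_dir = demo_dir)) %>%
  bind_rows()

checks
```

This gives the same long table as the mutate() version, so the pivot_wider() step can be applied to it unchanged.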