0
votes

So, I have a lot of files in xlsx, and I need a few sheets from each file. The structure is something like this:

3 sheets with final grades, like a report card. the first one is global score, the second is from the first thing we evaluate, call it "a"; and the third one is about the "b" score. The global score is an index between "a" and "b".

The files don't have the same number of rows. For example, we evaluate "1", "2", "3", "4", "5" and "6". But, many files don't have "2" and "5"; others don't have "5", and very few ones have all the things we evaluate. In the "1" sheet someone fills the cells with an evaluation of multiple criteria: "it shows all the information?" "The information is a csv file?" and put a score.

In each file we have a hidden sheet with the calculus of "a", and another one with the calculus of "b" for each of the numbers: i.e. a1, b1, a2, b2, a3, b3 and so on. I know, sound messy.

And additional to this we have other two hide sheets with the summary version of a1, b1, and etcetera.

I need the summary versions.

So, I extract the names of the rows we evaluate (i.e. 1, 2, 4, 6) and the names of sheets in R studio (global index, a index, b index, a1, b1, summary a1, summary a2, etc.) and create a list for each thing with map.

so...

I got a dir_ls(folder) where I have all the xlsx files a list of data frames I create with purrr::map() with the names of the sheets I need for each xlsx file. Here, is a list, each element of the list is a data frame, in each data frame every row is the name of the sheet i need to extract.

I tried this, where da is the list of dataframes with the names of the sheets:

read_excel(dir_ls(inp)[1], sheet = map(da, ~ as.character(.x)))

map(dir_ls(inp), ~
         read_excel(.x, sheet = map(da, ~
                                      as.character(.y))))

This one works for 1 file with 1 sheet name:

try <- da[["2_Eva_23_AGATAN.xlsx"]]

read_excel(paste(inp, list.files(inp), sep = "/")[1], sheet = as.character(try[1,1]))

But, there are 146 files and 1,348 sheet names I need.

EDIT: I try this:

read_excel(dir_ls(inp)[[1]], sheet = map_chr(try, ~ .))

and get the following message

Error: Result 1 must be a single string, not a character vector of length 10

Hope that can help.

I keep thinking about this is a problem of different vector lengths with purrr.

EDIT 2:

I have figured it!

So, I keep thinking about two vectors of different lengths, and I decided to make a data frame with the path of the Excel file and the sheet name.

With a unique data frame, I use a map2.

map2(df$path, df$sheet, ~ read_excel(path = .x, sheet = .y))

Now I have to filter and clean the data frames (more than 1,000).

Writing this helps me a lot.

1

1 Answers

0
votes

I have figured it!

So, I keep thinking about two vectors of different lengths, and I decided to make a data frame with the path of the Excel file and the sheet name.

With a unique data frame, I use a map2.

map2(df$path, df$sheet, ~ read_excel(path = .x, sheet = .y))

Now I have to filter and clean the data frames (more than 1,000).