0
votes

We're trying to make an automatic program, that can take multiple excel files with multiple sheets from a folder, and append them to one data frame.

Our problem is that we're not quite sure how to do this, so the process becomes most automatic. And since the sheets varies in names, we can't specify any variable for them.

Alle of the files are *.xlsx, and the code has to load a arbitrary number of files.

We have tried with different types of codes, primarily using pandas, but we can't seem to append them in one data frame.

import numpy as np
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True)

# now save the data frame
writer = pd.ExcelWriter('output.xlsx')
all_data.to_excel(writer)
writer.save() 

sheet1 = xls.parse(0)

We expect to have one data frame with all data, such that we can use data and extract different features and make statistics.

1

1 Answers

0
votes

The documentation of pandas.read_excel states:

*sheet_name : str, int, list, or None, default 0 Strings are used for sheet names. Integers are used in zero-indexed sheet positions. Lists of strings/integers are used to request multiple sheets. Specify None to get all sheets.

Available cases:

  • Defaults to 0: 1st sheet as a DataFrame
  • 1: 2nd sheet as a DataFrame
  • "Sheet1": Load sheet with name “Sheet1”
  • [0, 1, "Sheet5"]: Load first, second and sheet named “Sheet5” as a dict of DataFrame None: All sheets.*

I would suggest to try the last option, being pd.read_excel(f,sheet_name = None). Otherwise you might want to create a loop and pass indexes vs. the actual sheet names, this way you don't have to have prior knowledge of the .xlsx files.