3 votes

I have an Excel workbook with 49 sheets (all with the same columns, different data). I want to load this data into a database by creating one CSV file from the 49 sheets. My plan is to read and concatenate each sheet into a "master" dataframe, then write that dataframe to a CSV file, which I can load into a database (probably MySQL).

I have successfully read the first 3 sheets, one at a time, as follows:

import pandas as pd

xls = pd.ExcelFile('Pages 1-49.xlsx')
xls.sheet_names  # lists all sheet names in the workbook
pdf_1 = pd.read_excel(xls, sheet_name='page 1', skiprows=1)

I execute this code two more times (not shown here) and have three dataframes named pdf_1, pdf_2, and pdf_3.

I then concatenate these three dataframes as follows:

voters = pd.concat([pdf_1, pdf_2, pdf_3], ignore_index=True)

Everything is "working" - I have a single dataframe - but I have a total of 150 worksheets. How do I go from one-at-a-time to a loop?

I want one dataframe containing 150 worksheets of data.

Probably a simple task but I've spent several long days to get this far. I'd really appreciate some help.


2 Answers

4 votes

You need to use a for loop. May I suggest something like the following:

df_list=[]
xls = pd.ExcelFile('Pages 1-49.xlsx')

for x in range(150):
    df = pd.read_excel(xls, sheet_name='page {}'.format(x + 1), skiprows=1)
    df_list.append(df)

voters = pd.concat(df_list, ignore_index=True)

Alternatively, since xls.sheet_names is a list of all the sheet names, the for loop can iterate over it directly:

df_list = []
for sheet_name in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet_name, skiprows=1)
    df_list.append(df)

voters = pd.concat(df_list, ignore_index=True)
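To sketch the full pipeline through to the CSV the question asks about, here is a minimal, self-contained example. The per-sheet dataframes, column names, and values below are made up to stand in for the `read_excel` results; in practice `df_list` comes from the loop above.

```python
import pandas as pd

# Stand-ins for the dataframes the loop would read from each sheet
# (columns and values are illustrative, not from the real workbook).
df_list = [
    pd.DataFrame({'name': ['a', 'b'], 'age': [1, 2]}),
    pd.DataFrame({'name': ['c'], 'age': [3]}),
]

# One flat dataframe; ignore_index=True renumbers rows 0..n-1
voters = pd.concat(df_list, ignore_index=True)

# index=False keeps the pandas row index out of the CSV, which is
# usually what you want when loading into MySQL.
voters.to_csv('voters.csv', index=False)
```

From there the CSV can be loaded with MySQL's LOAD DATA INFILE or a client import tool.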
3 votes

You can try:

# this gives you a dictionary of dataframes,
# keyed as {sheet_name: sheet_df}
dfs = pd.read_excel('play.xlsx', sheet_name=None)
pd.concat(dfs, sort=False)

With my toy xlsx:

            A    B    C
Sheet1 0  1.0  2.0  3.0
       1  4.0  5.0  6.0
Sheet2 0  1.0  2.0  3.0
       1  4.0  5.0  6.0
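Note that concatenating the dictionary keeps the sheet name as the outer level of a MultiIndex, as the output above shows. If you want a flat dataframe like the loop-based approach produces, a small sketch (with made-up sheet names and data simulating the `sheet_name=None` result):

```python
import pandas as pd

# Simulated result of pd.read_excel(..., sheet_name=None), which
# returns {sheet_name: DataFrame}; these frames are illustrative.
dfs = {
    'Sheet1': pd.DataFrame({'A': [1.0, 4.0], 'B': [2.0, 5.0], 'C': [3.0, 6.0]}),
    'Sheet2': pd.DataFrame({'A': [1.0, 4.0], 'B': [2.0, 5.0], 'C': [3.0, 6.0]}),
}

# Concatenating the dict's values with ignore_index=True drops the
# (sheet_name, row) MultiIndex and renumbers rows 0..n-1.
flat = pd.concat(dfs.values(), ignore_index=True, sort=False)
print(flat.shape)  # (4, 3)
```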