1
votes

Setup: I am using jupyter notebook, Python version 3.6.2, and Excel version 15.36

I have several excel files (each file with multiple sheets). I have loaded each excel file (code below) so that a dataframe that is an ordered dict of the sheets in the file is created for each excel file.

import pandas as pd

df1 = pd.read_excel('2014.xlsx', sheetname=None, header=1)
df2 = pd.read_excel('2015.xlsx', sheetname=None, header=1)

..etc..

Now I want to loop through each sheet of each file, add a new cell to each row of the sheet that contains the sheet name, and then merge all the sheets of all the files together into one large sheet.

I am new to python and would love to hear a clean way to execute this task.

2
do you know the sheetnames ahead of time? using sheetname=None will read the first sheet and return a dataframe. - Paul H

2 Answers

0
votes

So, for looping through all files you can use a simple for loop based on a list:

years = ['2014', '2015', '2016']
for year in years :
   df1 = pd.read_excel(year +'.xlsx')
0
votes

Don't read everything in individually. You're just going to have to repeat a bunch of commands. Setup a metadata structure for the information about your worksheets:

metadata = [
    {'book': 'book1.xlsx', 'sheets': ['sheet1', 'sheet2']},
    {'book': 'book2.xlsx', 'sheets': ['sheet1']},
    {'book': 'book3.xlsx', 'sheets': ['sheet1', 'sheet2', 'sheet3']},
    {'book': 'book4.xlsx', 'sheets': ['sheet1', 'sheet2']},
]

And the feed a nested list comprehension to pandas.concat, using the assign method to add your column to each book.

new_df = pandas.concat([
    pandas.read_excel(md['book'], sheetname=s).assign(book=md['book']
    for md in metadata for s in md['sheets']
], ignore_index=True)

That's a little terse and admittedly hard to read for less experience python/pandas users. If you were to write in out in a normal for loop, it'd look like this:

_dfs = []
for md in metadata:
    for s in md['sheets']:
        _dfs.append(
            pandas.read_excel(md['book'], sheetname=s)
                  .assign(book=md['book']
        )

df = pandas.concat(_dfs, ignore_index=True)