0
votes

So I have 1500 Excel Workbooks, each with 10+ sheets with exactly same structure. I tried to combine Multiple Excel Workbooks into one file and succeeded using code below:

import os
import pandas as pd
cwd = os.path.abspath('') 
files = os.listdir(cwd)  
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file), ignore_index=True) 
df.head() 
df.to_excel('Combined_Excels.xlsx')

1pic And as a result I get combined Excel Workbook, but only with 1st sheets(pages) of each Excel Workbook. I figured that I need to add a parameter sheet_name=None, but this way my dataframe becomes a dict type and when appended, the result is not quite what I wanted.

import os
import pandas as pd
cwd = os.path.abspath('') 
files = os.listdir(cwd)  
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file, sheet_name=None), ignore_index=True) 
df.head() 
df.to_excel('Combined_Excels.xlsx')

2pic Have you guys had similar problems? How can I at least combine all sheets into one? Combining Multiple Excel Sheets would not be a problem imo as the first sample worked perfectly.

Thanks, Nurbek

1
I am about to do exactly the same project. Check out this answerRichieV
Shouldn't you define the Sheet every time you load a dataset, so that you will finally get all Sheets from every Workbook? Irrespective of the dict issue, with Sheet=None you will not achieve thisIoaTzimas

1 Answers

0
votes

You might find a better and more efficient ways to solve this problem, but this is how I did it:

import os
import pandas as pd


# First, combine all the pages in each Workbook into one sheet
cwd = os.path.abspath('')
files = os.listdir(cwd)
df_toAppend = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df_toAppend = pd.concat(pd.read_excel(file, sheet_name=None), ignore_index=True)
        df_toAppend.to_excel(file)


# And then append all the Workbooks into single Excel Workbook sheet

cwd_2 = os.path.abspath('') 
files_2 = os.listdir(cwd_2)  
df_toCombine = pd.DataFrame()
for file_2 in files_2:
    if file_2.endswith('.xlsx'):
        df_toCombine = df_toCombine.append(pd.read_excel(file_2), ignore_index=True) 
        df_toCombine.to_excel('Combined_Excels.xlsx')

With Large sets of data, it may take considerable amount of time to combine. Hope this helps someone eventually.