0
votes

I am trying to make a list using pandas before putting all data sets into 2D convolution layers.

And I was able to merge all data in the multiple excel files as a list.

However, the code only reads one chosen sheet name in the multiple excel files.

For example, I have 7 sheets in each excel file; named as 'gpascore1', 'gpascore2', 'gpascore3', 'gpascore4', 'gpascore5', 'gpascore6', 'gpascore7'.

And each sheet has 4 rows and 425 columns like

enter image description here

As shown below, you can see the code.

import os
import pandas as pd

path = os.getcwd()
files = os.listdir(path)

files_xls = [f for f in files if f[-3:] == 'xls']

df = pd.DataFrame()

for f in files_xls:
    data = pd.read_excel(f, 'gpascore1') # Read only one chosen sheet available -> 
                                           gpascore1 is a sheet name.
    df = df.append(data)                 # But there are 6 more sheets and I would like 
                                           to read data from all of the sheets

data_y = df['admit'].values
data_x = []

for i, rows in df.iterrows():
    data_x.append([rows['gre'], rows['gpa'], rows['rank']])

df=df.dropna()
df.count()

Then, I got the result as below.

enter image description here

This is because the data from the 'gpascore1' sheet in 3 excel files were merged.

But, I want to read the data of 6 more sheets in the excel files.

Could anyone help me to find out the answer, please?

Thank you

===============<Updated code & errors>==================================

Thank you for the answers and I revised the read_excel() as

 data = pd.read_excel(f, 'gpascore1') to
 data = pd.read_excel(f, sheet_name=None)

But, I have key errors like below.

enter image description here

Could you give me any suggestions for this issue, please?

Thank you

1
if you pass sheet_name = None pd.read_excel will return a a dictionary of dataframes where the key is the sheet name - Umar.H
You can loop over sheet names by creating a list - DGS
Thank you for the answer Manakin and tried to change the code, but it shows some key error issues, so I posted the updated errors. Could you check what is going on, please? - David S

1 Answers

1
votes

I actually found this question under the tag of 'tensorflow'. That's hilarious. Ok, so you want to merge all Excel sheets into one dataframe?

import os
import pandas as pd

import glob
glob.glob("C:\\your_path\\*.xlsx")

all_data = pd.DataFrame()
for f in glob.glob("C:\\your_path\\*.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
    
type(all_data)