reading excel to a python data frame starting from row 5 and including headers

Question

how do I import excel data into a dataframe in python.

Basically the current excel workbook runs some vba on opening which refreshes a pivot table and does some other stuff.

Then I wish to import the results of the pivot table refresh into a dataframe in python for further analysis.

import xlrd

wb = xlrd.open_workbook('C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')

#sheetnames
print wb.sheet_names()

#number of sheets
print wb.nsheets

The refreshing and opening of the file works fine. But how do i select the data from the first sheet from say row 5 including header down to last record n.

Thanks Andy. I couldn't get the pd.ExcelFile to work with a .xlsm file and couldn't seem to install openpyxl which is supposed to work with .xlsx and .xlsm files. I just changed my vba to save as .xls. everything worked then except the skip_rows=4 part but i just removed the unwanted rows later in python with df2 = df[2:] — IcemanBerlin
Oh. Actually just adjusted skip_rows=4 with skiprows = 4 and that part worked too. great thanks. You get a silver star...gold if the .xlsm file had worked ;) — IcemanBerlin
Whoops my bad! Will add an issue about xlsm files on github. — Andy Hayden

Andy Hayden Andy Hayden · Accepted Answer · 2013-07-09T13:02:34

You can use pandas' ExcelFile parse method to read Excel sheets, see io docs:

xls = pd.ExcelFile('C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')

df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])

skiprows will ignore the first 4 rows (i.e. start at row index 4), and several other options.

reading excel to a python data frame starting from row 5 and including headers

2 Answers