0
votes

I'm trying to determine how much data is missing from a large excel sheet. The following code takes a prohibitive amount of time to complete. I've seen similar questions, but I'm not sure how to translate the answer to this case. Any help would be appreciated!

import openpyxl

wb = openpyxl.load_workbook('C://Users/Alec/Documents/Vertnet master list.xlsx', read_only = True)
sheet = wb.active

lat = 0
loc = 0
ele = 0

a = openpyxl.utils.cell.column_index_from_string('CF')
b = openpyxl.utils.cell.column_index_from_string('BU')
c = openpyxl.utils.cell.column_index_from_string('BX')

print('Workbook loaded')

for x in range(2, sheet.max_row):
    if sheet.cell(row = x, column = a).value:
        lat += 1
    if sheet.cell(row = x, column = b).value:
        loc += 1
    if sheet.cell(row = x, column = c).value:
        ele += 1
    print((x/sheet.max_row) * 100, '%')
print('Latitude: ', lat/sheet.max_row)
print('Location', loc/sheet.max_row)
print('Elevation', ele/sheet.max_row)
2
Are you trying to find the cells left on the whole sheet (sooo many) or just the cells left in a table you have on the sheet (significantly less)? - Aidan

2 Answers

0
votes

If you are simply trying to do the calc on a table on the sheet and not the entire sheet, you could make one adjustment to make it faster.

row = 1
Do Until IsEmpty(range("A1").offset(row,1).value)
     if range("B"&row).value: lat += 1 
     if range("C"&row).value: loc += 1 
     if range("D"&row).value: ele += 1 
     row = row + 1
     Loop

This would take you to the end of your defined table rather than the end of the whole sheet which is 90% of the reason it's taking you so long.

Hope this helps

0
votes

Your problem is that, despite advice in the documentation to the contrary, you're using your own counters to access cells. In read-only mode each use of ws.cell() will force the worksheet to reparse the XML source for the worksheet. Simply use ws.iter_rows(min_col=a, max_col=c) to get the cells in the columns you're interested in.