7
votes

How could I retrieve

  1. the column names (values of the cells in the first row) in an openpyxl Read-only worksheet?
    • City, Population, Country in the below example worksheet
  2. all column names in an openpyxl Read-only workbook?
    • City, Population, Country, frames from worksheet 1 and the other column names from all other worksheets

Example Excel worksheet:

| City       | Population  |    Country   |
| -----------|------------ | ------------ |
| Madison    |   252,551   |     USA      |
| Bengaluru  | 10,178,000  |    India     |
| ...        |       ...   |     ...      |

Example code:

from openpyxl import load_workbook

wb = load_workbook(filename=large_file.xlsx, read_only=True)
sheet = wb.worksheets[0]

... (not sure where to go from here)

Notes:

  • I have to use readonly because the Excel file has over 1 million rows (don't ask)
  • I'd like the column names so I can eventually infer the column types and import the excel data into a PostgreSQL database
4
Your use of the terms "headers" is ambiguous. - Charlie Clark
Good point, edited the question. - Ty Hitzeman
You're still talking about print_titles which are something different. As are headers and footers. - Charlie Clark
Edited the question again. - Ty Hitzeman
So, what's the question now? [c.value for c in ws.iter_rows(min_row=1, max_row=1)] not sufficient? - Charlie Clark

4 Answers

10
votes

This will print every thing from row 1;

list_with_values=[]
for cell in ws[1]:
    list_with_values.append(cell.value)

If for some reason you want to get a list of the column letters that are filled in you can just:

column_list = [cell.column for cell in ws[1]]

For your 2nd question; Assuming you have stored the header values in a list called : "list_with_values"

from openpyxl import Workbook
wb = Workbook()
ws = wb['Sheet']
#Sheet is the default sheet name, you can rename it or create additional ones with wb.create_sheet()
ws.append(list_with_values)
wb.save('OutPut.xlsx')
4
votes

Read-only mode provides fast access to any row or set of rows in a worksheet. Use the method iter_rows() to restric the selection. So to get the first row of the worksheet:

rows = ws.iter_rows(min_row=1, max_row=1) # returns a generator of rows
first_row = next(rows) # get the first row
headings = [c.value for c in first_row] # extract the values from the cells
1
votes

Charlie Clarks answer compacted down to a one liner with list comprehension

    headers = [c.value for c in next(wb['sheet_name'].iter_rows(min_row=1, max_row=1))]
0
votes

This is how I handled this

from openpyxl.utils import get_column_letter

def get_columns_from_worksheet(ws):
  return {
      cell.value: {
          'letter': get_column_letter(cell.column),
          'number': cell.column - 1
      } for cell in ws[1] if cell.value
  }

An Example of this being used would be

from openpyxl import load_workbook

wb = load_workbook(filename='my_file.xlsx')
ws = wb['MySheet']

COLUMNS = get_columns_from_worksheet(ws)

for cell in ws[COLUMNS['MY Named Column']['letter']]:
    print(cell.value)

The main reason for capturing both the letter and number code is because different functions and patterns within openpyxl use either the number or the letter so having reference to both is invaluable