11
votes

I have a large number of Excel files (about 200), and I would like to copy one specific worksheet from one workbook to another. I have done some investigation, but I couldn't find a way of doing it with openpyxl.

This is the code I have developed so far:

import os

import openpyxl


def copy_sheet_to_different_EXCEL(path_EXCEL_read,Sheet_name_to_copy,path_EXCEL_Save,Sheet_new_name):
    ''' Function used to copy one EXCEL sheet into another file.

    Input data:
        1.) path_EXCEL_read: The location of the EXCEL file, including its name, that contains the sheet to copy
        2.) Sheet_name_to_copy: The name of the EXCEL sheet to copy
        3.) path_EXCEL_Save: The path of the EXCEL file where the sheet is going to be copied
        4.) Sheet_new_name: The name of the new EXCEL sheet

    Output data:
        1.) status: If 0, everything went OK. If 1, an error occurred.

    Version History:
        1.0 (2017-02-20): Initial version.

    '''

    status=0

    if path_EXCEL_read.endswith('.xls'):
        print('ERROR - EXCEL xls file format is not supported by openpyxl. Please, convert the file to an XLSX format')
        status=1
        return status

    try:
        wb = openpyxl.load_workbook(path_EXCEL_read,read_only=True)
    except Exception:
        print('ERROR - EXCEL file does not exist in the following location:\n  {0}'.format(path_EXCEL_read))
        status=1
        return status

    Sheet_names=wb.sheetnames    # We compare against the sheet name we would like to copy

    if Sheet_name_to_copy not in Sheet_names:
        print('ERROR - EXCEL sheet "{0}" does not exist'.format(Sheet_name_to_copy))
        status=1
        return status

    # We check if the destination file exists
    if os.path.exists(path_EXCEL_Save):
        # If true, the file exists so we open it

        if path_EXCEL_Save.endswith('.xls'):
            print('ERROR - Destination EXCEL xls file format is not supported by openpyxl. Please, convert the file to an XLSX format')
            status=1
            return status

        try:
            wdestiny = openpyxl.load_workbook(path_EXCEL_Save)
        except Exception:
            print('ERROR - Destination EXCEL file could not be opened at the following location:\n  {0}'.format(path_EXCEL_Save))
            status=1
            return status

        # We check if the destination sheet exists. If so, we will delete it
        destination_list_sheets = wdestiny.sheetnames

        if Sheet_new_name in destination_list_sheets:
            print('WARNING - Sheet "{0}" exists in: {1}. It will be deleted!'.format(Sheet_new_name,path_EXCEL_Save))
            wdestiny.remove(wdestiny[Sheet_new_name])

    else:
        wdestiny=openpyxl.Workbook()

    # We copy the Excel sheet
    try:
        sheet_to_copy = wb[Sheet_name_to_copy]
        target = wdestiny.copy_worksheet(sheet_to_copy)
        target.title=Sheet_new_name
    except Exception:
        print('ERROR - Could not copy the EXCEL sheet. Check the file')
        status=1
        return status

    try:
        wdestiny.save(path_EXCEL_Save)
    except Exception:
        print('ERROR - Could not save the EXCEL sheet. Check the file permissions')
        status=1
        return status

    # Program finishes
    return status

Any suggestions?

Cheers

7 Answers

5
votes

You cannot use copy_worksheet() to copy between workbooks because it depends on global constants that may vary between workbooks. The only safe and reliable way to proceed is to go row-by-row and cell-by-cell.

You might want to read the discussions about this feature
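
To make the row-by-row, cell-by-cell approach concrete, here is a minimal sketch that copies values only (no styles, merged cells or dimensions). The helper name copy_values and the demo data are made up for illustration, and both workbooks are created in memory so the snippet runs without any files on disk.

```python
from openpyxl import Workbook


def copy_values(source_sheet, target_sheet):
    """Copy cell values (not styles) from source_sheet into target_sheet."""
    # iter_rows() starts at row 1 / column 1 by default, so the
    # enumerate counters match the original cell coordinates.
    for r, row in enumerate(source_sheet.iter_rows(), start=1):
        for c, cell in enumerate(row, start=1):
            target_sheet.cell(row=r, column=c, value=cell.value)


# Demo: the source and destination live in two different workbooks.
src_wb = Workbook()
src_ws = src_wb.active
src_ws.append(["name", "qty"])
src_ws.append(["apple", 3])

dst_wb = Workbook()
dst_ws = dst_wb.create_sheet("Copied")
copy_values(src_ws, dst_ws)

print(dst_ws["A2"].value, dst_ws["B2"].value)
```

With real files you would get the sheets from openpyxl.load_workbook() instead and finish with dst_wb.save(...).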

4
votes

I had the same problem. For me, style, format, and layout were very important. Moreover, I did not want to copy formulas but only their values. After a lot of trial and error (and Stack Overflow), I came up with the following functions. It may look a bit intimidating, but the code copies a sheet from one Excel file to another (possibly existing) file while preserving:

  1. font and color of text
  2. fill color of cells
  3. merged cells
  4. comments and hyperlinks
  5. format of the cell value
  6. the height of every row and the width of every column
  7. whether or not rows and columns are hidden
  8. frozen rows

It is useful when you want to gather sheets from many workbooks and bind them into one workbook. I copied most attributes but there might be a few more. In that case you can use this script as a jumping off point to add more.

###############
## Copy a sheet with style, format, layout, etc. from one Excel file to another Excel file
## Replace the ..path\\+\\file..  and  ..sheet_name.. placeholders with your own values.

import openpyxl
from copy import copy

def copy_sheet(source_sheet, target_sheet):
    copy_cells(source_sheet, target_sheet)  # copy all the cell values and styles
    copy_sheet_attributes(source_sheet, target_sheet)


def copy_sheet_attributes(source_sheet, target_sheet):
    target_sheet.sheet_format = copy(source_sheet.sheet_format)
    target_sheet.sheet_properties = copy(source_sheet.sheet_properties)
    target_sheet.merged_cells = copy(source_sheet.merged_cells)
    target_sheet.page_margins = copy(source_sheet.page_margins)
    target_sheet.freeze_panes = copy(source_sheet.freeze_panes)

    # set row dimensions
    # Copying the whole row_dimensions attribute does not work (probably because of metadata in the attribute),
    # so we copy each existing row's dimension individually. Iterating over items() avoids touching row 0,
    # which is not a valid row index, and does not skip rows when the keys are not contiguous.
    for rn, row_dimension in source_sheet.row_dimensions.items():
        target_sheet.row_dimensions[rn] = copy(row_dimension)

    if source_sheet.sheet_format.defaultColWidth is None:
        print('Unable to copy default column width')
    else:
        target_sheet.sheet_format.defaultColWidth = copy(source_sheet.sheet_format.defaultColWidth)

    # set specific column width and hidden property
    # we cannot copy the entire column_dimensions attribute so we copy selected attributes
    # Excel groups multiple columns under one key, so min and max are copied to keep the same grouping in target_sheet.
    # This affects the width as well as the hidden property; see
    # https://stackguides.com/questions/36417278/openpyxl-can-not-read-consecutive-hidden-columns
    for key, value in source_sheet.column_dimensions.items():
        target_sheet.column_dimensions[key].min = copy(value.min)
        target_sheet.column_dimensions[key].max = copy(value.max)
        target_sheet.column_dimensions[key].width = copy(value.width)  # set width for every column
        target_sheet.column_dimensions[key].hidden = copy(value.hidden)


def copy_cells(source_sheet, target_sheet):
    for (row, col), source_cell in source_sheet._cells.items():
        target_cell = target_sheet.cell(column=col, row=row)

        target_cell._value = source_cell._value
        target_cell.data_type = source_cell.data_type

        if source_cell.has_style:
            target_cell.font = copy(source_cell.font)
            target_cell.border = copy(source_cell.border)
            target_cell.fill = copy(source_cell.fill)
            target_cell.number_format = copy(source_cell.number_format)
            target_cell.protection = copy(source_cell.protection)
            target_cell.alignment = copy(source_cell.alignment)

        if source_cell.hyperlink:
            target_cell._hyperlink = copy(source_cell.hyperlink)

        if source_cell.comment:
            target_cell.comment = copy(source_cell.comment)


wb_target = openpyxl.Workbook()
target_sheet = wb_target.create_sheet(..sheet_name..)

wb_source = openpyxl.load_workbook(..path\\+\\file_name.., data_only=True)
source_sheet = wb_source[..sheet_name..]

copy_sheet(source_sheet, target_sheet)

if 'Sheet' in wb_target.sheetnames:  # remove default sheet
    wb_target.remove(wb_target['Sheet'])

wb_target.save('out.xlsx')
2
votes

I had a similar requirement to collate data from multiple workbooks into one workbook. As there is no built-in method for this in openpyxl, I created the script below to do the job.

Note: In my use case, all workbooks contain data in the same format.

from openpyxl import load_workbook
import os


# The below method is used to read data from an active worksheet and store it in memory.
def reader(file):
    global path
    abs_file = os.path.join(path, file)
    wb_sheet = load_workbook(abs_file).active
    rows = []
    # min_row is set to 2, to ignore the first row which contains the headers
    for row in wb_sheet.iter_rows(min_row=2):
        row_data = []
        for cell in row:
            row_data.append(cell.value)
        # custom column data I am adding, not needed for typical use cases
        row_data.append(file[17:-6])
        # Creating a list of lists, where each list contain a typical row's data
        rows.append(row_data)
    return rows


if __name__ == '__main__':
    # Folder in which my source excel sheets are present
    path = r'C:\Users\tom\Desktop\Qt'
    # To get the list of excel files
    files = os.listdir(path)
    # below mentioned file name should be already created
    book = load_workbook('new.xlsx')
    sheet = book.active
    for file in files:
        for row in reader(file):
            sheet.append(row)
    # save once, after the rows of every source workbook have been appended
    book.save('new.xlsx')
1
votes

I've just found this question. A good workaround, as mentioned here, could consist of modifying the original wb in memory and then saving it under another name. For example:

import openpyxl

# your starting wb with 2 Sheets: Sheet1 and Sheet2
wb = openpyxl.load_workbook('old.xlsx')

sheets = wb.sheetnames # ['Sheet1', 'Sheet2']

for s in sheets:

    if s != 'Sheet2':
        sheet = wb[s]
        wb.remove(sheet)

# your final wb with just Sheet2
wb.save('new.xlsx')
1
votes

My workaround goes like this:

You have a template file; let's say it's "template.xlsx". You open it, make changes to it as needed, save it as a new file, and close the file. Repeat as needed. Just make sure to keep a copy of the original template while testing/messing around.
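
A rough sketch of that workflow with openpyxl might look like the following. The helper name fill_template, the cell references and the file names are made up for illustration, and a throwaway template is created in a temporary folder so the snippet is self-contained.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def fill_template(template_path, output_path, updates):
    """Load the template, apply {cell_ref: value} updates to its active
    sheet and save the result under a new name."""
    wb = load_workbook(template_path)
    ws = wb.active
    for cell_ref, value in updates.items():
        ws[cell_ref] = value
    wb.save(output_path)  # the template file itself is never overwritten
    wb.close()


# Demo: build a small stand-in for "template.xlsx" in a temporary folder.
workdir = tempfile.mkdtemp()
template_path = os.path.join(workdir, "template.xlsx")
template = Workbook()
template.active["A1"] = "Report for:"
template.save(template_path)

# Produce one output file from the template; repeat with new names as needed.
report_path = os.path.join(workdir, "report_001.xlsx")
fill_template(template_path, report_path, {"B1": "January"})
```

Because the workbook is always saved under a different path, the template stays untouched between runs.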

0
votes

A workaround I use is reading the sheet into a pandas DataFrame and then writing that DataFrame to the Excel workbook you need.
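
As a sketch of what this could look like: the file and sheet names below are invented, the demo files are written to a temporary folder, and the if_sheet_exists argument assumes pandas 1.3 or newer with openpyxl installed. Note that only the data survives this route; cell styles, merged cells and column widths are not carried over.

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "source.xlsx")
target_path = os.path.join(workdir, "target.xlsx")

# Demo files standing in for the real source and destination workbooks.
pd.DataFrame({"name": ["apple", "pear"], "qty": [3, 5]}).to_excel(
    source_path, sheet_name="Data", index=False)
pd.DataFrame({"note": ["existing"]}).to_excel(
    target_path, sheet_name="Summary", index=False)

# Read the sheet to copy into a DataFrame.
df = pd.read_excel(source_path, sheet_name="Data")

# Append it to the existing destination workbook as a new sheet.
with pd.ExcelWriter(target_path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Data_copy", index=False)
```

Opening the writer with mode="a" keeps the sheets that are already in the destination file; the default mode would overwrite the whole workbook.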

0
votes

For speed I am using data_only and read_only attributes when opening my workbooks. Also iter_rows() is really fast, too.

@Oscar's excellent answer needs some changes to support ReadOnlyWorksheet and EmptyCell:

# Copy a sheet with style, format, layout, etc. from one Excel file to another Excel file
# Replace the ..path\\+\\file..  and  ..sheet_name.. placeholders with your own values.
import openpyxl
from copy import copy


def copy_sheet(source_sheet, target_sheet):
    copy_cells(source_sheet, target_sheet)  # copy all the cell values and styles
    copy_sheet_attributes(source_sheet, target_sheet)


def copy_sheet_attributes(source_sheet, target_sheet):
    if isinstance(source_sheet, openpyxl.worksheet._read_only.ReadOnlyWorksheet):
        return
    target_sheet.sheet_format = copy(source_sheet.sheet_format)
    target_sheet.sheet_properties = copy(source_sheet.sheet_properties)
    target_sheet.merged_cells = copy(source_sheet.merged_cells)
    target_sheet.page_margins = copy(source_sheet.page_margins)
    target_sheet.freeze_panes = copy(source_sheet.freeze_panes)

    # set row dimensions
    # Copying the whole row_dimensions attribute does not work (probably because of metadata in the attribute),
    # so we copy each existing row's dimension individually. Iterating over items() avoids touching row 0,
    # which is not a valid row index, and does not skip rows when the keys are not contiguous.
    for rn, row_dimension in source_sheet.row_dimensions.items():
        target_sheet.row_dimensions[rn] = copy(row_dimension)

    if source_sheet.sheet_format.defaultColWidth is None:
        print('Unable to copy default column width')
    else:
        target_sheet.sheet_format.defaultColWidth = copy(source_sheet.sheet_format.defaultColWidth)

    # set specific column width and hidden property
    # we cannot copy the entire column_dimensions attribute so we copy selected attributes
    # Excel groups multiple columns under one key, so min and max are copied to keep the same grouping in target_sheet.
    # This affects the width as well as the hidden property; see
    # https://stackguides.com/questions/36417278/openpyxl-can-not-read-consecutive-hidden-columns
    for key, value in source_sheet.column_dimensions.items():
        target_sheet.column_dimensions[key].min = copy(value.min)
        target_sheet.column_dimensions[key].max = copy(value.max)
        target_sheet.column_dimensions[key].width = copy(value.width)  # set width for every column
        target_sheet.column_dimensions[key].hidden = copy(value.hidden)


def copy_cells(source_sheet, target_sheet):
    for r, row in enumerate(source_sheet.iter_rows()):
        for c, cell in enumerate(row):
            source_cell = cell
            if isinstance(source_cell, openpyxl.cell.read_only.EmptyCell):
                continue
            target_cell = target_sheet.cell(column=c+1, row=r+1)

            target_cell._value = source_cell._value
            target_cell.data_type = source_cell.data_type

            if source_cell.has_style:
                target_cell.font = copy(source_cell.font)
                target_cell.border = copy(source_cell.border)
                target_cell.fill = copy(source_cell.fill)
                target_cell.number_format = copy(source_cell.number_format)
                target_cell.protection = copy(source_cell.protection)
                target_cell.alignment = copy(source_cell.alignment)

            if not isinstance(source_cell, openpyxl.cell.ReadOnlyCell) and source_cell.hyperlink:
                target_cell._hyperlink = copy(source_cell.hyperlink)

            if not isinstance(source_cell, openpyxl.cell.ReadOnlyCell) and source_cell.comment:
                target_cell.comment = copy(source_cell.comment)

With a usage something like

    wb = openpyxl.Workbook()
    
    wb_source = openpyxl.load_workbook(filename, data_only=True, read_only=True)
    for sheetname in wb_source.sheetnames:
        source_sheet = wb_source[sheetname]
        ws = wb.create_sheet("Orig_" + sheetname)
        copy_sheet(source_sheet, ws)

    wb.save(new_filename)