0
votes

I have this excel file with multiple sheet. One sheet contains two pivot tables, normal table based on the data from pivot, some graphs based on pivot as well.

I am updating the sheets without pivots using below code. The content for these sheets are generated as dataframes and straight away right the data frame.

Method 1

book = xl.load_workbook(fn)
writer = pd.ExcelWriter(fn,engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
DF.to_excel(writer, 'ABC', header=None, startrow=book.active.max_row)
writer.save()

But, when the file is written, the pivot table is converted to plain text. The solution I found to preserve the pivot table is to read and write the workbook using below methods.

Method 2

workbook = load_workbook(filename=updating_file)
sheet = workbook["Pivot"]
pivot = sheet._pivots[0]
# any will do as they share the same cache
pivot.cache.refreshOnLoad = True
workbook.save(filename=updating_file)

This adds an additional row to the pivot table as 'Value' which ruins the values of the tables based on the pivot. enter image description here

According to here using pd.ExcelWriter would not preserve pivot tables. The only example I found to update an existing excel file with data frame requires pandas ExcelWriter.

Some help would be highly appreciated, as I am unable to find a method to fulfill both requirements.

Only option I can see so far is to write the data parts with Pandas. Then, drop the existing Pivot sheet and copy a sheet from original fie. But, again I have to find a way to clear the table based on the pivot and rewrite with openpyxl using 2nd method. (We can't copy sheets between workbooks)

2

2 Answers

1
votes

Stick with your Method 1: if you convert the df to a pivot table in pandas, and then export to excel, it will work.

An example:

import pandas as pd 
import numpy as np 

# create dataframe 
df = pd.DataFrame({'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'], 
      'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'], 
      'C': [27, 23, 21, 23, 24]}) 

table = pd.pivot_table(df, values ='A', index =['B', 'C'], 
                         columns =['B'], aggfunc = np.sum)

table.to_excel("filename.xlsx")

Outputs

Excel Output

0
votes

I found a way to iterate the data frame as rows. If it was adding rows to the end of exisitng table, this would have been much easier. Since, I have to insert rows to middle, I followed below approach to insert blank rows and write the cell values.

current_sheet.insert_rows(idx=11, amount=len(backend_report_df))
sheet_row_idx = 11
is_valid_row = False
for row in dataframe_to_rows(backend_report_df, index=True, header=True):
    is_valid_row = False
    for col_idx in range (0, len(row)):
        if col_idx == 0 and row[col_idx] is None:
            logger.info("Header row/blank row")
            break
        else:
            is_valid_row = True
            if col_idx != 0:
                current_sheet.cell(row=sheet_row_idx, column=col_idx).value = row[col_idx]
    if is_valid_row:
        sheet_row_idx = sheet_row_idx + 1