1
votes

I have 4 .csv files that I perform wrangling on daily. I have the output of the parsed csv files saved as 4 separate .xlsx workbooks. My goal is the have all 4 workbooks combined into a single xlsx workbook, each contained in its own tab/worksheet.

I have read of so many ways to do this using openpyxl, xlwt and xlwriter amongst other people's versions and I am just confused and seeking guidance from the experts of the proper way/s to make this work for my specific application...

I create the workbook with 4 worksheets/tabs, but when I try and write into a tab using the worksheet name, it appears to be overwriting everything I did previously, and I just can't figure out how to fix it? Any help or guidance is much appreciated!

import pandas as pd
import openpyxl
import csv
from openpyxl import Workbook

# this creates an xlsx workbook with 4 worksheets
wb = Workbook()
dest_filename = 'Drop Offenderssssssss.xlsx'
ws = wb.active
ws.title = "DropCount_Offenders"
ws = wb.create_sheet()
ws.title = 'Dropstat_Offenders'
ws = wb.create_sheet()
ws.title = 'DropCountPerSec_Offenders'
ws = wb.create_sheet()
ws.title = 'numPktDrops_Offenders'
wb.save(filename = dest_filename)

# there are 2 possible filenames. this takes user input and stores it as a global date variable to call proper filename
date = str(raw_input("Enter yyyymmdd: "))
date_var = date

# function 1 for tab1 contents:
def dropcount_offenders():
    global date_var
    filename1 = 'PROBE_HEALTH_GRAPH_Drop_Count_%s-01.01.00.AM.csv' %(date_var)
    filename2 = 'PROBE_HEALTH_GRAPH_Drop_Count_%s-01.01.01.AM.csv' %(date_var)

# this trys to open the first possible filename
    try:
        file_handler = open(filename1)
    except:
        print"trying the next one"

# if first filename was not found then it locates and opens the 2nd possible filename
        try:
            file_handler = open(filename2)
        except:
            print"invalid input"

# this uses pandas library to read the csv contents into memory
    data = pd.read_csv(file_handler)
# this renames the columns (takes out spaces) 
    data.columns = ["Probe_Name", "Recording_Time", "Drop_Count"]

# this defines a filter threshold which clears all rows who's Drop_Count column data = 0 
    counts = data[data.Drop_Count >= 1].sort_index(by="Probe_Name", ascending=True)


# now I want to append/write my filtered data to a specific tab within the xlsx file
    counts.to_excel("Drop Offenderssssssss.xlsx", "DropCount_Offenders")

# function 2 for tab2 contents (overwrites all tabs I previously created and overwrites function 1 as well?):
def dropstat_offenders():
    global date_var
    filename1 = 'DropStats_%s-01.01.00.AM.csv' %(date_var)
    filename2 = 'DropStats_%s-01.01.01.AM.csv' %(date_var)

    try:
        file_handler = open(filename1)
    except:
        print"trying the next one"
        try:
            file_handler = open(filename2)
        except:
            print"invalid input"

    data = pd.read_csv(file_handler)
    data.columns = ["Probe_Name", "RecordingTime", "RecordingPeriod", "PrimaryDimension", "BladeId", "dropCount"]

# this removes the columns i dont need to see
    del data["RecordingPeriod"]
    del data["BladeId"]

    drops = data[data.dropCount >= 1].sort_index(by="Probe_Name", ascending=True)
    drops.to_excel("Drop Offenderssssssss.xlsx", 'Dropstat_Offenders')

# this runs the above 2 functions in sequence
dropcount_offenders()
dropstat_offenders()

what I want is for dropcount_offenders() to be one tab/worksheet in the workbook and for dropstat_offenders() to be another tab/worksheet, etc...?

one snipit I thought might work but had no success: Modify an existing Excel file using Openpyxl in Python

another snipit I dont understand how to use in my application: How to concatenate three excels files xlsx using python?

2
You have to write the sheets, instead of writing to the file, while creating the sheets use different names and write the required data to the appropriate sheets.Thiru

2 Answers

1
votes

I've no experience with Pandas but I you might not need it here. It's unclear whether you are modifying an existing Excel file or whether you just need to create one using the csv files. If it's the former you can only use openpyxl, if it's latter you can use either openpyxl or xlsxwriter. python-excel (xlrd and xlwt) does not support editing existing or writing Excel 2010 files.

Assuming you want to convert the csv files into Excel worksheets your code will look something like this pseudo-code.

from csv import DictReader
from openpyxl import Workbook

wb = Workbook()
del wb["Sheet"]
for title in ("DropCount_Offenders", "Dropstat_Offenders", "DropCountPerSec_Offenders", "numPktDrops_Offenders"):
  wb.create_sheet(title)


for f in filenames:
    src = DictReader(f)
    ws = wb[f]
    ws.append(["Probe_Name", "Recording_Time", "Drop_Count"])
    for row in src:
       ws.append(row["Probe_Name"], ["Recording_Time"], ["Drop_Count"])

wb.save("Drop Offenders.xlsx")

You'll need to look at the csv module in the standard library for more information.

If you do need Pandas for sorting etc. then you'll need to look at the documentation for editing existing Excel files.

NB. there is no need to use global in your functions. In Python read access to variables in a higher scope is always available. global is used to make a local variable a global one and it is something you will almost never need.

0
votes

Using XlsxWriter with Pandas

multiple workbooks into a single xlsx workbook using Python

import pandas as pd
import xlsxwriter

names = ['Sankar','Guru','Karthik','parthi','sarvanan']
grade = [88,84,89,84,81]
bscdegree = [1,1,0,0,1]
mscdegree = [2,1,0,0,0]
phddegree = [0,1,0,0,0]
bdaydates = ['04/15/1945','10/25/1976','3/30/1990','04/30/1901','1963-09-01']
department = ['CS','IT','CS&IT','ECE']
list = zip(names,bscdegree,mscdegree,phddegree,bdaydates)
list1 = zip(names,grade,department)
columns = ['Names','BSC','MSC','PHD','Bday']
columns1 = ['Names','Grades','Department']
df = pd.DataFrame(data=list, columns=columns)
df1 = pd.DataFrame(data=list1, columns=columns1)
writer = pd.ExcelWriter('grade.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1',index=False)
df1.to_excel(writer, sheet_name='sheet2',index=False)
writer.save()

http://xlsxwriter.readthedocs.io/working_with_pandas.html