I have 4 .csv files that I perform wrangling on daily. I have the output of the parsed csv files saved as 4 separate .xlsx workbooks. My goal is the have all 4 workbooks combined into a single xlsx workbook, each contained in its own tab/worksheet.
I have read of so many ways to do this using openpyxl, xlwt and xlwriter amongst other people's versions and I am just confused and seeking guidance from the experts of the proper way/s to make this work for my specific application...
I create the workbook with 4 worksheets/tabs, but when I try and write into a tab using the worksheet name, it appears to be overwriting everything I did previously, and I just can't figure out how to fix it? Any help or guidance is much appreciated!
import pandas as pd
import openpyxl
import csv
from openpyxl import Workbook
# this creates an xlsx workbook with 4 worksheets
wb = Workbook()
dest_filename = 'Drop Offenderssssssss.xlsx'
ws = wb.active
ws.title = "DropCount_Offenders"
ws = wb.create_sheet()
ws.title = 'Dropstat_Offenders'
ws = wb.create_sheet()
ws.title = 'DropCountPerSec_Offenders'
ws = wb.create_sheet()
ws.title = 'numPktDrops_Offenders'
wb.save(filename = dest_filename)
# there are 2 possible filenames. this takes user input and stores it as a global date variable to call proper filename
date = str(raw_input("Enter yyyymmdd: "))
date_var = date
# function 1 for tab1 contents:
def dropcount_offenders():
global date_var
filename1 = 'PROBE_HEALTH_GRAPH_Drop_Count_%s-01.01.00.AM.csv' %(date_var)
filename2 = 'PROBE_HEALTH_GRAPH_Drop_Count_%s-01.01.01.AM.csv' %(date_var)
# this trys to open the first possible filename
try:
file_handler = open(filename1)
except:
print"trying the next one"
# if first filename was not found then it locates and opens the 2nd possible filename
try:
file_handler = open(filename2)
except:
print"invalid input"
# this uses pandas library to read the csv contents into memory
data = pd.read_csv(file_handler)
# this renames the columns (takes out spaces)
data.columns = ["Probe_Name", "Recording_Time", "Drop_Count"]
# this defines a filter threshold which clears all rows who's Drop_Count column data = 0
counts = data[data.Drop_Count >= 1].sort_index(by="Probe_Name", ascending=True)
# now I want to append/write my filtered data to a specific tab within the xlsx file
counts.to_excel("Drop Offenderssssssss.xlsx", "DropCount_Offenders")
# function 2 for tab2 contents (overwrites all tabs I previously created and overwrites function 1 as well?):
def dropstat_offenders():
global date_var
filename1 = 'DropStats_%s-01.01.00.AM.csv' %(date_var)
filename2 = 'DropStats_%s-01.01.01.AM.csv' %(date_var)
try:
file_handler = open(filename1)
except:
print"trying the next one"
try:
file_handler = open(filename2)
except:
print"invalid input"
data = pd.read_csv(file_handler)
data.columns = ["Probe_Name", "RecordingTime", "RecordingPeriod", "PrimaryDimension", "BladeId", "dropCount"]
# this removes the columns i dont need to see
del data["RecordingPeriod"]
del data["BladeId"]
drops = data[data.dropCount >= 1].sort_index(by="Probe_Name", ascending=True)
drops.to_excel("Drop Offenderssssssss.xlsx", 'Dropstat_Offenders')
# this runs the above 2 functions in sequence
dropcount_offenders()
dropstat_offenders()
what I want is for dropcount_offenders() to be one tab/worksheet in the workbook and for dropstat_offenders() to be another tab/worksheet, etc...?
one snipit I thought might work but had no success: Modify an existing Excel file using Openpyxl in Python
another snipit I dont understand how to use in my application: How to concatenate three excels files xlsx using python?