I have a directory with excel files that I'm looping through and reading a sheet from each into a Pandas dataframe. Each file contains a month of data (example name = "Savings January 2019.xlsx"). There is not a column in the excel sheet for a date, so I'd like to add a column for "Date" to the dataframe and read each file by month and year in the workbook name (ex. "January 2019") and add 'MM-DD-YYYY' (ex. '01-01-2019' as the date value for each row read in.
Below is my working loop to read 12 excel workbooks in without dates, producing only all 12 month totals. I need the dates so I can visualize the data by month.
df_total = pd.DataFrame()
for file in files: # loop through Excel files (each file adds date value based on file name)
if file.endswith('.xlsx'):
excel_file = pd.ExcelFile(file)
sheets = excel_file.sheet_names
for sheet in sheets: # loop through sheets inside an Excel file
df = excel_file.parse(sheet_name = "Group Savings")
df_total = df_total.append(df)
Current df:
State Group Value
0 Illinois 000000130 470.93
1 Illinois 000000130 948.33
2 Illinois 000000784 3498.42
3 Illinois 000000784 16808.16
4 Illinois 000002077 7.00
Need df:
State Group Date Value
0 Illinois 000000130 01-01-2019 470.93
1 Illinois 000000130 01-01-2019 948.33
2 Illinois 000000784 01-01-2019 3498.42
3 Illinois 000000784 02-01-2019 6808.16
4 Illinois 000002077 02-01-2019 7.00
I've done a little research and think it's something like creating the column then adding the date value but can't figure out how to parse the filename to do so and am clearly a beginner here.
for sheet in sheets: # loop through sheets inside an Excel file
df = excel_file.parse(sheet_name = "Group Savings")
df_total = df_total.append(df)
df_total['Date'] = #if excel_file contains 'January 2019', then df_total['Date'] == '01-01-2019