Requirement: I have a folder with multiple csv files. I need to perform following:
- Scan the input folder for all csv files (file1.csv, file2.csv ..... filen.csv etc) & perform below steps
- Open the first csv file (file1.csv) & store the file header in a list & then copy the entire content (including header) of file1.csv & put in combined.csv => Then move file1.csv to a /done folder under the parent folder
- Move to next file file2.csv => Compare the header of file2.csv with the header of the first file to ensure they match exactly. If the headers match then copy contents (excluding header) of file2.csv & put in combined.csv => Then move file2.csv to a /done folder under the parent folder. If the header of file2.csv do not match then exclude combining this file, leave it in the same parent folder & move to next file for combining
I have placed sample files on the link gdrive folder with sample CSV files I am open to any solution either with CSV or pandas as long as it does what I want
As a starting point, I was initially working on comparing the header as below which works. However, I am unsure on how to move further
Code:
import csv
def compare_two_csv_headers(csv_file1, csv_file2):
with open(csv_file1, newline='') as f:
reader = csv.reader(f)
frow1 = next(reader) # gets the first line
print(frow1)
with open(csv_file2, newline='') as f:
reader = csv.reader(f)
frow2 = next(reader) # gets the first line
print(frow2)
if frow1==frow2:
print('Same header')
else:
print('Different header')
csv_file1 = 'D:/2009/cm01JAN2009bhav.csv'
csv_file2 = 'D:/2009/cm01DEC2009bhav.csv'
compare_two_csv_headers(csv_file1, csv_file2)
Here are the top 10 lines from the first csv file
SYMBOL,SERIES,OPEN,HIGH,LOW,CLOSE,LAST,PREVCLOSE,TOTTRDQTY,TOTTRDVAL,TIMESTAMP,
20MICRONS,EQ,46.5,47,45.7,46.05,46,46.55,7092,328975.25,31-DEC-2009,
3IINFOTECH,EQ,85.8,86.7,84.5,85.15,85.35,85.05,2423812,207760480.3,31-DEC-2009,
3MINDIA,EQ,1855.05,1879.9,1855.05,1865.75,1874.95,1850.45,85,158679.1,31-DEC-2009,
AARTIDRUGS,EQ,107.4,108.75,103.65,104.45,104.9,106.05,84012,8929759.4,31-DEC-2009,
AARTIIND,EQ,51,51.9,48.9,49.2,49.1,50.45,149365,7517110.3,31-DEC-2009,
AARVEEDEN,EQ,64,64.5,63.05,63.85,63.1,62.7,2172,138651.5,31-DEC-2009,
ABAN,EQ,1265,1297,1265,1283.65,1283.2,1260.05,1381290,1773221519.75,31-DEC-2009,
ABB,EQ,756.2,770.85,756.2,767.1,769.55,756.3,292376,223660807.4,31-DEC-2009,
ABCIL,EQ,85.4,89,84.9,86.85,86.95,84.7,59183,5170993.2,31-DEC-2009,