3
votes

I need some help from Python programmers with a data-processing problem:

  • I have .csv files placed in a directory structure like this:

    -MainDirectory

    • Sub directory 1
      • sub directory 1A
        • file.csv
    • Sub directory 2
      • sub directory 2A
        • file.csv
    • Sub directory 3
      • sub directory 3A
        • file.csv

    Instead of going into each directory and opening the .csv files by hand, I want to run a script that combines the data from all the subdirectories.

Every file has the same header. I need to end up with one big .csv file that contains the header only once, with the data from all the .csv files appended one after the other.

I have a Python script that can combine all the files into a single file, but only when those files are placed in one folder.

Can you help me with a script that can handle the directory structure above?

3
Since you already have a script that works when there is only one folder, I think all you need now is to fetch all the csv files in the tree, right? – zhangyangyu
Yes, I just need to put them in one single folder, but the files under the different directories all have the same name. So I need to change the names before I put them in a single folder, and I don't want to change the names manually one by one. – user2159674

3 Answers

3
votes

Try this code. I tested it on my laptop and it works well!

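import os

def mergeCSV(srcDir, destCSV):
    """Merge every .csv file under srcDir into destCSV, keeping a single header."""
    destPath = os.path.abspath(destCSV)
    with open(destCSV, 'w') as destFile:
        header = ''
        for root, dirs, files in os.walk(srcDir):
            for f in files:
                srcPath = os.path.join(root, f)
                # Skip non-CSV files and the output file itself, which may sit inside srcDir.
                if not f.endswith('.csv') or os.path.abspath(srcPath) == destPath:
                    continue
                with open(srcPath, 'r') as csvfile:
                    if header == '':
                        # First file: copy its header line to the output.
                        header = csvfile.readline()
                        destFile.write(header)
                    else:
                        # Later files: discard the repeated header line.
                        csvfile.readline()
                    for line in csvfile:
                        destFile.write(line)

if __name__ == '__main__':
    mergeCSV('D:/csv', 'D:/csv/merged.csv')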
0
votes

You don't have to put all the files in one folder. To work with a file, all you need is its path. So gather the paths of all the csv files and then perform the combination.

    import os

    def Test1(rootDir):
        """Return the paths of all .csv files found under rootDir."""
        csvfiles = []
        for root, dirs, files in os.walk(rootDir):
            for f in files:
                if f.endswith('.csv'):
                    csvfiles.append(os.path.join(root, f))
        return csvfiles
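
Once the paths are gathered into a list, the combination step might look something like the sketch below. The function name `combine_csv` and its parameters are made up for illustration and are not from the question. It assumes Python 3 and that every file starts with the same header row.

```python
import csv

def combine_csv(paths, dest_path):
    """Write the rows of every file in paths to dest_path with one header.

    Hypothetical helper: assumes each input file begins with the same header row.
    """
    header_written = False
    with open(dest_path, 'w', newline='') as dest:
        writer = csv.writer(dest)
        for path in paths:
            with open(path, newline='') as src:
                reader = csv.reader(src)
                header = next(reader, None)  # first row of each file is its header
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Copy the remaining rows (the data) as they are.
                writer.writerows(reader)
```

Because only the paths are passed around, files that share the same name in different directories never need to be renamed or moved.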
0
votes

You can use os.listdir() to get the list of files in a directory.
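
Note that os.listdir() only returns the entries of a single directory and does not descend into subdirectories, so for a nested layout it would have to be applied recursively. A minimal sketch, where `list_csv_files` is a hypothetical helper name rather than anything from the standard library:

```python
import os

def list_csv_files(directory):
    """Recursively collect the paths of .csv files using os.listdir()."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            # Descend into the subdirectory and keep whatever it contains.
            found.extend(list_csv_files(path))
        elif name.endswith('.csv'):
            found.append(path)
    return found
```

os.walk() does the same traversal for you, which is why the other answers use it.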