Windows OS - I've got several hundred subdirectories, and each subdirectory contains one or more .csv files. All the files are identical in structure. I'm trying to loop through each folder and concatenate all the .csv files in that subdirectory into a new combined file in the same subdirectory.

example:

folder1 -> file1.csv, file2.csv, file3.csv -->> file1.csv, file2.csv, file3.csv, combined.csv

folder2 -> file1.csv, file2.csv -->> file1.csv, file2.csv, combined.csv

Very new to coding and getting lost in this. Tried using os.walk but completely failed.

It's late, but just have a look at pymotw.com/2/glob (comment by Emmanuel BRUNET)

1 Answer

The generator produced by os.walk yields a three-item tuple on each iteration: the path of the current directory in the walk, a list of the names of its subdirectories (which will be traversed next), and a list of the names of the files it contains. The last two hold bare names, not full paths, so join each name with the directory path before opening anything.
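To make that concrete, here is a small sketch that collects what os.walk hands back for a directory tree. The helper name describe_walk is mine, purely for illustration, and the throwaway folder layout in the demo is made up.

```python
import os
import tempfile


def describe_walk(root_path):
    """Return a list of (root, dirnames, filenames) tuples from os.walk.

    Names are sorted so the result does not depend on the order in
    which the operating system happens to list directory entries.
    """
    seen = []
    for root, sub, files in os.walk(root_path):
        sub.sort()  # in-place sort also fixes the order of the walk itself
        seen.append((root, list(sub), sorted(files)))
    return seen


if __name__ == '__main__':
    # Build a throwaway tree: <tmp>/folder1/file1.csv and <tmp>/folder2/file1.csv
    with tempfile.TemporaryDirectory() as tmp:
        for folder in ('folder1', 'folder2'):
            os.mkdir(os.path.join(tmp, folder))
            with open(os.path.join(tmp, folder, 'file1.csv'), 'w') as f:
                f.write('a,b\n1,2\n')
        for root, sub, files in describe_walk(tmp):
            # root is a path; sub and files hold bare names only
            print(root, sub, files)
```

Running it prints one line per directory visited, which is a quick way to check what the walk sees before writing any files.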

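If for whatever reason you don't want to walk certain directories, remove their entries from what I called sub below (the list of subdirectories contained in root). The list has to be modified in place (with del or slice assignment, not by rebinding the name), and this only works with the default topdown=True. Doing so prevents os.walk from traversing anything you removed.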

My code does not prune the walk. Be sure to update this if you don't want to traverse an entire file subtree.
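As a rough sketch of what pruning could look like, assuming you want to skip directories by name: walk_without and skip_dirs are hypothetical names of my own, not part of the code below, and the directory name 'archive' in the usage note is just an example.

```python
import os


def walk_without(root_path, skip_dirs):
    """Yield (root, files) for each directory under root_path,
    never descending into a directory whose name is in skip_dirs."""
    for root, sub, files in os.walk(root_path):
        # Slice assignment edits the very list os.walk is holding.
        # A plain `sub = [...]` would only rebind the local name
        # and prune nothing.
        sub[:] = [d for d in sub if d not in skip_dirs]
        yield root, files
```

You could then loop with something like `for root, files in walk_without(path, {'archive'}):` and nothing under any folder named archive would be visited.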

The following outline should do what you describe, although I haven't been able to test it on Windows. I have no reason to think it will behave differently there.

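import os
import sys

COMBINED_NAME = 'combined.csv'


def write_files(sources, combined):
    """Write every file in sources to combined, keeping only the first header."""
    for index, source in enumerate(sources):
        with open(source, 'r') as s:
            if index > 0:
                # Skip the header row of every file after the first
                next(s, None)
            line = ''
            for line in s:
                combined.write(line)
            # Guard against a source file that lacks a trailing newline,
            # which would otherwise glue two rows together
            if line and not line.endswith('\n'):
                combined.write('\n')


def concatenate_csvs(root_path):
    for root, sub, files in os.walk(root_path):
        # Sort for a predictable order, match .CSV as well as .csv,
        # and skip the combined file left behind by a previous run
        filenames = [os.path.join(root, filename) for filename in sorted(files)
                     if filename.lower().endswith('.csv')
                     and filename.lower() != COMBINED_NAME]
        if not filenames:
            # No CSVs in this directory, so don't create an empty combined.csv
            continue
        combined_path = os.path.join(root, COMBINED_NAME)
        with open(combined_path, 'w') as combined:
            write_files(filenames, combined)


if __name__ == '__main__':
    path = sys.argv[1]
    concatenate_csvs(path)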