The generator produced by os.walk
yields three items each iteration: the path of the current directory in the walk, a list of paths representing sub directories that will be traversed next, and a list of filenames contained in the current directory.
If for whatever reason you don't want to walk certain file paths, you should remove entries from what I called sub
below (the list of sub directories contained in root
). This will prevent os.walk
from traversing any paths you removed.
My code does not prune the walk. Be sure to update this if you don't want to traverse an entire file subtree.
The following outline should work for this although I haven't been able to test this on Windows. I have no reason to think it'll behave differently.
import os
import sys
def write_files(sources, combined):
# Want the first header
with open(sources[0], 'r') as first:
combined.write(first.read())
for i in range(1, len(sources)):
with open(sources[i], 'r') as s:
# Ignore the rest of the headers
next(s, None)
for line in s:
combined.write(line)
def concatenate_csvs(root_path):
for root, sub, files in os.walk(root_path):
filenames = [os.path.join(root, filename) for filename in files
if filename.endswith('.csv')]
combined_path = os.path.join(root, 'combined.csv')
with open(combined_path, 'w+') as combined:
write_files(filenames, combined)
if __name__ == '__main__':
path = sys.argv[1]
concatenate_csvs(path)