I'm trying to unzip files found in a couple of sub-directories and copy them to a destination directory. Here is my code:

import zipfile,fnmatch,os

rootPath = r"C:\Temp\Test\source"
pattern = '*.zip'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        zip_ref = zipfile.ZipFile(os.path.join(root, filename))
        zip_ref.extractall(os.path.join("C:\\Temp\\Test\\dest"))

As you can see, it unzips every archive under the source directory and extracts the contents to the dest(ination) directory. The expected final result is the unzipped TXT files in the dest directory (TXT files only, NO DIRECTORIES).

My code worked well until I looked more closely at the structure of the source directory (please refer to the comments below):

+--- [source]
    |
    +--- [subdir1]
    |     |
    |     +--- file1.zip    # this zip file only contains a single txt file!
    |
    +--- [subdir2]
    |     |
    |     +--- file2.zip    # this zip file contains a directory which contains a txt file!
    |

Most of the sub-directories in the source directory look like [subdir1], so they work fine with my code. But there are a few exceptions like [subdir2], where the zip file contains not the txt file itself but a directory containing it. This is what the dest directory looks like with the current code:

+--- [dest]
    |
    +--- [subdir2]
    |     |
    |     +--- file2.txt
    |  
    +--- file1.txt
    | 

Any idea how to make sure the dest directory ends up with only the unzipped TXT files? I considered copying the zip files first and then unzipping them in the dest directory, but I couldn't find a solution yet. Any help would be appreciated!

1 Answer

Your problem lies in this line of code:

zip_ref.extractall(os.path.join("C:\\Temp\\Test\\dest"))

You do not need to use os.path.join here: with a single argument it simply returns that argument unchanged. You can just use this:

zip_ref.extractall("C:\\Temp\\Test\\dest")

EDIT:

os.path.join is redundant; however, removing it doesn't fix your problem.
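
To see why the call is redundant, here is a minimal check using the dest path from the question. With only one argument there is nothing to join, so the string should come back unchanged:

```python
import os.path

dest = "C:\\Temp\\Test\\dest"

# With a single argument, os.path.join has nothing to join
# and returns the path as given.
joined = os.path.join(dest)
print(joined == dest)
```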

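The issue is that extractall recreates whatever directory structure is stored inside the archive, and you cannot call os.walk on zipped files to flatten them beforehand. A solution for this (although I would think there is a more efficient method) is to move all the files into the root of the dest directory after they have been extracted.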

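import fnmatch
import os
import shutil


def gen_find(filepat, top):
    """Yield the path of every file under top whose name matches filepat."""
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)


if __name__ == '__main__':
    src = r'C:\Temp\Test\dest'

    # Collect the paths first so that moving files does not disturb os.walk.
    filesToMove = list(gen_find("*.txt", src))
    for name in filesToMove:
        # Files already in the root stay where they are; shutil.move would
        # raise shutil.Error because the destination path already exists.
        if os.path.dirname(name) != src:
            shutil.move(name, src)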
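
As for the "more efficient method" mentioned above, one option is to skip extractall and copy each member straight out of the archive under its bare file name, so no sub-directories are ever created in dest. This is only a sketch, not a tested drop-in: the function names extract_flat and extract_tree_flat are made up for illustration, and it assumes Python 3.6+ (for ZipInfo.is_dir) and that no two archives contain a txt file with the same name, since a later file would silently overwrite an earlier one.

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, dest_dir, suffix=".txt"):
    """Copy matching members of one archive into dest_dir, ignoring their folders."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zip_ref:
        for member in zip_ref.infolist():
            # Member names normally use "/" as the separator inside an archive,
            # so keep only the part after the last "/".
            name = member.filename.rsplit("/", 1)[-1]
            if member.is_dir() or not name.lower().endswith(suffix):
                continue
            target = os.path.join(dest_dir, name)
            # Stream the member's bytes to a file placed directly in dest_dir.
            with zip_ref.open(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(target)
    return written


def extract_tree_flat(root_path, dest_dir):
    """Walk root_path and flatten every .zip found into dest_dir."""
    written = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(".zip"):
                written.extend(extract_flat(os.path.join(root, filename), dest_dir))
    return written
```

With the paths from the question, the call would be extract_tree_flat(r"C:\Temp\Test\source", r"C:\Temp\Test\dest"). Because nothing is extracted into sub-directories, there are no leftover empty folders to clean up afterwards.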