
I'm trying to unzip a zipfile (compressed with BZ2) into a directory. The zipfile contains multiple files.

All of the examples I've found (and I've seen quite a few already) show how to decompress the zipfile into a single file.

This is what I have so far:

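import bz2
import pathlib

import constants  # assumed: the asker's own settings module (APP.ROOT, DOWNLOAD_FOLDER)

def unzipBzip2(passed_targetDir, passed_zipfile):
    full_zipfile = pathlib.Path(constants.APP.ROOT, constants.DOWNLOAD_FOLDER, passed_zipfile)
    full_target = pathlib.Path(constants.APP.ROOT, constants.DOWNLOAD_FOLDER, passed_targetDir)

    # Note: `zipfile` here is a file object; it shadows the stdlib module of the same name.
    with open(file=full_zipfile, mode="rb") as zipfile, open(full_target, 'wb') as target:
        decompressor = bz2.BZ2Decompressor()

        # Read the input in 100 KiB chunks until read() returns b'' (end of file).
        for data in iter(lambda : zipfile.read(100*1024), b''):
            target.write(decompressor.decompress(data))

    return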
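For comparison, here is a minimal sketch of what bz2.BZ2Decompressor is built for: turning one bz2-compressed file into exactly one output file. The function name and paths are hypothetical. The inner loop also copes with files made of several concatenated bz2 streams by starting a fresh decompressor whenever `eof` is reached and feeding it the leftover `unused_data`.

```python
import bz2
import pathlib


def decompress_bz2_file(src: pathlib.Path, dest: pathlib.Path, chunk_size: int = 100 * 1024) -> None:
    """Stream-decompress one .bz2 file into one output file (hypothetical helper).

    `dest` must be a file path: bz2 has no notion of member files,
    so there is nothing to place "into" a directory.
    """
    decompressor = bz2.BZ2Decompressor()
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            while chunk:
                fout.write(decompressor.decompress(chunk))
                if decompressor.eof:
                    # One bz2 stream ended; remaining bytes start the next stream.
                    chunk = decompressor.unused_data
                    decompressor = bz2.BZ2Decompressor()
                else:
                    chunk = b""
```

If the input is not bz2 data at all, decompress() raises OSError("Invalid data stream"), which is the second error shown in the question.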

The error is:

Traceback (most recent call last):
  ... (stack) ...
  File "/Users/bert/Project/unzipBzip2.py", line 26, in unzipBzip2
    with open(file=fullzipfile, mode="rb") as zipfile, open(full_target, 'wb') as target:
IsADirectoryError: [Errno 21] Is a directory: '/Users/bert/Project/data/51fba56e-c598-491a-a5e4-57373a59367a'

Well, "/Users/bert/Project/data/51fba56e-c598-491a-a5e4-57373a59367a" is indeed a directory, and that's what it should be, since the files unzipped from the BZ2 zipfile should be written to that directory.

Why does the decompressor complain that this is a directory?

If I change the target to a file

    full_target = pathlib.Path(constants.APP.ROOT, constants.DOWNLOAD_FOLDER, passed_targetDir, 'x.x')

it gives the following error:

  File "/Users/bert/Project/unzipBzip2.py", line 30, in unzipBzip2
    target.write(decompressor.decompress(data))
OSError: Invalid data stream
I think you are confusing zip archives, which contain one or more member files, with BZ2 (docs.python.org/3/library/bz2.html), which is just a way to compress a single file; unlike the former, it's not a container of other files. - martineau
What's the extension of your zipfile? .tar.bz2? - emptyhua
@emptyhua, never mind the extension: it's *.bzip2.zip, which is what confused me. It now appears to be a 7z archive, and that one does have more than one file in it. However, the Python package py7zr does not recognise it, while the Linux 7z command does. - BertC
If you could post a (small) sample file somewhere, I may be able to help write code to recognize and decompress it. - martineau
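One possibility worth ruling out, given the *.bzip2.zip name: the file may be an ordinary zip archive whose members are compressed with the bzip2 method (zip compression method 12) rather than a .bz2 file. That is an assumption, not something confirmed in the thread, but if it holds, Python's standard zipfile module reads such archives directly, with no separate bz2 step. A minimal sketch (the function name is hypothetical):

```python
import pathlib
import zipfile


def extract_zip(archive: pathlib.Path, target_dir: pathlib.Path) -> list[str]:
    """Extract every member of a zip archive into target_dir (hypothetical helper).

    zipfile handles each member's compression method (stored, deflate,
    bzip2, lzma) itself, so no separate bz2 decompression is needed.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # Report each member's compression method (12 == zipfile.ZIP_BZIP2).
        for info in zf.infolist():
            print(info.filename, info.compress_type)
        zf.extractall(target_dir)
        return zf.namelist()
```

If ZipFile raises zipfile.BadZipFile instead, the file is not a zip archive and this theory can be dropped.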

1 Answer


If your file is a bz2-compressed zip archive, the code below should work.

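import bz2
import pathlib
import zipfile

import constants  # assumed: the asker's own settings module (APP.ROOT, DOWNLOAD_FOLDER)

def unzipBzip2(passed_targetDir, passed_zipfile):
    full_zipfile = pathlib.Path(constants.APP.ROOT, constants.DOWNLOAD_FOLDER, passed_zipfile)
    full_target = pathlib.Path(constants.APP.ROOT, constants.DOWNLOAD_FOLDER, passed_targetDir)

    with open(file=full_zipfile, mode="rb") as rawf:
        # Undo the outer bz2 layer; BZ2File is a seekable file-like object.
        with bz2.BZ2File(rawf) as bz2f:
            # Read the inner zip archive and extract all members into the target directory.
            with zipfile.ZipFile(bz2f) as zipf:
                zipf.extractall(full_target)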
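To try this approach without the asker's constants module, here is a self-contained variant that takes plain paths (the function name is hypothetical). It works because zipfile.ZipFile needs a seekable file to find the central directory at the end of the archive, and BZ2File supports seeking when the underlying file does. BZ2File emulates seeking by decompressing, so this can be slow for large archives.

```python
import bz2
import pathlib
import zipfile


def extract_bz2_wrapped_zip(archive: pathlib.Path, target_dir: pathlib.Path) -> None:
    """Extract a zip archive that was itself compressed with bzip2 (hypothetical helper)."""
    with open(archive, "rb") as raw, bz2.BZ2File(raw) as bz2f, zipfile.ZipFile(bz2f) as zf:
        # extractall creates target_dir (and any subdirectories) as needed.
        zf.extractall(target_dir)
```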

You could use the file command to identify the archive format. For example, if your file is abc.unkown.bz2:

$ file ./abc.unkown.bz2
./abc.unkown.bz2: bzip2 compressed data, block size = 900k

Now we can decompress it with bzip2, which gives us abc.unkown:

$ bzip2 -d ./abc.unkown.bz2

Then run file on the decompressed abc.unkown:

$ file ./abc.unkown
./abc.unkown: Zip archive data, at least v1.0 to extract

So the example file is a zip archive inside a bz2 stream.
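If you'd rather do the identification step from Python, a rough stand-in for file is to compare the first few bytes against well-known signatures. This is a minimal sketch covering only bzip2, zip and 7z; the function name is hypothetical, and file itself recognises far more formats.

```python
import pathlib

# Leading "magic" bytes of a few archive formats.
SIGNATURES = {
    b"BZh": "bzip2",
    b"PK\x03\x04": "zip",
    b"PK\x05\x06": "zip (empty archive)",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def sniff_format(path: pathlib.Path) -> str:
    """Guess an archive's format from its first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"
```

The standard library also offers zipfile.is_zipfile(), which checks for a valid end-of-central-directory record rather than the leading bytes.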