19
votes

I have a few ZIP and RAR files that I'm working with, and I'm trying to analyze how each file was compressed: compression level, compression algorithm (e.g. deflate, LZMA, BZip2), dictionary size, word size, etc. I haven't figured out a way to do this yet.

Is there any way to analyze the files to determine these properties, with software or otherwise?

Cheers and thanks!

7 Answers

4
votes

I suggest using hachoir-wx to look at these files. See how to install a Python package, or, on Windows, try ActivePython with PyPM. Once the necessary hachoir packages are installed, you can run the GUI like this:

python C:\Python27\Scripts\hachoir-wx

It enables you to browse through the data fields of RAR and ZIP files. See this screenshot for an example.

For RAR files, have a look at the technote.txt file in the WinRAR installation directory. It describes the RAR format in detail. You will probably be interested in these fields:

 HEAD_FLAGS      Bit flags: 2 bytes
                 0x10 - information from previous files is used (solid flag)
                 bits 7 6 5 (for RAR 2.0 and later)
                      0 0 0    - dictionary size   64 KB
                      0 0 1    - dictionary size  128 KB
                      0 1 0    - dictionary size  256 KB
                      0 1 1    - dictionary size  512 KB
                      1 0 0    - dictionary size 1024 KB
                      1 0 1    - dictionary size 2048 KB
                      1 1 0    - dictionary size 4096 KB
                      1 1 1    - file is directory

Dictionary size can be found in the WinRAR GUI too.
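The bit layout quoted above can be decoded with a few shifts and masks. This is a minimal sketch, assuming you have already read the 2-byte HEAD_FLAGS value of a file header from an archive in the format the technote describes; the function name decode_head_flags and the keys of the returned dictionary are made up for illustration.

```python
def decode_head_flags(head_flags: int) -> dict:
    """Decode the solid flag and dictionary size from a HEAD_FLAGS value."""
    solid = bool(head_flags & 0x10)           # 0x10: information from previous files is used
    dict_bits = (head_flags >> 5) & 0b111     # bits 7, 6 and 5 as a number from 0 to 7
    is_directory = dict_bits == 0b111         # 1 1 1 marks a directory entry
    # 0 -> 64 KB, 1 -> 128 KB, ... 6 -> 4096 KB, as in the table above
    dictionary_kb = None if is_directory else 64 << dict_bits
    return {
        "solid": solid,
        "is_directory": is_directory,
        "dictionary_kb": dictionary_kb,
    }
```

For example, decode_head_flags(0x90) reports a solid entry with a 1024 KB dictionary, because bits 7 6 5 are 1 0 0 and bit 4 is set.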

 METHOD          Packing method 1 byte
                 0x30 - storing
                 0x31 - fastest compression
                 0x32 - fast compression
                 0x33 - normal compression
                 0x34 - good compression
                 0x35 - best compression
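The METHOD byte is a plain lookup. A small sketch, assuming the byte has already been read from the file header; the names RAR_METHODS and describe_method are illustrative, not part of any library. Note that 0x30 to 0x35 are the ASCII digits '0' to '5'.

```python
# Packing methods exactly as listed in the technote excerpt above.
RAR_METHODS = {
    0x30: "storing",
    0x31: "fastest compression",
    0x32: "fast compression",
    0x33: "normal compression",
    0x34: "good compression",
    0x35: "best compression",
}


def describe_method(method_byte: int) -> str:
    """Return the technote's description of a METHOD byte, or mark it unknown."""
    return RAR_METHODS.get(method_byte, f"unknown (0x{method_byte:02x})")
```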

And Wikipedia also knows this:

The RAR compression utility is proprietary, with a closed algorithm. RAR is owned by Alexander L. Roshal, the elder brother of Eugene Roshal. Version 3 of RAR is based on Lempel-Ziv (LZSS) and prediction by partial matching (PPM) compression, specifically the PPMd implementation of PPMII by Dmitry Shkarin.

For ZIP files I would start by having a look at the specifications and the ZIP Wikipedia page. These are probably interesting:

  general purpose bit flag: (2 bytes)
  compression method: (2 bytes)
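These two fields sit near the start of each local file header, so they can be read directly. The sketch below assumes the bytes passed in start at a local file header (signature PK\x03\x04); the helper names are made up. For deflated entries the ZIP specification uses bits 1 and 2 of the flag as a hint about the option the writer chose, but many tools leave them at zero, so treat the result as a hint rather than the exact compression level.

```python
import struct

LOCAL_HEADER_SIGNATURE = 0x04034B50  # b"PK\x03\x04" read as a little-endian integer

# Meaning of bits 1-2 of the flag for deflated entries (methods 8 and 9).
DEFLATE_OPTIONS = {0: "normal", 1: "maximum", 2: "fast", 3: "super fast"}


def parse_local_header(data: bytes) -> tuple[int, int]:
    """Return (general purpose bit flag, compression method) from a local file header."""
    # Layout: signature (4 bytes), version needed (2), flag (2), method (2).
    signature, _version, flags, method = struct.unpack_from("<IHHH", data, 0)
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError("not a ZIP local file header")
    return flags, method


def deflate_option(flags: int) -> str:
    """Decode the deflate option hint stored in bits 1 and 2 of the flag."""
    return DEFLATE_OPTIONS[(flags >> 1) & 0b11]
```

In most archives the first local file header is at offset 0, so reading the first 10 bytes of the file is enough to try this out.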
12
votes

This is a fairly old question, but I wanted to throw in my two cents anyway since some of the methods above weren't as easy for me to use.

You can also determine this with 7-Zip. After opening the archive there is a column for method of compression:

[Screenshot: 7-Zip properties]

8
votes

For ZIP: yes, use zipinfo.

For RAR, the headers are easily found with either 7-Zip or WinRAR; read the documentation included with them.

3
votes

Via 7-Zip (or p7zip) command line:

7z l -slt archive.file

If looking specifically for the compression method:

7z l -slt archive.file | grep -e '^---' -e '^Path =' -e '^Method ='
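If you would rather post-process that listing in a script than with grep, the same filtering can be sketched in Python. This assumes the listing consists of "Key = Value" lines and that per-file entries come after a line starting with "----------", which is what the grep pattern above relies on; the function name methods_by_path is made up.

```python
def methods_by_path(listing: str) -> dict[str, str]:
    """Map each listed path to its Method value from `7z l -slt` output text."""
    methods = {}
    current_path = None
    in_entries = False  # becomes True once the dashed separator has been seen
    for line in listing.splitlines():
        if line.startswith("----------"):
            in_entries = True
            continue
        if not in_entries:
            continue  # skip the archive-level properties before the separator
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # not a "Key = Value" line
        if key == "Path":
            current_path = value
        elif key == "Method" and current_path is not None:
            methods[current_path] = value
    return methods
```

The listing text could come from subprocess.run(["7z", "l", "-slt", "archive.file"], capture_output=True, text=True).stdout, provided 7z is on your PATH.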
1
votes

For ZIP files, there is the zipinfo command.

0
votes

The type is easy: just look at the file headers (PK and Rar).

As for the rest, I doubt that information is available in the compressed content.
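The header check mentioned above can be sketched like this. It only looks at the first few bytes, so it assumes a plain archive; a self-extracting archive, which starts with an executable stub, would come back as "unknown". The function name sniff_archive_type and its return labels are made up.

```python
def sniff_archive_type(header: bytes) -> str:
    """Guess the archive type from the first bytes of a file."""
    # Check the longer RAR 5 signature first, then the older RAR signature.
    if header.startswith(b"Rar!\x1a\x07\x01\x00"):
        return "rar5"
    if header.startswith(b"Rar!\x1a\x07\x00"):
        return "rar"
    # PK\x03\x04 starts a local file header; PK\x05\x06 starts an empty ZIP.
    if header.startswith((b"PK\x03\x04", b"PK\x05\x06")):
        return "zip"
    return "unknown"
```

To use it on a real file, read the first 8 bytes in binary mode and pass them in.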

0
votes

The Python zipfile module can be used to get information about a ZIP file. The ZipInfo class provides attributes such as filename, compress_type, compress_size, file_size, etc.

Python snippet to get the filename and the compression type of each file in a ZIP archive:

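import zipfile

path_to_zipfile = 'archive.zip'  # placeholder: path to the archive to inspect

with zipfile.ZipFile(path_to_zipfile, 'r') as archive:  # avoid shadowing the built-in zip()
    for info in archive.infolist():
        print(f'filename: {info.filename}')
        # compress_type is an integer, e.g. 0 = stored, 8 = deflated
        print(f'compress type: {info.compress_type}')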

This lists every filename with its compression type (an integer), which can be used to look up the compression method.
You can get a lot more information about the files from the ZipInfo objects that infolist() returns.
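To turn the integer into a readable name, one option is a small lookup table built from the constants the zipfile module defines. This is a sketch: the table METHOD_NAMES and the function compression_summary are made up, and any method outside the four constants is reported by its number only. The demo at the end builds a tiny in-memory archive so nothing on disk is needed.

```python
import io
import zipfile

# Names for the compression constants the zipfile module defines.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflate",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def compression_summary(path_or_file) -> dict[str, str]:
    """Map each member name in a ZIP archive to a readable compression method."""
    with zipfile.ZipFile(path_or_file) as archive:
        return {
            info.filename: METHOD_NAMES.get(info.compress_type, f"method {info.compress_type}")
            for info in archive.infolist()
        }


# Demo: write two members with different methods, then read the summary back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("stored.txt", "hello", compress_type=zipfile.ZIP_STORED)
    archive.writestr("deflated.txt", "hello" * 100, compress_type=zipfile.ZIP_DEFLATED)
buffer.seek(0)
summary = compression_summary(buffer)
print(summary)
```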

The Python module linked in the accepted answer is not available; the zipfile module might help instead.