Libzip - read file contents from zip

5

votes

I'm using libzip to work with zip files, and everything goes fine until I need to read a file from the zip. I only need to read whole text files, so it would be great to have something like PHP's "file_get_contents" function.
To read a file from a zip there is the function
"int zip_fread(struct zip_file *file, void *buf, zip_uint64_t nbytes)".
The main problem is that I don't know what size buf must be or how many nbytes I must read (I need to read the whole file, but the files have different sizes). I could make one big buffer that fits them all and read up to its size, or loop until zip_fread returns -1, but I don't think either is a rational option.

c++ c zip zipfile

6

votes

You can try using zip_stat to get file size. http://linux.die.net/man/3/zip_stat
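
A minimal sketch of that idea: once zip_stat has reported the uncompressed size, allocate exactly that many bytes and fill them. The helper name read_exact and the read callable are assumptions made for illustration. The callable stands in for zip_fread and follows its return convention (bytes read, 0 at end of file, negative on error). The libzip wiring appears only as a comment because it needs an already opened archive.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Reads exactly `size` bytes through `read(dest, n)`, which is expected to
// behave like zip_fread: return the number of bytes read, 0 at end of file,
// or a negative value on error. `read_exact` is a hypothetical helper name.
template <typename Read>
std::string read_exact(std::size_t size, Read read) {
    std::string out(size, '\0');
    std::size_t done = 0;
    while (done < size) {
        // A single call may deliver fewer bytes than requested, so loop.
        long long n = read(&out[done], size - done);
        if (n < 0) throw std::runtime_error("read failed");
        if (n == 0) throw std::runtime_error("entry shorter than reported size");
        done += static_cast<std::size_t>(n);
    }
    return out;
}

// Possible wiring with libzip (assumes `archive` is an open struct zip * and
// `name` is the entry name; zip_fopen may return NULL; not compiled here):
//
//   struct zip_stat st;
//   zip_stat_init(&st);
//   if (zip_stat(archive, name, 0, &st) == 0 && (st.valid & ZIP_STAT_SIZE)) {
//       struct zip_file *f = zip_fopen(archive, name, 0);
//       std::string text = read_exact(st.size, [f](char *p, std::size_t n) {
//           return static_cast<long long>(zip_fread(f, p, n));
//       });
//       zip_fclose(f);
//   }
```

The size field is only meaningful when ZIP_STAT_SIZE is set in the valid member, which is why the commented wiring checks it first.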

3

votes

I haven't used the libzip interface, but from what you write it seems to look very similar to a file interface: once you've got a handle to the stream, you keep calling zip_fread() until this function returns an error (or, possibly, fewer bytes than requested). The buffer you pass in is just a reasonably sized temporary buffer through which the data is communicated.
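
A rough sketch of such a loop for the case where the size is not known up front. The name read_until_eof and the read callable are assumptions for illustration. The callable is expected to follow zip_fread's convention, which (per the libzip manual) is the number of bytes read, 0 once the end of the file is reached, and -1 on error.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Appends chunks to a string until `read(dest, n)` reports end of data.
// `read` returns bytes read, 0 at end of file, or a negative value on error.
// `read_until_eof` is a hypothetical helper name.
template <typename Read>
std::string read_until_eof(Read read, std::size_t chunk_size = 4096) {
    assert(chunk_size > 0);
    std::vector<char> chunk(chunk_size);   // temporary transfer buffer
    std::string out;
    for (;;) {
        long long n = read(chunk.data(), chunk.size());
        if (n < 0) throw std::runtime_error("read failed");
        if (n == 0) break;                 // nothing left to read
        out.append(chunk.data(), static_cast<std::size_t>(n));
    }
    return out;
}
```

With libzip, read could be a lambda such as [f](char *p, std::size_t n) { return static_cast<long long>(zip_fread(f, p, n)); } for an open struct zip_file *f. The result grows as needed, so the temporary buffer size only affects how many calls are made.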

Personally I would probably create a stream buffer for this so once the file in the zip archive is set up it can be read using the conventional I/O stream methods. This would look something like this:

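#include <streambuf>
#include <zip.h>

struct zipbuf: std::streambuf {
    zipbuf(???): file_(???) {}
private:
    zip_file* file_;
    enum { s_size = 8192 };
    char buffer_[s_size];
    // called by the stream whenever the get area is exhausted
    int_type underflow() {
        // zip_fread() returns the number of bytes read, 0 at end of file, -1 on error
        zip_int64_t rc = zip_fread(this->file_, this->buffer_, s_size);
        if (rc < 0) rc = 0;  // treat a read error like end of file
        this->setg(this->buffer_, this->buffer_, this->buffer_ + rc);
        return this->gptr() == this->egptr()
            ? traits_type::eof()
            : traits_type::to_int_type(*this->gptr());
    }
};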

With this stream buffer you should be able to create an std::istream and read the file into whatever structure you need:

zipbuf buf(???);
std::istream in(&buf);
...

Obviously, this code isn't tested or compiled. However, when you replace the ??? with whatever is needed to open the zip file, I'd think this should pretty much work.
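
Once an std::istream sits on top of such a stream buffer, pulling the whole entry into a string takes one line. The sketch below works on any std::istream, so an std::istringstream stands in for a stream built on zipbuf. The names slurp and slurp_demo are made up for illustration.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Reads everything that is left in `in` into one string, similar in spirit
// to PHP's file_get_contents. `slurp` is a hypothetical helper name.
std::string slurp(std::istream &in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Stand-in demonstration: std::istringstream plays the role of an
// std::istream constructed from a zipbuf.
std::string slurp_demo() {
    std::istringstream in("first line\nsecond line\n");
    return slurp(in);
}
```

std::istreambuf_iterator reads raw characters without skipping whitespace, so the text comes back byte for byte. For line-oriented processing, std::getline on the same stream would be the usual alternative.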

2

votes

Here is a routine I wrote that extracts data from a zip-stream and prints out a line at a time. This uses zlib, not libzip, but if this code is useful to you, feel free to use it:

/*
 * compile with the -lz option in order to link in the zlib library
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define Z_CHUNK 2097152

int unzipFile(const char *fName) 
{
    z_stream zStream;
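    char *zRemainderBuf = malloc(1);
    /* static: three 2 MiB arrays are too large for many default stacks (makes the function non-reentrant) */
    static unsigned char zInBuf[Z_CHUNK];
    static unsigned char zOutBuf[Z_CHUNK + 1];  /* +1 leaves room for the terminating NUL written below */
    static char zLineBuf[Z_CHUNK];              /* lines longer than Z_CHUNK are not supported */
    unsigned int zHave, zBufIdx, zBufOffset, zOutBufIdx;
    int zError;
    FILE *inFp = fopen(fName, "rb");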

    if (!inFp) { fprintf(stderr, "could not open file: %s\n", fName); return EXIT_FAILURE; }

    zStream.zalloc = Z_NULL;
    zStream.zfree = Z_NULL;
    zStream.opaque = Z_NULL;
    zStream.avail_in = 0;
    zStream.next_in = Z_NULL;  

    zError = inflateInit2(&zStream, (15+32)); /* cf. http://www.zlib.net/manual.html */
    if (zError != Z_OK) { fprintf(stderr, "could not initialize z-stream\n"); return EXIT_FAILURE; }

    *zRemainderBuf = '\0';
    do {
        zStream.avail_in = fread(zInBuf, 1, Z_CHUNK, inFp);
        if (zStream.avail_in == 0)
            break;
        zStream.next_in = zInBuf;
        do {
            zStream.avail_out = Z_CHUNK;
            zStream.next_out = zOutBuf;
            zError = inflate(&zStream, Z_NO_FLUSH);
            switch (zError) {
                case Z_NEED_DICT:  { fprintf(stderr, "Z-stream needs dictionary!\n"); return EXIT_FAILURE; }
                case Z_DATA_ERROR: { fprintf(stderr, "Z-stream suffered data error!\n"); return EXIT_FAILURE; }
                case Z_MEM_ERROR:  { fprintf(stderr, "Z-stream suffered memory error!\n"); return EXIT_FAILURE; }
            }
            zHave = Z_CHUNK - zStream.avail_out;
            zOutBuf[zHave] = '\0';

            /* copy remainder buffer onto line buffer, if not NULL */
            if (zRemainderBuf) {
                strncpy(zLineBuf, zRemainderBuf, strlen(zRemainderBuf));
                zBufOffset = strlen(zRemainderBuf);
            }
            else
                zBufOffset = 0;

            /* read through zOutBuf for newlines */
            for (zBufIdx = zBufOffset, zOutBufIdx = 0; zOutBufIdx < zHave; zBufIdx++, zOutBufIdx++) {
                zLineBuf[zBufIdx] = zOutBuf[zOutBufIdx];
                if (zLineBuf[zBufIdx] == '\n') {
                    zLineBuf[zBufIdx] = '\0'; 
                    zBufIdx = -1;
                    fprintf(stdout, "%s\n", zLineBuf);
                }
            }

            /* carry the unfinished line over to the next pass; after the loop above, zBufIdx is its length */
            {
                /* zLineBuf is not NUL-terminated here, so use zBufIdx rather than strlen(zLineBuf) */
                char *zNewBuf = realloc(zRemainderBuf, zBufIdx + 1);
                if (!zNewBuf) { fprintf(stderr, "could not allocate remainder buffer\n"); return EXIT_FAILURE; }
                zRemainderBuf = zNewBuf;
                memcpy(zRemainderBuf, zLineBuf, zBufIdx);
                zRemainderBuf[zBufIdx] = '\0';
            }
        } while (zStream.avail_out == 0);
    } while (zError != Z_STREAM_END);

    /* close gzip stream */
    zError = inflateEnd(&zStream);
    if (zError != Z_OK) { 
        fprintf(stderr, "could not close z-stream!\n");
        return EXIT_FAILURE;
    }
    if (zRemainderBuf)
        free(zRemainderBuf);

    fclose(inFp);

    return EXIT_SUCCESS;
}

1

votes

With any streaming you should consider the memory requirements of your app. A large buffer means fewer read calls, but it also keeps more memory in use, which matters if your RAM budget is tight. A small buffer forces you to call your read and write operations more often, and each call costs time. So you need to find a buffer size between those two extremes.

Typically I use a size of 4096 bytes (4 KB), which is sufficiently large for many purposes. If you want, you can go larger. But in the worst case, a size of 1 byte, you will be waiting a long time for your read to complete.

So to answer your question, there is no "right" size to pick. It is a choice you should make so that the speed of your app and the memory it requires are what you need.
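
To put rough numbers on that trade-off, here is a small back-of-the-envelope helper. The name reads_needed is made up, and the model is an assumption: every call fills the buffer completely (real reads may return less), and one extra call is needed to notice the end of the file.

```cpp
#include <cstddef>

// Number of read calls needed to drain `total` bytes with a buffer of
// `buffer_size` bytes (buffer_size must be greater than 0), under the
// simplified model described above. Hypothetical helper for illustration.
constexpr std::size_t reads_needed(std::size_t total, std::size_t buffer_size) {
    return (total + buffer_size - 1) / buffer_size + 1;
}
```

For a 1 MiB entry this gives 1,048,577 calls with a 1-byte buffer, 257 calls with 4096 bytes, and 17 calls with 64 KiB, so the gain flattens out quickly once the buffer is a few kilobytes.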
