
I'm implementing a serverless project on Google Cloud. Users will upload zip files of around 4 GB to a Cloud Storage bucket (users compress the files themselves before uploading). The files need to be uncompressed before their contents can be processed.

I found some solutions that work for small files:

  1. Download the zip file from the storage bucket to a Cloud Function
  2. Unzip it in the function
  3. Upload the unzipped files back to the storage bucket

Here, the file downloaded by the function is held in the memory allocated to the function. However, the maximum memory for a Cloud Function is 2 GB, which is too small for my case.

In the worst case, I would have to use VMs, but that would be expensive.

Is there any other way around this? My preferred language is Python.


1 Answer


A solution for Node.js would look something like this:

  1. Use the @google-cloud/storage library to create a read stream from the zip file in storage
  2. Pipe that stream to a module like unzip-stream, which says it lets you handle zipped files as streams.
  3. For each entry in the zip, use the Cloud Storage library to create a write stream to a new file in storage, and pipe the input stream from unzip-stream to the new output stream.

You will likely need to understand Node streams well in order to make this happen.

Since this is all happening by piping streams (and not reading everything into memory at once), it should work with minimal memory.
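
A minimal sketch of that pipeline is below. The `unzipObject` helper, the bucket/object names, and the `unzipped/` output prefix are placeholders, and the entry API (`entry.path`, `entry.type`, `entry.autodrain()`) is assumed from unzip-stream's documented interface:

```typescript
import { Storage, Bucket } from '@google-cloud/storage';
// unzip-stream ships without TypeScript types, so pull it in untyped.
const unzip = require('unzip-stream');

const storage = new Storage();

// Placeholder helper: streams a zip object out of `bucket`, unpacks it on the fly,
// and writes each entry to `unzipped/<entry path>` in the same bucket.
function unzipObject(bucket: Bucket, zipObjectName: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const uploads: Promise<void>[] = [];

    bucket
      .file(zipObjectName)
      .createReadStream()            // 1. read stream from the zip in storage
      .pipe(unzip.Parse())           // 2. emits one 'entry' event per file in the archive
      .on('entry', (entry: any) => {
        if (entry.type !== 'File') {
          entry.autodrain();         // discard directory entries
          return;
        }
        // 3. pipe the entry straight into a new object; nothing is buffered in full
        uploads.push(
          new Promise<void>((done, fail) => {
            entry
              .pipe(bucket.file(`unzipped/${entry.path}`).createWriteStream())
              .on('finish', done)
              .on('error', fail);
          })
        );
      })
      .on('error', reject)
      // 'finish' fires once the whole archive has been consumed; then wait for the uploads.
      .on('finish', () => Promise.all(uploads).then(() => resolve(), reject));
  });
}

// Example usage (placeholder names):
//   unzipObject(storage.bucket('my-upload-bucket'), 'uploads/archive.zip')
//     .catch(console.error);
```

In a Cloud Function you would return or await that promise so the function instance isn't torn down before all of the output objects have been written.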