
I am trying to port a Python web app written with Flask to Google App Engine. The app hosts user-uploaded files up to 200 MB in size, and for non-image files the original filename needs to be retained. To prevent filename conflicts (e.g. two people uploading stuff.zip, each with completely different and unrelated contents), the app creates a UUID-named folder on the filesystem, stores the file inside it, and serves it to users from there. I was planning to store the user files in a Google Cloud Storage bucket, but according to the documentation Cloud Storage has "no notion of folders". What is the best way to get the same functionality with their system?

The current method, just for demonstration:

        # generates a new folder with a shortened UUID name to save files
        # other than images to avoid filename conflicts
        else:
            # if there is a better way of doing this i'm not clever enough
            # to figure it out
            new_folder_name = shortuuid.uuid()[:9]
            new_folder_path = os.path.join(
                app.config['FILE_FOLDER'], new_folder_name)
            os.mkdir(new_folder_path)
            file.save(os.path.join(new_folder_path, filename))
            return url_for('uploaded_file', new_folder_name=new_folder_name)

1 Answer


From the Google Cloud Storage Client Library Overview documentation:

GCS and "subdirectories"

Google Cloud Storage documentation refers to "subdirectories" and the GCS client library allows you to supply subdirectory delimiters when you create an object. However, GCS does not actually store the objects into any real subdirectory. Instead, the subdirectories are simply part of the object filename. For example, if I have a bucket my_bucket and store the file somewhere/over/the/rainbow.mp3, the file rainbow.mp3 is not really stored in the subdirectory somewhere/over/the/. It is actually a file named somewhere/over/the/rainbow.mp3. Understanding this is important for using listbucket filtering.

While Cloud Storage does not support subdirectories per se, it allows you to use subdirectory delimiters inside object names. This means that the path to your file will still look exactly as if it were inside a subdirectory, even though it is not. The distinction should only matter when you list or iterate over the contents of the bucket.
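Applied to the question's UUID-folder scheme, this suggests putting the random prefix directly into the object name instead of creating a folder. The following is only a minimal sketch of the naming step, not code from the GCS client library: `make_object_name` is a hypothetical helper, and a 9-character slice of a `uuid4` hex string is assumed as a stand-in for the question's `shortuuid.uuid()[:9]`. Writing the object to the bucket is left out.

```python
import posixpath
import uuid


def make_object_name(filename, prefix=None):
    """Return '<prefix>/<filename>' for use as a single GCS object name.

    Hypothetical helper: nothing here talks to Cloud Storage, it only
    builds the name that would be passed to whatever upload call is used.
    """
    # Assumption: a 9-character slice of a UUID4 hex string stands in for
    # the question's shortuuid.uuid()[:9].
    if prefix is None:
        prefix = uuid.uuid4().hex[:9]
    # Keep only the last path component so that an uploaded name cannot
    # smuggle in extra "subdirectories" of its own.
    safe_name = posixpath.basename(filename)
    return f"{prefix}/{safe_name}"


# Two uploads of the same filename end up under different object names.
first = make_object_name("stuff.zip")
second = make_object_name("stuff.zip")
```

Both names end in `/stuff.zip`, so the original filename is retained, while the random prefix plays the role that the UUID folder played on the local filesystem.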

From the Request URIs documentation:

URIs for Standard Requests

For most operations you can use either of the following URLs to access objects:

storage.googleapis.com/<bucket>/<object>

<bucket>.storage.googleapis.com/<object>

This means that the public URL for their example would be http://storage.googleapis.com/my_bucket/somewhere/over/the/rainbow.mp3. Their service would interpret this as bucket=my_bucket and object=somewhere/over/the/rainbow.mp3 (i.e. no notion of subdirectories, just an object name with embedded slashes in it); the browser, however, will just see the path /my_bucket/somewhere/over/the/rainbow.mp3 and will treat rainbow.mp3 as the filename.
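The two readings of the same URL can be illustrated with the standard library alone. This is a sketch under assumptions: `public_url` is a hypothetical helper that follows the storage.googleapis.com/<bucket>/<object> form quoted above, and the URL is only reachable if the object is actually readable by the requester.

```python
import posixpath
from urllib.parse import quote, urlsplit


def public_url(bucket, object_name):
    """Build a path-style URL (hypothetical helper, no network access)."""
    # Percent-encode the object name but leave "/" alone, so the embedded
    # slashes still look like path separators.
    return f"http://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


url = public_url("my_bucket", "somewhere/over/the/rainbow.mp3")

# How the service reads the path: first segment is the bucket, everything
# after it is one object name.
path = urlsplit(url).path
bucket, _, object_name = path.lstrip("/").partition("/")

# How a browser reads the same path: the last segment is the filename.
download_name = posixpath.basename(path)
```

Here `object_name` comes back as the full `somewhere/over/the/rainbow.mp3`, while `download_name` is just `rainbow.mp3`, which is why a UUID prefix in the object name does not change the filename the user sees.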