10 votes

I'm new to Google Cloud Platform. I have trained my model on Datalab and saved the model folder to Cloud Storage in my bucket. I can download individual files from the bucket to my local machine by right-clicking a file and choosing "Save link as". But when I try to download a folder the same way, I don't get the folder, only an image of it. Is there any way I can download the whole folder and its contents as they are? Is there a gsutil command to copy folders from Cloud Storage to a local directory?

Not the right place for this question. – NEKIBUR RAHMAN

6 Answers

21 votes

You can find docs on the gsutil tool here and for your question more specifically here.

The command you want to use is:

gsutil cp -r gs://bucket/folder .
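
For example, to copy the saved model folder from the question into the current directory (the bucket and folder names below are placeholders, not the asker's actual paths):

gsutil cp -r gs://my-bucket/model-folder .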
7 votes

Prerequisites: Google Cloud SDK is installed and initialized ($ gcloud init)

Command:

gsutil -m cp -r  gs://bucket-name .

This copies all of the files using multiple parallel threads/processes (the -m flag), which is faster. I found that the "dir" placeholder shown in the official gsutil docs did not work for me.
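
If you only need one folder rather than the whole bucket, the same -m flag works with a folder path (bucket and folder names below are placeholders):

gsutil -m cp -r gs://bucket-name/folder-name .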

5 votes

If you are downloading data from Google Cloud Storage using Python and want to keep the same folder structure, you can follow this code I wrote in Python.

OPTION 1

from google.cloud import storage
import logging
import os

def download_from_bucket(bucket_name, blob_path, local_path):
    # Create the target folder locally if it does not exist
    if not os.path.exists(local_path):
        os.makedirs(local_path)

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blobs = list(bucket.list_blobs(prefix=blob_path))

    for blob in blobs:
        # Skip "folder" placeholder objects (names ending in '/')
        if blob.name.endswith("/"):
            continue
        # Path of the object relative to the prefix, e.g. 'sub1/sub2/file.txt'
        relative_path = blob.name.replace(blob_path, '', 1).lstrip('/')
        if not relative_path:
            continue
        downloadpath = os.path.join(local_path, relative_path)
        # Recreate any intermediate folders locally before downloading the file
        parent_dir = os.path.dirname(downloadpath)
        if parent_dir and not os.path.exists(parent_dir):
            os.makedirs(parent_dir)
        logging.info(downloadpath)
        blob.download_to_filename(downloadpath)

    logging.info('Blob {} downloaded to {}.'.format(blob_path, local_path))


bucket_name = 'google-cloud-storage-bucket-name' # do not use gs://
blob_path = 'training/data' # blob path in the bucket where the data is stored
local_dir = 'local-folder-name' # local folder the data will be downloaded into
download_from_bucket(bucket_name, blob_path, local_dir)

OPTION 2: using gsutil from Python. Another option is to invoke gsutil from a Python program, as shown below.

def download_bucket_objects(bucket_name, blob_path, local_path):
    # blob path is bucket folder name
    command = "gsutil cp -r gs://{bucketname}/{blobpath} {localpath}".format(bucketname = bucket_name, blobpath = blob_path, localpath = local_path)
    os.system(command)
    return command
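
For example, a call could look like this (reusing the placeholder bucket and folder names from Option 1):

download_bucket_objects('google-cloud-storage-bucket-name', 'training/data', './trainingData')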

OPTION 3: no Python, directly using the terminal and the Google Cloud SDK. Prerequisites: Google Cloud SDK is installed and initialized ($ gcloud init). Refer to the link below for the cp command:

https://cloud.google.com/storage/docs/gsutil/commands/cp

2 votes

gsutil -m cp -r gs://bucket-name "{path to local existing folder}"

Works for sure.

2 votes

This is how you can download a folder from a Google Cloud Storage bucket.

Run the following command to download it from the bucket to your local path in the Google Cloud Console:

gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}

Once you run that command, confirm that your folder is on the local path by running the ls command to list the files and directories there.

Now zip your folder by running the command below

zip -r foldername.zip yourfolder/*

Once the zip process is done, click on the "More" drop-down menu at the right side of the Google Cloud Console,

[Screenshot: Google Cloud Console menu]

then select "Download file" Option. You will be prompted to enter the name of the file that you want to download, enter the name of the zip file - "foldername.zp"

0 votes

Here's the code I wrote. This will download the complete directory structure to your VM/local storage.

from google.cloud import storage
import os

bucket_name = "ar-data"

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)

dirName = 'Data_03_09/' #***folder in the bucket whose content you want to download
blobs = bucket.list_blobs(prefix=dirName)  # pass delimiter='/' to skip objects in sub-folders
destpath = r'/home/jupyter/DATA_test/' #***path on your VM/local machine where you want the bucket directory

for blob in blobs:
    # path of the object relative to dirName, e.g. 'sub1/sub2/file.txt'
    relpath = blob.name[len(dirName):]
    if not relpath or relpath.endswith('/'):
        continue  # skip the folder placeholder objects themselves
    # recreate each intermediate directory locally before downloading
    currpath = destpath
    for n in relpath.split('/')[:-1]:
        currpath = os.path.join(currpath, n)
        if not os.path.exists(currpath):
            print('creating directory -', n, 'on path -', currpath)
            os.mkdir(currpath)
    print("downloading ...", relpath)
    blob.download_to_filename(os.path.join(destpath, relpath))

Or simply use the following in a terminal:

gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}