I'm new to Google Cloud Platform. I have trained my model on Datalab and saved the model folder to Cloud Storage in my bucket. I can download existing files in the bucket to my local machine by right-clicking the file --> save as link. But when I try to download a folder the same way, I don't get the folder itself, only an image of it. Is there any way I can download the whole folder and its contents as they are? Is there a gsutil command to copy folders from Cloud Storage to a local directory?
If you are downloading data from Google Cloud Storage using Python and want to maintain the same folder structure, follow this code I wrote in Python.
OPTION 1
from google.cloud import storage
import logging
import os

def findOccurrences(s, ch):
    # find the positions of '/' in the blob path, used to create folders in local storage
    return [i for i, letter in enumerate(s) if letter == ch]

def download_from_bucket(bucket_name, blob_path, local_path):
    # create the destination folder locally if it does not exist
    if not os.path.exists(local_path):
        os.makedirs(local_path)
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blobs = list(bucket.list_blobs(prefix=blob_path))
    for blob in blobs:
        startloc = 0
        folderloc = findOccurrences(blob.name.replace(blob_path, ''), '/')
        if not blob.name.endswith("/"):
            if blob.name.replace(blob_path, '').find("/") == -1:
                # the blob sits directly under blob_path, so download it straight away
                downloadpath = local_path + '/' + blob.name.replace(blob_path, '')
                logging.info(downloadpath)
                blob.download_to_filename(downloadpath)
            else:
                # the blob is nested, so recreate the intermediate folders first
                for folder in folderloc:
                    if not os.path.exists(local_path + '/' + blob.name.replace(blob_path, '')[startloc:folder]):
                        create_folder = local_path + '/' + blob.name.replace(blob_path, '')[0:startloc] + '/' + blob.name.replace(blob_path, '')[startloc:folder]
                        startloc = folder + 1
                        os.makedirs(create_folder)
                downloadpath = local_path + '/' + blob.name.replace(blob_path, '')
                blob.download_to_filename(downloadpath)
                logging.info(blob.name.replace(blob_path, '')[0:blob.name.replace(blob_path, '').find("/")])
    logging.info('Blob {} downloaded to {}.'.format(blob_path, local_path))

bucket_name = 'google-cloud-storage-bucket-name'  # do not use gs://
blob_path = 'training/data'  # blob path in the bucket where the data is stored
local_dir = 'local-folder-name'  # local folder to download the training data into
download_from_bucket(bucket_name, blob_path, local_dir)
OPTION 2: using gsutil from Python. Another way is to call gsutil from a Python program, as below.
import os

def download_bucket_objects(bucket_name, blob_path, local_path):
    # blob_path is the folder name inside the bucket
    command = "gsutil cp -r gs://{bucketname}/{blobpath} {localpath}".format(
        bucketname=bucket_name, blobpath=blob_path, localpath=local_path)
    os.system(command)
    return command
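For example, a hypothetical call (the bucket, folder, and destination names below are placeholders):
download_bucket_objects('my-example-bucket', 'training/data', '/tmp/training-data')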
OPTION 3 - no Python; use the terminal and the Google Cloud SDK directly. Prerequisite: the Google Cloud SDK is installed and initialized ($ gcloud init). See the gsutil cp documentation for the commands.
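For example, a sketch of the command with placeholder names for the bucket, folder, and local directory:
gsutil -m cp -r gs://your-bucket-name/your-folder /path/to/local-dir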
This is how you can download a folder from a Google Cloud Storage bucket.
Run the following command to download it from the bucket to your local path in the Google Cloud Console:
gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}
Once you run that command, confirm that your folder is in the local path by running the ls command to list the files and directories there.
Now zip your folder by running the command below:
zip -r foldername.zip yourfolder/*
Once the zip process is done, click on the "more" dropdown menu at the right side of the Google Cloud Console, then select the "Download file" option. You will be prompted for the name of the file you want to download; enter the name of the zip file, "foldername.zip".
Here's the code I wrote. This will download the complete directory structure to your VM/local storage.
from google.cloud import storage
import os

bucket_name = "ar-data"
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)

dirName = 'Data_03_09/'  # *** folder in the bucket whose content you want to download
blobs = bucket.list_blobs(prefix=dirName)  # , delimiter='/'
destpath = r'/home/jupyter/DATA_test/'  # *** path on your VM/local machine to download the bucket directory into

for blob in blobs:
    # path of the blob relative to dirName (sliced, because lstrip() strips characters, not a prefix)
    relpath = blob.name[len(dirName):]
    if not relpath or relpath.endswith('/'):
        continue  # skip folder placeholder objects
    # recreate any intermediate directories that do not exist yet
    currpath = destpath
    for n in relpath.split('/')[:-1]:
        currpath = os.path.join(currpath, n)
        if not os.path.exists(currpath):
            print('creating directory -', n, 'on path -', currpath)
            os.mkdir(currpath)
    print("downloading ...", relpath)
    blob.download_to_filename(os.path.join(destpath, relpath))
Or simply run in the terminal:
gsutil -m cp -r gs://{bucketname}/{folderPath} {localpath}