1 vote

I tried using gsutil to download files in a bucket, but now I would like to incorporate the download into a Python script to automate the process (for downloading specific days' data). The following gsutil command worked fine.

gsutil -m cp -r gs://gcp-public-data-goes-16/GLM-L2-LCFA/2019/001 C:\dloadFiles

Using the storage client I have tried:

from google.cloud import storage
client = storage.Client()
with open('C:\dloadFiles') as file_obj:
    client.download_blob_to_file(
        'gs://gcp-public-data-goes-16/GLM-L2-LCFA/2019/001', file_obj)

I get error "DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started"

This is a publicly available bucket.

Have you read the docs of this library? – ForceBru

Try the storage library and use the Google Cloud docs. You have examples to simply copy and paste! Welcome to GCP! – guillaume blaquiere

@ForceBru I have read the download_to_file(file_obj, client=None, start=None, end=None) syntax, but am not sure if that is applicable for an entire bucket folder or just a file? – Andrew

@guillaumeblaquiere I have tried "Retrieve a bucket using a string" (googleapis.github.io/google-cloud-python/latest/storage/…) but get an error. Is retrieve the same as download? If I use "Download a blob using a URI", is a blob considered a bucket folder? – Andrew

1) Stack Overflow is for programming questions. Please post your code and we will help you solve problems. 2) Your question shows little effort to try and solve your problem. 3) Edit your question with the code you have written, what you expect, and any error messages. Then we will try to help you. Otherwise your question will be downvoted and closed. This link will help you understand how to use Stack Overflow correctly: stackoverflow.com/help/how-to-ask – John Hanley

3 Answers

2 votes

You did not set up GOOGLE_APPLICATION_CREDENTIALS. Follow the link below to set up credentials: https://stackguides.com/questions/45501082/set-google-application-credentials-in-python-project-to-use-google-api

After setting up credentials, your code will work.
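
If you prefer to set the variable from inside the script, here is a minimal sketch (the key path is a placeholder for your own service-account JSON file):

import os

# Placeholder: point this at the service-account key file you downloaded
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'C:\keys\service-account.json'

from google.cloud import storage

client = storage.Client()  # now picks the key up via Application Default Credentials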

0 votes

After authenticating with your GCP credentials, you will also need to run:

gcloud auth application-default login

to authenticate your application SDKs, such as the Python client libraries. Then you will be able to interact with GCP services via Python.

Also, your gsutil command copies a whole tree of files, not just one file as your Python code does, so you probably want to call list_blobs first and then download the blobs iteratively. Also check out blob.download_to_filename, which will save you some coding (docs here): with it you can write a blob to a local file directly, without opening the file first.
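
A minimal sketch of that approach, assuming the bucket and prefix from the question and a local target directory:

import os
from google.cloud import storage

client = storage.Client()  # needs the credentials set up above

dest_dir = r'C:\dloadFiles'  # assumed local target directory

# List every object under the "folder" prefix, then download each one.
for blob in client.list_blobs('gcp-public-data-goes-16',
                              prefix='GLM-L2-LCFA/2019/001'):
    dest = os.path.join(dest_dir, blob.name.replace('/', os.sep))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    blob.download_to_filename(dest)  # writes to disk directly, no open() needed

Since the bucket is public, storage.Client.create_anonymous_client() should also work here in place of storage.Client(), with no credentials at all.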

0 votes

First thing: turn off public access on this bucket unless you really need it to be public. For private access, you should use a service account (your code) or OAuth credentials.

If you are running this code on a Google compute service, credentials will be discovered automatically (Application Default Credentials, ADC).

If you are running outside of Google Cloud, change this line:

client = storage.Client()

To this:

client = storage.Client.from_service_account_json('/full/path/to/service-account.json')

This line in your code is trying to open a directory, which is not correct. You need to specify a file name, not a directory name, and open the file with write permission:

with open('C:\dloadFiles') as file_obj:

Change to

with open('c:/directory/myfilename', 'w') as file_obj:

Or for binary (data) files:

with open('c:/directory/myfilename', 'wb') as file_obj:

I am assuming that this path points to a file blob and not a "simulated" folder on GCS. If it is a folder, you will need to change it to point at a single file (a storage object blob):

gs://gcp-public-data-goes-16/GLM-L2-LCFA/2019/001
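
Putting the pieces together, a minimal sketch; the key path and the object name below are placeholders, since download_blob_to_file needs a URI for a single object, not a folder:

from google.cloud import storage

# Placeholder: your own service-account JSON key
client = storage.Client.from_service_account_json(r'C:\keys\service-account.json')

# Hypothetical object name under the prefix, for illustration only
uri = 'gs://gcp-public-data-goes-16/GLM-L2-LCFA/2019/001/00/some-object.nc'

# The destination must be a file name opened for binary writing
with open(r'C:\dloadFiles\some-object.nc', 'wb') as file_obj:
    client.download_blob_to_file(uri, file_obj)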