
I'm new to Google Cloud Platform. I want to read files from a bucket, transform them, and then save the transformed data back to Cloud Storage. Right now I can't read the data, or even see the file names. How should I read and transform data from a bucket?

Here is my code:

from google.cloud import storage
from google.cloud.storage import Blob

bucket_name = 'test_bucket'
file_name = '*.txt'
client = storage.Client()

bucket = client.get_bucket(bucket_name)

for f in bucket:
    print(f)
# TypeError: 'Bucket' object is not iterable

blob = bucket.get_blob(file_name)
print(blob)
# None

print(blob.download_as_string())
# AttributeError: 'NoneType' object has no attribute 'download_as_string'


2 Answers


You want:

bucket = client.get_bucket(bucket_name)
for blob in bucket.list_blobs():
    print(blob.name)

For reference, the docs are at: https://googlecloudplatform.github.io/google-cloud-python/latest/storage/buckets.html

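You can't use wildcards like "*.txt". get_blob() looks up a single object by its exact name, so "*.txt" is taken literally and matches nothing, which is why it returned None. You can narrow a listing by prefix with list_blobs(prefix=...), but to find objects ending in ".txt" you'll have to iterate through the results and filter them yourself.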
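To find the ".txt" files the question is after, a minimal sketch is to let `list_blobs(prefix=...)` narrow the listing and then test each object name's suffix locally. The helper names `filter_by_suffix` and `list_matching` are hypothetical. They rely only on `Bucket.list_blobs()` and each blob's `.name`, so they accept a real `google.cloud.storage.Bucket` or any stand-in with the same shape.

```python
from typing import Iterable, Iterator, List, Optional


def filter_by_suffix(blobs: Iterable, suffix: str) -> Iterator:
    """Yield the blobs whose object name ends with `suffix`.

    Hypothetical helper: works with any object exposing a `.name` string,
    e.g. google.cloud.storage.Blob.
    """
    for blob in blobs:
        if blob.name.endswith(suffix):
            yield blob


def list_matching(bucket, suffix: str, prefix: Optional[str] = None) -> List:
    """List objects in `bucket` under `prefix` whose names end with `suffix`.

    The prefix is passed to the listing call itself; the suffix test runs
    locally, because get_blob() does not expand wildcards such as "*.txt".
    """
    return list(filter_by_suffix(bucket.list_blobs(prefix=prefix), suffix))
```

With the client from the question, `for blob in list_matching(client.get_bucket(bucket_name), ".txt"): print(blob.name)` would print only the matching object names. Without a prefix the listing walks every object in the bucket, so on large buckets a prefix keeps that walk short.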


Check what bucket.get_blob(file_name) returns before using it. It returns None when no object with that exact name exists, which is what led to the AttributeError:

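blob = bucket.get_blob(file_name)
if blob is not None:
    # get_blob() returns None for a missing object, so a non-None blob exists;
    # a separate blob.exists() call would only add another request.
    data = blob.download_as_bytes()  # download_as_string() is a deprecated alias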
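For the full workflow described in the question (read, transform, save back), one possible sketch is below. The function name `transform_blob`, the UTF-8 encoding, and the `text/plain` content type are assumptions for illustration. The Cloud Storage calls it relies on are `Bucket.get_blob()`, `Blob.download_as_bytes()`, `Bucket.blob()` and `Blob.upload_from_string()`.

```python
from typing import Callable


def transform_blob(bucket, source_name: str, dest_name: str,
                   transform: Callable[[str], str],
                   encoding: str = "utf-8") -> bool:
    """Read one text object, apply `transform`, and write the result back.

    Hypothetical helper. Returns False and writes nothing when
    `source_name` does not exist in the bucket.
    """
    # get_blob() returns None for a missing object instead of raising.
    source = bucket.get_blob(source_name)
    if source is None:
        return False

    # Assumes the object holds text in `encoding` (UTF-8 by default).
    text = source.download_as_bytes().decode(encoding)

    # bucket.blob() only creates a local reference; upload_from_string()
    # is what actually writes the new object to Cloud Storage.
    dest = bucket.blob(dest_name)
    dest.upload_from_string(transform(text).encode(encoding),
                            content_type="text/plain")
    return True
```

To process every ".txt" object, you could loop over `bucket.list_blobs()`, keep names ending in ".txt", and call `transform_blob(bucket, blob.name, "transformed/" + blob.name, str.upper)`. The `transformed/` prefix and `str.upper` are only placeholders. Outputs written this way also end in ".txt", so materialize the listing with `list(...)` first, or read from one prefix and write to another, so a single pass does not pick up its own output.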