
I'm trying to read a CSV file from Google Cloud Storage into Google Cloud Datalab exactly as suggested here.

I keep getting the error: Source object gs://analog-arbor-233411/traissn.csv does not exist. (analog-arbor-233411 is my bucket name, traissn.csv is my CSV file.)

So here I checked that the bucket really exists, and it does:

import google.datalab.storage as storage
mybucket = storage.Bucket('analog-arbor-233411')
mybucket.exists()

Here I even iterate through mybucket.objects(), which returns an iterator over the objects in the bucket, to make sure that I get an existing object. So data_csv ends up holding the last object from the iteration. Then I checked again whether it exists, and it surely does!

for i in mybucket.objects():
    data_csv = i
data_csv.exists()

Here is the funny thing: when I run the following, I get the error Source object gs://analog-arbor-233411/traissn.csv does not exist (the object name in data_csv is traissn.csv):

uri = data_csv.uri
%gcs read --object $uri --variable data

I've tried looking everywhere, but can't find an answer.


1 Answer


In your current code, data_csv.exists() is called outside the for loop, so it only checks the last object returned by the bucket iterator, which may or may not be traissn.csv.

So either:

  • add a break statement inside the for loop once data_csv points to traissn.csv, so that data_csv is not overwritten by later objects (see the sketch after this list), or
  • make the %gcs call inside the for loop.
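For the first option, here is a minimal sketch in a Datalab notebook cell, assuming the same google.datalab.storage objects and %gcs magic from your question, and assuming the object you want can be identified by its uri ending in /traissn.csv (adjust that check to however your object is actually named):

import google.datalab.storage as storage

mybucket = storage.Bucket('analog-arbor-233411')

# Stop iterating as soon as we hit the object we want, so data_csv
# still points to traissn.csv after the loop instead of the last object.
data_csv = None
for obj in mybucket.objects():
    if obj.uri.endswith('/traissn.csv'):
        data_csv = obj
        break

# data_csv stays None if no object matched; the magic below assumes a match was found.
uri = data_csv.uri
%gcs read --object $uri --variable data

The second option works the same way, except that you compute uri and run the %gcs read for each object of interest as you iterate, instead of breaking out of the loop.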