4 votes

I just want to grab some output data from a Google Cloud Datalab notebook quickly, preferably as a throwaway CSV file.

I've done this:

import csv

writer = csv.writer(open('output.csv', 'wb'))
for row in rows:
    writer.writerow(row)

This writes a local file, but I can't open it in the browser, and I can't see how to download it from Cloud Datalab.

How can I quickly grab my data as a CSV file? I guess I may have to use the Storage APIs and write it there? I'm finding the docs a bit hard to follow; I've got something like this:

import gcp
import gcp.storage as storage

# create CSV file? construct filepath? how?

mybucket = storage.Bucket(myfile)
mybucket.create()

5 Answers

10 votes

There are at least 2 options:

Download files locally from Datalab

This option does not appear to be available in the current Datalab code. I have submitted a pull request for Datalab which may resolve your issue. The fix allows users to edit/download files which are not notebooks (*.ipynb) using the Datalab interface. I was able to download/edit a text file from Datalab using the modification in the pull request.

Send files to a Storage Bucket in Google Cloud

The Google Cloud Storage API documentation may be helpful when writing code to transfer files to a storage bucket in Google Cloud.

Here is a working example:

from datalab.context import Context
import datalab.storage as storage

sample_bucket_name = Context.default().project_id + '-datalab-example'
sample_bucket_path = 'gs://' + sample_bucket_name

sample_bucket = storage.Bucket(sample_bucket_name)

# Create storage bucket if it does not exist
if not sample_bucket.exists():
    sample_bucket.create()

# Write an item to the storage bucket
sample_item = sample_bucket.item('stringtofile.txt')
sample_item.write_to('This is a string', 'text/plain')

# Another way to copy a file from Datalab to the Storage bucket; the $ prefix
# expands the Python variable sample_bucket_path inside the shell command
!gsutil cp 'someotherfile.txt' $sample_bucket_path

Once you've copied an item, you can view it in the Storage browser in the Google Cloud Console.
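
To confirm from inside the notebook that the item arrived, you can also list the bucket's contents (a small sketch reusing the sample_bucket_path variable from the example above):

# List everything in the bucket; $sample_bucket_path is expanded by the notebook
!gsutil ls $sample_bucket_path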

0 votes

How much data are you talking about? I'm assuming this is not a BigQuery Table, as we have APIs for that.

For the storage APIs, think of a bucket as being like a folder. You need to create an Item in the Bucket. If you assign the data to a Python variable as a string, there is an API on Item (write_to) that you can use.
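
For the CSV case in the question, a minimal sketch of that approach (the bucket name is a placeholder, rows stands in for the question's data, and it assumes the bucket already exists and a Python 2 Datalab runtime) could look like this:

import csv
import io

import datalab.storage as storage

# Build the CSV content in memory as a single string
# (use io.StringIO instead of io.BytesIO on a Python 3 runtime)
csv_buffer = io.BytesIO()
writer = csv.writer(csv_buffer)
rows = [(1, 'a'), (2, 'b')]  # stand-in for the question's query results
for row in rows:
    writer.writerow(row)

# Write the string to an item in an existing bucket; the name is a placeholder
bucket = storage.Bucket('my-example-bucket')
item = bucket.item('output.csv')
item.write_to(csv_buffer.getvalue(), 'text/csv')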

If you write to a file like you did with output.csv, that file lives in the Docker container that Datalab is running in, so it is transient and will disappear when the container is shut down. It is accessible in the meantime, though, and you can use a %%bash cell magic to send it to some other destination with, for example, curl (see the sketch below).
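
A minimal sketch of such a cell, with a placeholder upload endpoint:

%%bash
# output.csv only exists inside the running Datalab container, so push it
# out while the container is still up. The URL below is a placeholder.
curl -X POST -F "file=@output.csv" https://example.com/upload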

0 votes

I found an easier way to write CSV files from a Datalab notebook to a bucket.

    %storage write --object "gs://pathtodata/data.csv" --variable data

Here 'data' is a DataFrame in your notebook (see the sketch below for how it might be built).
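
For context, a minimal sketch of building such a DataFrame (the rows and column names are placeholders for whatever your query actually returns):

import pandas as pd

# Placeholder data standing in for the rows produced in the notebook
rows = [(1, 'alpha'), (2, 'beta')]
data = pd.DataFrame(rows, columns=['id', 'value'])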

0 votes

Use the ungit tool available in Datalab to commit your files to your Cloud Source Repository, then clone that repository onto your local machine with the gcloud command:

C:\> gcloud source repos clone datalab-notebooks --project=your-project-id
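
If you prefer to stay in the notebook, the commit step can also be done from a %%bash cell instead of the ungit UI (a sketch; the repository path inside the container is an assumption):

%%bash
# Commit the generated file from inside the Datalab container; the path
# below is an assumption about where the notebook repository is checked out.
cd /content/datalab/notebooks
git add output.csv
git commit -m "Add CSV output from notebook"
git push origin master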

0 votes

As someone posted above:

!gsutil cp 'someotherfile.txt' $sample_bucket_path

did the job for me. It got the file from Datalab into Google Cloud Storage.