5
votes

I want to save a pandas DataFrame directly to Google Cloud Storage. I tried the different approaches from write-a-pandas-dataframe-to-google-cloud-storage-or-bigquery, but I am not able to save it.

Note: I can use the google.cloud package only

Below is the code I tried

from google.cloud import storage
import pandas as pd
input_dict = [{'Name': 'A', 'Id': 100}, {'Name': 'B', 'Id': 110}, {'Name': 'C', 'Id': 120}]
df = pd.DataFrame(input_dict)

Try 1:

destination = f'gs://bucket_name/test.csv'
df.to_csv(destination)

Try 2:

storage_client = storage.Client(project='project')
bucket = storage_client.get_bucket('bucket_name')
gs_file = bucket.blob('test.csv')
df.to_csv(gs_file)

I am getting the errors below.

For option 1: No such file or directory: 'gs://bucket_name/test.csv'

For option 2: 'Blob' object has no attribute 'close'

Thanks,

Raghunath.

4
I attempted a similar setup and it worked for me. Is your Python code running in GCP? And is the Cloud Storage bucket already created? Your Try 1 solution should work via the Cloud Shell. – oakinlaja
Hi Raghunath, were you able to find the answer to this? I had the exact same issue: I'm writing a Python script, triggered by Airflow, which writes a df to CSV and keeps it in a GCS bucket, but I'm getting "Missing optional dependency 'gcsfs'. The gcsfs library is required to handle GCS files. Use pip or conda to install gcsfs." – Praneeth Kumar
Currently there is no solution for this requirement. I have developed code to create a temporary file and then upload it to GCS. – Raghunath
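
For reference, the gcsfs error quoted in the comments is the missing piece for Try 1: pandas hands gs:// paths off to the optional gcsfs library. A minimal sketch of that route, assuming gcsfs can be installed alongside google.cloud and credentials are already configured:

# Requires gcsfs in addition to pandas (pip install gcsfs)
import pandas as pd

df = pd.DataFrame([{'Name': 'A', 'Id': 100}, {'Name': 'B', 'Id': 110}, {'Name': 'C', 'Id': 120}])

# With gcsfs installed, pandas resolves the gs:// URL itself,
# which is exactly what Try 1 attempts.
df.to_csv('gs://bucket_name/test.csv', index=False)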

4 Answers

14
votes
from google.cloud import storage
import os
import pandas as pd
from io import StringIO  # only needed if you skip writing a local csv file

# point GOOGLE_APPLICATION_CREDENTIALS at your Google Cloud service account key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/your-google-cloud-private-key.json'
gcs = storage.Client()

df = pd.DataFrame([{'Name': 'A', 'Id': 100}, {'Name': 'B', 'Id': 110}])

Write it to a CSV file on your machine first, then upload it:

df.to_csv('local_file.csv')
gcs.get_bucket('BUCKET_NAME').blob('FILE_NAME.csv').upload_from_filename('local_file.csv', content_type='text/csv')

If you do not want to create a temporary CSV file, use StringIO:

f = StringIO()
df.to_csv(f)
f.seek(0)
gcs.get_bucket('BUCKET_NAME').blob('FILE_NAME.csv').upload_from_file(f, content_type='text/csv')
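
A further variation (not in the original answer, just a sketch using the same client): Blob.upload_from_string accepts the CSV text that to_csv returns when called without a path, so the StringIO buffer can be skipped as well.

csv_data = df.to_csv(index=False)  # to_csv with no path returns the CSV as a string
gcs.get_bucket('BUCKET_NAME').blob('FILE_NAME.csv').upload_from_string(csv_data, content_type='text/csv')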
0
votes

Write the file to a local directory first, then upload it to GCS.

import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('[bucket_name]')
blob = bucket.blob('panda.csv')

input_dict = [{'Name': 'A', 'Id': 100}, {'Name': 'B', 'Id': 110}, {'Name': 'C', 'Id': 120}]
df = pd.DataFrame(input_dict)
df.to_csv('/home/[path]/panda.csv')

blob.upload_from_filename('/home/[path]/panda.csv')
print('File panda.csv uploaded')
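
If you would rather not hardcode a local path, the same idea works with a temporary file (this is essentially the workaround Raghunath mentions in the comments; a sketch using the standard tempfile module, reusing the bucket and blob from the code above):

import os
import tempfile

# write the DataFrame to a temporary CSV file, upload it, then clean up
with tempfile.NamedTemporaryFile(suffix='.csv', delete=False) as tmp:
    tmp_path = tmp.name
df.to_csv(tmp_path, index=False)
blob.upload_from_filename(tmp_path)
os.remove(tmp_path)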
-1
votes

Maybe this post can help you:

from datalab.context import Context
import google.datalab.storage as storage
import google.datalab.bigquery as bq
import pandas as pd

# Dataframe to write
simple_dataframe = pd.DataFrame(data=[[1,2,3],[4,5,6]], columns=['a','b','c'])

sample_bucket_name = Context.default().project_id + '-datalab-example'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/Hello.txt'
bigquery_dataset_name = 'TestDataSet'
bigquery_table_name = 'TestTable'

# Define storage bucket
sample_bucket = storage.Bucket(sample_bucket_name)

# Create storage bucket if it does not exist
if not sample_bucket.exists():
    sample_bucket.create()

# Define BigQuery dataset and table
dataset = bq.Dataset(bigquery_dataset_name)
table = bq.Table(bigquery_dataset_name + '.' + bigquery_table_name)

# Create BigQuery dataset
if not dataset.exists():
    dataset.create()

# Create or overwrite the existing table if it exists
table_schema = bq.Schema.from_data(simple_dataframe)
table.create(schema = table_schema, overwrite = True)

# Write the DataFrame to GCS (Google Cloud Storage)
%storage write --variable simple_dataframe --object $sample_bucket_object

# Write the DataFrame to a BigQuery table
table.insert(simple_dataframe)

Source: Write a Pandas DataFrame to Google Cloud Storage or BigQuery
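
The snippet above depends on the Datalab packages, which the question rules out. A sketch of the BigQuery step using only google-cloud-bigquery (assuming pyarrow is installed and the TestDataSet dataset already exists; the table name is the same placeholder as above):

from google.cloud import bigquery

bq_client = bigquery.Client()
# load_table_from_dataframe serializes the DataFrame (pyarrow required)
# and creates TestDataSet.TestTable if it does not already exist
load_job = bq_client.load_table_from_dataframe(simple_dataframe, 'TestDataSet.TestTable')
load_job.result()  # wait for the load job to complete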

-1
votes

This worked for me:

BUCKET_NAME= "TEST-BUCKET"
storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)
    
fileout = "/folder1/consolidatedOutput.csv"

#convert data frame to string and write it

destination_blob = bucket.blob(file_out)
destination_blob.upload_from_string(df.to_string(index=False,justify='left'))
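
Note that to_string uploads a fixed-width text table rather than true CSV. If the object needs to be comma separated, a one-line variation on the snippet above (same blob, df.to_csv instead of df.to_string) does it:

# upload real CSV content instead of a fixed-width text dump
destination_blob.upload_from_string(df.to_csv(index=False), content_type='text/csv')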