0 votes

I am working on Databricks and I would like to know if I can read a CSV file from Google Cloud Storage.

I was reading this guideline: https://docs.databricks.com/data/data.html

I can read the data locally in Python in this way:

import os
import pandas as pd
from google.cloud import storage

# Point the client at the service-account key file
path = 'myJson.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = path
client = storage.Client()
bucket_id = 'myBucket'
bucket = client.get_bucket(bucket_id)

# Reading gs:// paths with pandas requires the gcsfs package
df = pd.read_csv('gs://myBucket/feed/us/2020/03/19/18/data0000000001.csv.gz', compression='gzip')
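As a side note, the compression='gzip' argument in the snippet above behaves the same on a local file, which is easy to verify without GCS credentials. A minimal self-contained sketch (the file name and sample data here are made up for illustration):

```python
import gzip

import pandas as pd

# Write a small gzipped CSV to disk (hypothetical sample data)
csv_bytes = b"id,value\n1,10\n2,20\n"
with gzip.open('sample.csv.gz', 'wb') as f:
    f.write(csv_bytes)

# pandas decompresses transparently when compression='gzip' is passed
# (it would also infer gzip from the .gz extension by default)
df = pd.read_csv('sample.csv.gz', compression='gzip')
print(df.shape)  # (2, 2)
```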
Please have a look at the link below, which suggests using IR to import the data from GCP to Azure: docs.microsoft.com/en-us/azure/data-factory/… - venus

1 Answer

0 votes

Unfortunately, connectivity to Google Cloud Storage is not supported as a source in Azure Databricks.

Supported Data Sources in Azure Databricks: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/

As per my research, I found a third-party tool named "Panoply" with which you can start analyzing your Google Cloud Storage data in Databricks.