5 votes

I have a Python web application deployed on Google App Engine.

I need to grab a log file stored on Amazon S3 and load it into Google Cloud Storage. Once it is in Google Cloud Storage, I may need to perform some transformations and eventually import the data into BigQuery for analysis.

I tried using gsutil as a proof of concept of sorts, since boto is under the hood of gsutil and I'd like to use boto in my project. This did not work.

I'd like to know if anyone has managed to transfer files directly between the two clouds. If possible, I'd like to see a simple example. In the end, this task has to be accomplished by code executing on GAE.


3 Answers

9 votes

Per this thread, you can stream data from S3 to Google Cloud Storage using gsutil, but every byte still has to take two hops: S3 to your local computer, and then your computer to GCS. Since you're using App Engine, however, you should be able to pull from S3 and deposit into GCS. It's the same progression as above, except App Engine is the intermediary: every byte travels from S3 to your app and then on to GCS. You could use boto for the pull side and the Google Cloud Storage API for the push side.
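
Here's a minimal sketch of that relay, assuming boto for the S3 pull and the GoogleAppEngineCloudStorageClient library (imported as `cloudstorage`) for the GCS push. The function name, bucket names, content type, and chunk size are all illustrative, and AWS credentials are assumed to come from boto's usual config or environment variables:

```python
import boto
import cloudstorage as gcs  # GoogleAppEngineCloudStorageClient

CHUNK_SIZE = 1024 * 1024  # 1 MB reads keep the App Engine instance's memory flat

def copy_s3_to_gcs(s3_bucket, s3_key_name, gcs_path):
    """Stream one object from S3 to GCS, with App Engine as the middleman.

    gcs_path takes the form '/my-gcs-bucket/my-object'.
    """
    # Pull side: boto reads the S3 object in chunks.
    key = boto.connect_s3().get_bucket(s3_bucket).get_key(s3_key_name)
    if key is None:
        raise ValueError('No such S3 object: %s/%s' % (s3_bucket, s3_key_name))

    # Push side: the GCS client library writes the same chunks out.
    with gcs.open(gcs_path, 'w', content_type='text/plain') as gcs_file:
        chunk = key.read(CHUNK_SIZE)
        while chunk:
            gcs_file.write(chunk)
            chunk = key.read(CHUNK_SIZE)
    key.close()
```

Reading and writing in fixed-size chunks, rather than buffering the whole log file, matters on App Engine because instance memory is limited.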

3 votes

Google lets you import entire buckets from S3 into Cloud Storage via the Storage Transfer Service:

https://cloud.google.com/storage/transfer/getting-started

You can set file filters on the source bucket to import only the files you want, or a "directory" (i.e., anything with a certain prefix).
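
You can also create the transfer job programmatically. Here's a sketch using the google-api-python-client with application-default credentials; the bucket names, prefix, dates, and helper name are all placeholder values, not something from your setup:

```python
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

def create_transfer_job(project_id, aws_access_key_id, aws_secret_key):
    """Create a one-time Storage Transfer Service job copying S3 -> GCS."""
    credentials = GoogleCredentials.get_application_default()
    client = discovery.build('storagetransfer', 'v1', credentials=credentials)

    transfer_job = {
        'description': 'Import S3 logs into GCS',
        'status': 'ENABLED',
        'projectId': project_id,
        'transferSpec': {
            'awsS3DataSource': {
                'bucketName': 'my-s3-bucket',  # hypothetical source bucket
                'awsAccessKey': {
                    'accessKeyId': aws_access_key_id,
                    'secretAccessKey': aws_secret_key,
                },
            },
            'gcsDataSink': {'bucketName': 'my-gcs-bucket'},  # hypothetical sink
            # Only copy objects under this prefix (the "directory" filter).
            'objectConditions': {'includePrefixes': ['logs/']},
        },
        # Identical start and end dates make the job run exactly once.
        'schedule': {
            'scheduleStartDate': {'year': 2016, 'month': 1, 'day': 15},
            'scheduleEndDate': {'year': 2016, 'month': 1, 'day': 15},
        },
    }
    return client.transferJobs().create(body=transfer_job).execute()
```

Note that the transfer runs entirely on Google's side, so none of the bytes pass through your App Engine app.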

1 vote

I'm not aware of any cloud provider that offers an API for transferring data to a competing provider; they have no incentive to help you move your data to the competition. You will almost certainly have to read the data to an intermediate machine and then write it to Google.