0
votes

We have a problem transferring data from Google Cloud Datastore into BigQuery. We need to write a Dataflow script in Python that moves the data from Datastore to BigQuery using a pipeline. This requires the Apache Beam library, but we cannot get Apache Beam to work. Could anyone help us?


1 Answer

1
votes

The Google Cloud Dataflow SDK for Python is ready for use, currently with Beta-level support on Google Cloud Platform. It is based on the Apache Beam codebase. Please follow the Quickstart to get started with this SDK. If you hit a specific error, please ask a separate question and quote the exact error message.

That said, the SDK for Python doesn't yet provide an API to access Google Cloud Datastore directly. You could write one using the generic Source and Sink APIs; this is not hard, but not trivial either. We are actively working on it, and the Python SDK will include this API in the near future.

In the meantime, I'd suggest trying the SDK for Java for this task, which includes the DatastoreIO and BigQueryIO APIs.
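
Whichever SDK ends up doing the reading, the pipeline will most likely need a step that turns each Datastore entity into a row BigQuery can accept. Here is a minimal sketch of that step in plain Python, with no Beam or Datastore client involved. It assumes the entity has already been unpacked into a key id and a dict of properties; the function name `entity_to_row` and the `id` column are hypothetical, not part of any SDK.

```python
import datetime
import json


def entity_to_row(key_id, properties):
    """Flatten one Datastore-style entity into a BigQuery-style row dict.

    Hypothetical helper: `key_id` and `properties` are assumed to have been
    extracted from the entity by whatever code reads from Datastore.
    """
    row = {"id": key_id}
    for name, value in properties.items():
        if isinstance(value, (datetime.datetime, datetime.date)):
            # Emit dates and timestamps as ISO 8601 strings.
            row[name] = value.isoformat()
        elif isinstance(value, (dict, list)):
            # Serialize nested values to a JSON string so the row stays flat.
            row[name] = json.dumps(value, sort_keys=True)
        else:
            row[name] = value
    return row


example = entity_to_row(
    42,
    {
        "name": "Ada",
        "created": datetime.datetime(2016, 9, 1, 12, 0, 0),
        "tags": ["a", "b"],
    },
)
print(example)
```

In a pipeline, a function like this would typically be applied to each element in a map step between the read and the write. Serializing nested values to JSON strings is only one option; repeated or record columns in the BigQuery schema may suit your data better.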