I would like to use the latest versions of google-cloud-bigquery and the Dataflow SDK that are available for Python 2.7.

The BigQuery client code has changed dramatically between the old and new versions, and the older versions are planned to be deprecated, according to the following publication: https://cloud.google.com/bigquery/docs/python-client-migration

My pipeline setup is the following:

from setuptools import setup, find_packages
setup(
  name='big-query',
  version='1.0.0',
  packages=find_packages(),
  keywords=[
  ],
  license="Apache Software License",
  install_requires=[
    'google-cloud-bigquery==0.28.0',
  ],
  package_data={
  },
  data_files=[],
)

I call it from the pipeline code:

options.view_as(SetupOptions).setup_file = "./setup.py"

Environment: The SDK version shown in the Dataflow view is 2.0.0, along with a deprecation message. The pipeline is written in Python 2.7.0 in a Google Cloud Datalab environment. Installing the updated google-cloud-bigquery fails.

My questions are:

  1. How do I update the Dataflow SDK? Through the setup.py file? By updating Datalab?
  2. What is the latest version of google-cloud-bigquery that I can use, and which Dataflow SDK version matches it?

Thanks, eilalan

Clarifying question: is the version of the Beam Python SDK 2.0.0? - Kenn Knowles

1 Answer

  1. How do I update the Dataflow SDK? Through the setup.py file? By updating Datalab?

The Dataflow SDK is being deprecated, but you can install the Apache Beam SDK instead. Dataflow fully supports official Apache Beam SDK releases, including previously released versions from 2.0.0 onward. Here is the official Google announcement:

Cloud Dataflow SDK Deprecation Notice: The Cloud Dataflow SDK 2.5.0 is the last Cloud Dataflow SDK release that is separate from the Apache Beam SDK releases.
The Cloud Dataflow service fully supports official Apache Beam SDK releases. The Cloud Dataflow service also supports previously released Apache Beam SDKs starting with version 2.0.0 and above.

The SDK can be upgraded via pip:

pip install --upgrade apache-beam[gcp]
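
To confirm what the upgrade actually installed, a small version check can help. This is a minimal sketch, not part of the original answer: `installed_version` is a hypothetical helper name, and the script assumes it runs in the same environment (for example the Datalab kernel) where the pipeline is launched. It uses `importlib.metadata` where available and falls back to `pkg_resources` on Python 2.7.

```python
def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Python 3.8+: standard-library metadata lookup.
        from importlib import metadata
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None

    # Older interpreters (including Python 2.7): pkg_resources from setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    # Report the two packages discussed in this thread.
    for name in ("apache-beam", "google-cloud-bigquery"):
        print("%s: %s" % (name, installed_version(name) or "not installed"))
```

Note that this only reports the launch environment; the versions on the Dataflow workers are determined by the SDK version and by the setup.py passed to the pipeline.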

You can refer to the setup.py syntax whenever you need to pin the versions of your dependencies in your environment.

  2. What is the latest version of google-cloud-bigquery that I can use, and which Dataflow SDK version matches it?

Some libraries are not forward compatible; you can use the SDK vs. worker dependencies compatibility list for reference. According to that list, the latest google-cloud-bigquery version preinstalled on the workers and fully supported with your configuration is 1.17.0. Bear in mind that Python 2.x, along with any related SDK and library versions, will no longer be supported as of January 1, 2020.
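
As a rough illustration of checking a setup.py pin against that limit, the sketch below compares dotted version strings numerically. The helper names `version_tuple` and `within_worker_limit` are hypothetical, and the code assumes purely numeric versions such as 1.17.0 (no pre-release suffixes). Treating "not newer than the worker version" as the compatibility test is itself a simplifying assumption; the compatibility list remains the authority.

```python
def version_tuple(version):
    """Turn a dotted numeric version such as '1.17.0' into (1, 17, 0)."""
    return tuple(int(part) for part in version.split("."))


def within_worker_limit(pinned, worker_max):
    """Return True if the pinned version is not newer than worker_max."""
    return version_tuple(pinned) <= version_tuple(worker_max)


# 1.17.0 is the worker-supported version cited in the answer above.
WORKER_MAX_BIGQUERY = "1.17.0"

print(within_worker_limit("0.28.0", WORKER_MAX_BIGQUERY))  # True
print(within_worker_limit("1.20.0", WORKER_MAX_BIGQUERY))  # False
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.9.0" would sort after "1.17.0", which is the wrong order for version numbers.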