I have a Dataflow job that first reads from BigQuery using a query (written in standard SQL). It works perfectly with the direct runner. However, when I run the same pipeline with the Dataflow runner, I get this error:
response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Thu, 24 Dec 2020 09:28:21 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '470', '-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 400,
    "message": "Querying tables partitioned on a field is not supported in Legacy SQL: 551608533331:GoogleSearchConsole.search_query_analytics_log.",
    "errors": [
      {
        "message": "Querying tables partitioned on a field is not supported in Legacy SQL: 551608533331:GoogleSearchConsole.search_query_analytics_log.",
        "domain": "global",
        "reason": "invalid"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}>
Apparently the use_standard_sql parameter is ignored in Dataflow runner mode, so the query is submitted as legacy SQL. Versions: apache-beam 2.24.0, Python 3.8.
last_update_date = pipeline | 'Read last update date' >> beam.io.Read(
    beam.io.BigQuerySource(
        query='''
            SELECT
                MAX(date) AS date
            FROM
                GoogleSearchConsole.search_query_analytics_log
        ''',
        use_standard_sql=True,
    )
)
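One possible workaround (a sketch, not verified against this exact setup): newer Beam releases (2.25.0+) provide the `beam.io.ReadFromBigQuery` transform, which replaces `beam.io.Read(beam.io.BigQuerySource(...))` and passes `use_standard_sql` through on the Dataflow runner as well. The `gcs_location` value below is a hypothetical placeholder; `ReadFromBigQuery` exports query results through a temporary GCS path.

```python
# Sketch of a workaround, assuming an upgrade to apache-beam >= 2.25.0.
# The query text is the same as in the original snippet.
QUERY = '''
    SELECT
        MAX(date) AS date
    FROM
        GoogleSearchConsole.search_query_analytics_log
'''

def read_last_update_date(pipeline, gcs_temp_location):
    # Deferred import so this module can be inspected without Beam installed.
    import apache_beam as beam

    return (
        pipeline
        | 'Read last update date' >> beam.io.ReadFromBigQuery(
            query=QUERY,
            use_standard_sql=True,        # honored on both runners
            gcs_location=gcs_temp_location,  # hypothetical bucket path, e.g. a gs:// URI you own
        )
    )
```

If upgrading Beam is an option, this keeps the rest of the pipeline unchanged: only the read step is swapped for the newer transform.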