How can I read nested structures using Apache Beam Python SDK?

lines = p | io.Read(io.BigQuerySource('project:test.beam_in'))

results in the following error:

"reason": "invalidQuery",
"message": "Cannot output multiple independently repeated fields at the same time. Found classification_item_distribution and category_cat_name"

Is it possible to read nested structures?

2 Answers

1
votes

This is a property of BigQuery, not of Beam. There are two ways to execute such a query: disable result flattening in BigQuery, or explicitly flatten the repeated fields in your query.

With the current Python SDK, only the latter is available. See "Flattening Google Analytics data (with repeated fields) not working anymore" for a guide on where and how to invoke the FLATTEN function.
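As a rough sketch of the explicit-flatten approach, the helper below builds a legacy SQL query that wraps the table in FLATTEN for one of the two repeated fields named in the error. The helper name build_flatten_query is hypothetical, and the choice of which field to flatten is an assumption; check it against your actual schema.

```python
def build_flatten_query(table, repeated_field, columns="*"):
    """Build a legacy SQL query that flattens one repeated field of `table`.

    Hypothetical helper for illustration; it only formats a string.
    """
    return "SELECT {cols} FROM FLATTEN([{table}], {field})".format(
        cols=columns, table=table, field=repeated_field)


# Table and field names are taken from the question and its error message.
query = build_flatten_query("project:test.beam_in", "category_cat_name")
print(query)

# With Beam installed, the string would be passed as a query instead of a
# table name (assumed usage, not run here):
#   lines = p | io.Read(io.BigQuerySource(query=query))
```

Flattening one of the two fields should be enough, since the error only complains about multiple independently repeated fields appearing in the same output.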

The feature to disable flattening is filed as BEAM-877 if you care to subscribe to updates or discuss.

1
votes

You can now read nested results directly in Beam Python by adding flatten_results=False when creating your source:

lines = p | io.Read(io.BigQuerySource('project:test.beam_in', flatten_results=False))

See source here.
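Once rows arrive unflattened, you may still want one element per repeated value in some later step. The sketch below assumes each row is a Python dict with repeated fields represented as lists; explode_repeated is a hypothetical helper and the sample row is made-up data, not output from the real table.

```python
def explode_repeated(row, field):
    """Yield one copy of `row` per element of the repeated `field`.

    Rows where the field is missing or empty yield nothing.
    """
    for item in row.get(field) or []:
        flat = dict(row)      # shallow copy so the input row is untouched
        flat[field] = item    # replace the list with a single element
        yield flat


# Made-up sample row shaped like an unflattened result.
sample_row = {
    "id": 1,
    "category_cat_name": ["a", "b"],
    "classification_item_distribution": [0.2, 0.8],
}
rows = list(explode_repeated(sample_row, "category_cat_name"))
print(rows)

# In a pipeline this could be applied with FlatMap (assumed usage, not run here):
#   exploded = lines | beam.FlatMap(explode_repeated, "category_cat_name")
```

Because the function is a plain generator, it can be tested without a pipeline, and the other repeated field stays intact as a list on every output row.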