Is it possible to use the equivalent of --autodetect
in DataFlow?
i.e. can we load data into a BQ table without specifying a schema, equivalent to how we can load data from a CSV with --autodetect
?
Is it possible to use the equivalent of --autodetect
in DataFlow?
i.e. can we load data into a BQ table without specifying a schema, equivalent to how we can load data from a CSV with --autodetect
?
If you are using protocol buffers as objects in your PCollections (which should be performing very well on the Dataflow back-end) you might be able to use a util I wrote in the past. It will parse the schema of the protobuffer into a BigQuery schema at runtime, based on inspection of the protobuffer descriptor.
I quickly uploaded it to GitHub, it's WIP, but you might be able to use it or be inspired to write something similar using Java Reflection (I might do it myself at some point).
You can use the util as follows:
TableSchema schema = ProtobufUtils.makeTableSchema(ProtobufClass.getDescriptor());
enhanced_events.apply(BigQueryIO.Write.to(tableToWrite).withSchema(schema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
where the create disposition will create the table with the schema specified and the ProtobufClass is the class generated using your Protobuf schema and the proto compiler.
I'm not sure about reading from BQ, but for writes I think that something like this will work on the latest java SDK.
.apply("WriteBigQuery", BigQueryIO.Write
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.to(outputTableName));
Note: BigQuery Table must be of the form: <project_name>:<dataset_name>.<table_name>.