0 votes

Our process is currently a little clunky: we're getting batched CSV outputs from the database, which are converted to JSON and streamed to Pub/Sub.

This is troublesome because every element in the JSON ends up as a string, so writing to BigQuery fails unless we add a type cast inside the Java code.
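To illustrate the problem (field names here are hypothetical), a naive CSV-to-JSON conversion quotes every value, so an integer and a boolean both arrive as strings:

```java
public class CsvDemo {
    // Naively convert one CSV row to JSON: every value is emitted as a
    // quoted string, so downstream consumers (e.g. BigQuery) lose the
    // original column types.
    static String csvToJson(String[] headers, String csvLine) {
        String[] values = csvLine.split(",");
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < headers.length; i++) {
            if (i > 0) sb.append(",");
            sb.append("\"").append(headers[i]).append("\":\"")
              .append(values[i]).append("\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // An INT64 and a BOOL both come out as JSON strings.
        System.out.println(csvToJson(new String[]{"user_id", "active"}, "42,true"));
    }
}
```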

Is there a preferred typed flat-file format we could use for small batches, so that type information is retained at the record level when we transfer via Pub/Sub?

You can use the readMessages method from the PubsubIO class to avoid the string conversion, but you'll still need to convert each Pub/Sub message to a TableRow. – Yurci

1 Answer

3 votes

It depends on how exactly your pipeline is set up.

In general, PubsubIO has a few ways to read messages:

- PubsubIO.readStrings() – each payload as a UTF-8 string
- PubsubIO.readMessages() / readMessagesWithAttributes() – raw PubsubMessage objects
- PubsubIO.readAvros(Class) – payloads deserialized as Avro records
- PubsubIO.readProtos(Class) – payloads deserialized as protobuf messages

Avro and protobuf payloads (readAvros and readProtos) can simplify the serialization/deserialization step for Pub/Sub and let you avoid putting everything into a string.
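As a minimal sketch of the Avro variant (the subscription path and the Avro-generated VoteRecord class are assumptions, not from your setup):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadAvroFromPubsub {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // VoteRecord is a hypothetical Avro-generated class. readAvros
    // deserializes each Pub/Sub payload into it, so fields keep their
    // declared schema types instead of arriving as strings.
    PCollection<VoteRecord> records =
        p.apply("ReadFromPubsub",
            PubsubIO.readAvros(VoteRecord.class)
                .fromSubscription("projects/my-project/subscriptions/my-sub"));

    p.run();
  }
}
```

The same shape works with readProtos and a protobuf-generated class.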

But, as Yurci mentioned, you will still need to convert the payload you get from the Pub/Sub messages into TableRows to write them to BigQuery.
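That last step might look something like this (VoteRecord, its getters, and the table name are hypothetical; the conversion itself is just a MapElements before the BigQuery sink):

```java
// Continuing from a PCollection<VoteRecord> named `records`:
records
    .apply("ToTableRow", MapElements.via(
        new SimpleFunction<VoteRecord, TableRow>() {
          @Override
          public TableRow apply(VoteRecord r) {
            // Because the record is typed, the values set here are a
            // long and a boolean, not strings, so no casting in BigQuery.
            return new TableRow()
                .set("user_id", r.getUserId())
                .set("active", r.getActive());
          }
        }))
    .apply(BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.votes")
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
```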