0 votes

I am using Pub/Sub to capture real-time data, then GCP Dataflow to stream the data into BigQuery. I am using Java for the Dataflow pipeline.

I want to try out the templates provided in Dataflow. The process is: Pub/Sub --> Dataflow --> BigQuery

Currently I am sending messages in string format into Pub/Sub (using Python here), but the Dataflow template only accepts JSON messages, and the Python library is not letting me publish a JSON message. Can anyone suggest a way to publish a JSON message to Pub/Sub so that I can use the Dataflow template to do the job?

1 – Can you clarify or provide links for: 1) "But the template in dataflow is only accepting JSON message" 2) "The python library is not allowing me to publish a JSON message" – Mar Cial R

The Python Standard Library has a json module which can serialize a Python dictionary to a JSON string: json.dumps(python_dict). Is this what you need? – Andrew Nguonly

1 Answer

2 votes

The Google-provided pipeline that pumps data from Pub/Sub to BigQuery now assumes JSON-formatted messages and a matching table schema on the BigQuery side.
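For illustration, here is a sketch of a matching table setup, assuming a hypothetical message shape with name and locale fields (the same shape as the dict used below); the project, dataset, and table names are placeholders:

from google.cloud import bigquery

# Hypothetical table whose columns match the keys of the JSON messages,
# e.g. {"name": "Peter", "locale": "en-US"}
client = bigquery.Client()
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("locale", "STRING"),
]
table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)
client.create_table(table)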

Publishing JSON to Pub/Sub is no different from publishing strings. You can try the following code snippet to convert a Python dict to a JSON string:

import json
# Serialize the Python dict to a JSON string before publishing
py_dict = {"name": "Peter", "locale": "en-US"}
json_string = json.dumps(py_dict)
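Once you have the JSON string, publishing it works exactly like publishing any other string. A minimal sketch using the google-cloud-pubsub client, reusing json_string from the snippet above (the project and topic names are placeholders):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Placeholder project and topic; replace with your own
topic_path = publisher.topic_path("my-project", "my-topic")

# Pub/Sub message payloads are bytes, so encode the JSON string first
future = publisher.publish(topic_path, data=json_string.encode("utf-8"))
print(future.result())  # blocks until publish completes; returns the message ID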

If you'd like to heavily customize the pipeline, you can also take the source code from the following location and build your own:

https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java