I'm trying to write Google Pub/Sub messages to Google Cloud Storage using Google Cloud Dataflow. The Pub/Sub messages arrive in JSON format, and the only operation I want to perform is a transformation from JSON to Parquet files.
In the official documentation I found a Google-provided template that reads data from a Pub/Sub topic and writes Avro files into a specified Cloud Storage bucket (https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#pubsub-to-cloud-storage-avro). The problem is that the template's source code is written in Java, while I would prefer to use the Python SDK.
These are my first tests with Dataflow and Beam in general, and there isn't much material online to go on. Any suggestions, links, guidance, or pieces of code would be greatly appreciated. Below is a minimal sketch of what I'm attempting, in case it helps.
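From my reading of the Beam docs, my idea is to parse each JSON message, group messages into fixed windows, and write each windowed batch as one Parquet file with pyarrow. The topic path, bucket path, window size, and the `user_id`/`event_time`/`value` schema fields below are all placeholders for my actual setup, and I'm not sure this is the right approach:

```python
import io
import json
import uuid

import apache_beam as beam
import pyarrow as pa
import pyarrow.parquet as pq
from apache_beam.io.filesystems import FileSystems
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms.window import FixedWindows

# Placeholder schema -- would be replaced with the actual fields of my JSON messages.
SCHEMA = pa.schema([
    ("user_id", pa.string()),
    ("event_time", pa.string()),
    ("value", pa.int64()),
])


class WriteBatchAsParquet(beam.DoFn):
    """Writes one windowed batch of parsed JSON dicts as a single Parquet file."""

    def __init__(self, output_prefix):
        self.output_prefix = output_prefix

    def process(self, rows):
        rows = list(rows)
        # Build one column per schema field from the parsed JSON dicts.
        columns = {name: [row.get(name) for row in rows] for name in SCHEMA.names}
        table = pa.Table.from_pydict(columns, schema=SCHEMA)
        # Serialize the Parquet file in memory, then upload the bytes to GCS.
        buffer = io.BytesIO()
        pq.write_table(table, buffer)
        path = f"{self.output_prefix}/{uuid.uuid4().hex}.parquet"
        f = FileSystems.create(path)
        f.write(buffer.getvalue())
        f.close()


options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # Pub/Sub is an unbounded source

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Placeholder topic path.
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/my-topic")
        | "Parse JSON" >> beam.Map(json.loads)
        | "Window" >> beam.WindowInto(FixedWindows(60))
        # Single dummy key so each window's messages are grouped into one batch.
        | "Add key" >> beam.Map(lambda row: (None, row))
        | "Group per window" >> beam.GroupByKey()
        | "Drop key" >> beam.Map(lambda kv: kv[1])
        | "Write Parquet" >> beam.ParDo(WriteBatchAsParquet("gs://my-bucket/parquet"))
    )
```

I write the files manually in a `DoFn` because I'm not sure whether `beam.io.WriteToParquet` can be applied directly to an unbounded Pub/Sub source. If there is a built-in or more idiomatic way to do this in the Python SDK, I'd love to know.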