I am using Apache Beam with Cloud Dataflow. Imagine millions of products arriving in the pipeline. After some steps (filtering, mapping, and so on) I want to partition the data by some field. I tried to use the Partition transform, and it seems to do the partitioning correctly. However, I have no idea how to proceed further. Here's what confuses me:
My goal is to partition the data by some field and write each partition to a different table. Say the Partition transform slices the data into 18 PCollections; then there should be 18 output files. However, Partition returns a PCollectionList, and I can't apply a TextIO transform to it directly. I tried iterating over the list and applying a TextIO transform to each PCollection, but it didn't work.
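For reference, here is a minimal sketch of what I tried (Beam Java SDK assumed; `extractField`, the element type `String`, the partition count, and the output path are all illustrative placeholders, not my real code):

```java
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Partition;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

// products is the PCollection produced by the earlier
// filtering/mapping steps (placeholder here).
PCollection<String> products = ...;

// Split into 18 PCollections, keyed by a hash of some field.
// extractField is a hypothetical helper that pulls out the
// field I want to partition on.
PCollectionList<String> partitions = products.apply(
    Partition.of(18, new Partition.PartitionFn<String>() {
      @Override
      public int partitionFor(String product, int numPartitions) {
        return Math.abs(extractField(product).hashCode()) % numPartitions;
      }
    }));

// Then I iterated over the list and tried to write each
// partition to its own output location:
for (int i = 0; i < partitions.size(); i++) {
  partitions.get(i)
      .apply(TextIO.write().to("gs://my-bucket/output/part-" + i));
}
```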
How do I write all the parts to file after Partition transform?
Thanks,